Month: February 2014

Change the compiler, change the assembly.

As part of our first foray into understanding the relationship between C code, the compiler, and assembly code, we will use our trusty “Hello World!” program to see how changes to compiler options change the machine code generated.

The first change we make to our compile options is the -static setting, which tells the compiler to include all the code for the libraries that our Hello World program needs in the compiled binary, instead of dynamically linking to the libraries when they are needed by the program. This has the effect of ballooning the size of our program from a mere 8.4k to a whopping 823k. Static linking may have a speed advantage, since the program does not need to look outside itself to find the resources it needs to run, but it runs counter to effective use of computer resources: if everyone statically linked their programs, our disk space would be used up quickly.

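The next compiler option we look at is -fno-builtin. According to the man page, this tells the compiler not to recognize built-in functions, so calls to standard library functions such as printf are emitted as written instead of being replaced with an optimized equivalent. In the objdump output this shows up in the function being called: the first listing below calls puts, while our original compile calls printf, which matches what GCC emits when -fno-builtin is in effect. The first listing also has the text from our C program in it. That interleaved source comes from objdump's source display (objdump -S, which relies on debug information), not from -fno-builtin itself.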

0000000000400530 <main>:
#include <stdio.h>
int main(void){
printf("Hello World!\n");
400530:       55                      push   %rbp
400531:       48 89 e5                mov    %rsp,%rbp
400534:       bf d0 05 40 00          mov    $0x4005d0,%edi
400539:       e8 d2 fe ff ff          callq  400410 <puts@plt>
40053e:       5d                      pop    %rbp
40053f:       c3                      retq

Our original compile looks like this:

0000000000400530 <main>:
400530:       55                      push   %rbp
400531:       48 89 e5                mov    %rsp,%rbp
400534:       bf e0 05 40 00          mov    $0x4005e0,%edi
400539:       b8 00 00 00 00          mov    $0x0,%eax
40053e:       e8 cd fe ff ff          callq  400410 <printf@plt>
400543:       5d                      pop    %rbp
400544:       c3                      retq
400545:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
40054c:       00 00 00
40054f:       90                      nop

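What is interesting is that, other than our clearly visible C code, the assembly code is almost identical. The only differences are the function being called (puts versus printf), the address of the string, and the extra mov $0x0,%eax before the printf call, which is how a variadic function is told that no vector registers hold arguments.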
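
The puts call in the first listing is worth a closer look. As a minimal sketch (the function names hello_printf and hello_puts are ours, for illustration only), these two functions print the same thing. GCC with built-ins enabled is allowed to compile the first as if it were the second, because the format string is a constant with no conversions, it ends in a newline, and the return value is unused:

```c
#include <stdio.h>

/* Prints with printf and ignores the result. With built-in handling
   enabled, GCC may emit this as a call to puts. With -fno-builtin it
   stays a call to printf. */
void hello_printf(void) {
    printf("Hello World!\n");
}

/* The hand-written equivalent: puts appends the newline itself and
   returns a non-negative value on success. */
int hello_puts(void) {
    return puts("Hello World!");
}
```

Compiling a file like this with and without -fno-builtin and comparing the objdump -d output for hello_printf is one way to see the substitution in isolation.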

Next up is the -g option, which tells the compiler to include extra debugging information in the compiled binary. By removing it we shaved about 1k off the size of the binary. If we do an objdump with the -s option, the whole slew of .debug sections that appears at the end of our original compile is missing.

Now let's modify our C program to see how that affects register and memory use in our assembly code. I added a second printf statement that prints a series of single char values: printf("%c%c%c%c %c%c %c%c%c%c\n",a,b,c,d,e,f,g,h,i,j); where each variable a-j is a single character, together spelling out “this is cool”. In our assembly code the address of the “Hello World” string was still put into register edi. The other characters were loaded in reverse order, from j to a, into the following registers, in this order: r10d, r9d, r8d, rdi, esi, edx, r11d, ecx, edx, and eax. The register edx appears twice. We had declared two chars with the value 'i', so our reading was that the compiler used the same location for both, although registers are also routinely reused as temporaries, so the listing alone does not confirm this.
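
For reference, a hypothetical reconstruction of the modified call might look like the sketch below. The function name print_cool and the exact declarations are assumptions, since only the printf line itself is quoted above. On x86-64 Linux the calling convention passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8 and r9, and the rest on the stack. With eleven arguments (the format string plus ten chars), some of the characters cannot travel in argument registers, so any other registers seen in the listing are presumably temporaries.

```c
#include <stdio.h>

/* Hypothetical reconstruction: ten char variables a..j that spell
   "this is cool" when printed with the format string from the post.
   Returns printf's result, the number of characters written. */
int print_cool(void) {
    char a = 't', b = 'h', c = 'i', d = 's';
    char e = 'i', f = 's';
    char g = 'c', h = 'o', i = 'o', j = 'l';

    /* Eleven arguments in total: one format string and ten chars. */
    return printf("%c%c%c%c %c%c %c%c%c%c\n", a, b, c, d, e, f, g, h, i, j);
}
```

Arguments are often evaluated and staged last to first, which would fit the j-to-a order seen in the disassembly.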
The next modification we made was to put the printf statement of our Hello World program into a function and call the function from main. When we do an objdump -d on the two outputs, we see that the assembly code now has a new section <hello> (the name of our function) along with main, and that the assembly code that used to be found in main is now found in hello. Main now contains the following line:
400534:       e8 02 00 00 00          callq  40053b <hello>
which tells the program to save the return address, jump to that location, and continue from there until hello returns.
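
The bytes of that callq line can be checked by hand: e8 is the opcode for a near call, and the four bytes after it are a signed little-endian displacement measured from the address of the next instruction. A minimal sketch of that arithmetic, with call_target as a hypothetical helper name:

```c
#include <stdint.h>

/* Decode a 5-byte x86-64 near call (opcode 0xe8 followed by a 32-bit
   little-endian displacement) located at insn_addr, and return the
   address it transfers control to. Returns 0 if the first byte is
   not 0xe8. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5]) {
    if (insn[0] != 0xe8) {
        return 0;
    }

    /* Assemble the displacement from its little-endian bytes. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);

    /* The displacement is signed: sign-extend it to 64 bits. */
    int64_t rel = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                      : (int64_t)raw;

    /* The target is relative to the instruction after the call. */
    return insn_addr + 5 + (uint64_t)rel;
}
```

For the line above, the bytes e8 02 00 00 00 at 0x400534 give 0x400534 + 5 + 2 = 0x40053b, the address objdump prints for <hello>. The same calculation on e8 d2 fe ff ff at 0x400539 gives 0x400410, the puts@plt address in the earlier listing.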

Finally we look at the effect of the -O (optimization) option on the compiler output. With the option set to -O3, the compiler has drastically cut down the size of our main function to just 3 lines, compared to the original version, which had 10 lines. It seems the compiler has simplified the program as much as possible: it XORs the eax register with itself to zero it instead of moving $0x0 into the register, and it jumps right to the location of printf instead of calling it like it did in the original program, so printf returns directly to main's caller.
Doing this lab has shown me that even at this more direct level of controlling our computer, there are numerous ways to complete a task on a microprocessor. It is also striking how seemingly small changes at compile time can drastically change the code that the compiler produces for the CPU to execute.