Change the compiler, change the assembly

As part of our first foray into understanding the relationship between C code, the compiler and assembly code, we will use our trusty “Hello World!” program to see how changes to compiler options change the machine code that is generated.

The first change we make to our compile options is the -static setting. It tells the compiler to include all the library code our Hello World program needs directly in the compiled binary, instead of dynamically linking to the shared libraries when the program runs. This balloons the size of our program from a mere 8.4k to a whopping 823k. Static linking may have a speed advantage, since the program does not have to locate and load shared libraries when it starts. However, it runs counter to the efficient use of computer resources: if everyone statically linked their programs, our disk space would be used up quickly.
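One visible trace of this difference is in the ELF program headers. A dynamically linked executable carries a PT_INTERP entry naming the dynamic loader, while a binary built with -static normally has none (static-pie builds also lack it). The sketch below checks for that entry. It assumes a 64-bit ELF file on Linux and glibc's <elf.h>, and is_dynamically_linked is a hypothetical helper name of our own, not a library function.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does this 64-bit ELF executable request a
 * dynamic loader? Dynamically linked programs have a PT_INTERP
 * program header; -static builds normally do not.
 * Returns 1 if PT_INTERP is present, 0 if not, -1 on any error. */
int is_dynamically_linked(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    int result = -1;

    /* Check the magic number and that this really is a 64-bit ELF file. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    result = 0;
    /* Walk the program header table looking for PT_INTERP. */
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_INTERP) {
            result = 1;
            break;
        }
    }

done:
    fclose(f);
    return result;
}
```

Called on the normal and -static builds of Hello World, we would expect it to return 1 and 0 respectively. `readelf -l` shows the same INTERP entry if you would rather not write any code.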

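The next compiler option we look at is -fno-builtin. According to the gcc man page, it tells the compiler not to recognize built-in functions, so library calls such as printf are compiled exactly as written rather than being swapped for cheaper equivalents. When we compare the objdump output of the two builds, one listing also has the lines of our C program mixed in with the assembly. That text comes from the debugging information added by -g (objdump prints it when run with its -S/--source option), not from -fno-builtin itself. Here is the build where GCC was free to use its built-ins:

0000000000400530 <main>:
#include <stdio.h>
int main(void){
printf("Hello World!\n");
400530:       55                      push   %rbp
400531:       48 89 e5                mov    %rsp,%rbp
400534:       bf d0 05 40 00          mov    $0x4005d0,%edi
400539:       e8 d2 fe ff ff          callq  400410 <puts@plt>
40053e:       5d                      pop    %rbp
40053f:       c3                      retq

Whereas our original compile looks like this:

0000000000400530 <main>:
400530:       55                      push   %rbp
400531:       48 89 e5                mov    %rsp,%rbp
400534:       bf e0 05 40 00          mov    $0x4005e0,%edi
400539:       b8 00 00 00 00          mov    $0x0,%eax
40053e:       e8 cd fe ff ff          callq  400410 <printf@plt>
400543:       5d                      pop    %rbp
400544:       c3                      retq
400545:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
40054c:       00 00 00
40054f:       90                      nop

What is interesting is that, apart from the visible C code, the two versions differ in which function they call. The first calls puts@plt, while the original calls printf@plt. Because our format string contains no conversion specifiers and ends in a newline, GCC may replace printf("Hello World!\n") with the simpler puts("Hello World!"), which prints the same text. -fno-builtin switches that substitution off, which is why the original compile keeps the printf call. The extra mov $0x0,%eax in the original is part of calling printf as well: under the x86-64 System V calling convention, %al tells a variadic function how many vector registers hold arguments, and here there are none.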
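To convince ourselves that the substitution does not change what the program prints, here is a small C sketch. Because puts can only write to stdout, it imitates puts with fputs followed by a newline on a temporary file, then compares those bytes with what fprintf produces for our format string. Both function names are hypothetical helpers, not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Read up to size - 1 bytes from the start of f into buf and terminate it. */
static void read_back(FILE *f, char *buf, size_t size)
{
    rewind(f);
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
}

/* Hypothetical check: does the printf spelling from our source produce
 * the same bytes as the puts spelling GCC substitutes?
 * Returns 1 if identical, 0 if different, -1 if a temp file fails. */
int printf_matches_puts(void)
{
    char a[64], b[64];
    FILE *fa = tmpfile();
    FILE *fb = tmpfile();
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }

    fprintf(fa, "Hello World!\n");   /* what our source asks for */
    fputs("Hello World!", fb);       /* puts() writes the string... */
    fputc('\n', fb);                 /* ...followed by a newline */

    read_back(fa, a, sizeof a);
    read_back(fb, b, sizeof b);
    fclose(fa);
    fclose(fb);
    return strcmp(a, b) == 0;
}
```

The output is identical, but the return values are not: printf returns the number of characters written, while puts only promises a nonnegative value. GCC therefore only makes the swap when, as in our program, the result of printf is never used.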

Next up is the -g option, which tells the compiler to include extra debugging information in the compiled binary. Removing it shaved about 1k off the size of the binary. If we run objdump with the -s option, the whole slew of .debug sections that appeared at the end of our original compile is missing.
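The .debug sections that objdump -s dumps can also be counted directly from the ELF section headers. This is a sketch under the same assumptions as before (a 64-bit ELF file, glibc's <elf.h>), and count_debug_sections is a hypothetical name. It reads the section-name string table (.shstrtab) and counts names beginning with ".debug".

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count sections whose names start with ".debug".
 * Returns the count, or -1 on any error. */
int count_debug_sections(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int count = -1;
    char *names = NULL;
    Elf64_Ehdr eh;
    Elf64_Shdr strtab;

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shstrndx == SHN_UNDEF)
        goto done;

    /* Load the section header of the name string table, then its contents. */
    if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)eh.e_shstrndx * eh.e_shentsize),
              SEEK_SET) != 0 ||
        fread(&strtab, sizeof strtab, 1, f) != 1)
        goto done;
    names = malloc(strtab.sh_size + 1);
    if (names == NULL ||
        fseek(f, (long)strtab.sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab.sh_size, f) != strtab.sh_size)
        goto done;
    names[strtab.sh_size] = '\0';

    /* Walk every section header and check its name. */
    count = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize),
                  SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1) {
            count = -1;
            break;
        }
        if (sh.sh_name < strtab.sh_size &&
            strncmp(names + sh.sh_name, ".debug", 6) == 0)
            count++;
    }

done:
    free(names);
    fclose(f);
    return count;
}
```

On the -g build we would expect a nonzero count and on the build without -g a count of zero; `readelf -S` lists the same section headers.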

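Now let’s modify our C program to see how that affects the assembly code. I added a second printf statement that prints a series of single char values:

printf("%c%c%c%c %c%c %c%c%c%c\n",a,b,c,d,e,f,g,h,i,j);

where each variable a through j holds a single character, together spelling out “this is cool”. In our assembly code the address of the “Hello World” string was still loaded into the edi register, and the other characters were loaded, in reverse order from j to a, into the following registers: r10d, r9d, r8d, rdi, esi, edx, r11d, ecx, edx and eax. The register edx appears twice in that list; my guess is that this is because we declared a char with the value “i” twice, though it may simply be the compiler reusing a scratch register once its first value has been stored. The reason so many registers are shuffled around is the x86-64 System V calling convention: only the first six integer arguments (here the format string and a through e) are passed in rdi, rsi, rdx, rcx, r8 and r9, and the rest (f through j) have to be placed on the stack.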
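Here is that second printf rebuilt with snprintf so its output can be checked in code. format_cool is a hypothetical name, and the variable values are our reconstruction of “this is cool”. Note that snprintf takes two extra leading arguments, so the register split is different from the printf version described above.

```c
#include <stdio.h>

/* Rebuild the "this is cool" line from ten single-char variables.
 * For snprintf on x86-64 System V, buf, size and the format string
 * take rdi, rsi and rdx; a, b and c go in rcx, r8 and r9; d through j
 * are passed on the stack. Returns snprintf's character count. */
int format_cool(char *buf, size_t size)
{
    char a = 't', b = 'h', c = 'i', d = 's';
    char e = 'i', f = 's';
    char g = 'c', h = 'o', i = 'o', j = 'l';

    return snprintf(buf, size, "%c%c%c%c %c%c %c%c%c%c\n",
                    a, b, c, d, e, f, g, h, i, j);
}
```

With a 32-byte buffer this writes "this is cool\n" and returns 13. Compiling it with gcc -S is an easy way to watch the compiler split the arguments between registers and the stack.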
The next modification we made was to put the printf statement of our Hello World program into a function and call that function from main. When we run objdump -d on the two outputs, we see that the assembly code now has a new block labeled <hello> (the name of our function) alongside <main>, and that the assembly code that used to be found in main is now found in hello. Main now contains the following line:
400534:       e8 02 00 00 00          callq  40053b <hello>
which pushes the return address onto the stack and jumps to hello; when hello finishes, its retq pops that address and execution continues in main.

Finally, we look at the effect of the -O (optimization) option on the compiler output. With the option set to -O3, the compiler drastically cuts our main function down to just 3 lines, compared to the original version’s 10. The compiler seems to have simplified the program as much as possible. It XORs the eax register with itself to zero it instead of moving $0x0 into it (xor %eax,%eax is a 2-byte instruction, while the mov in our original listing takes 5 bytes). It also jumps straight to printf instead of calling it as the original program did. That jump is a tail call: since printf is the last thing main does, printf’s own return can go straight back to whoever called main.
Doing this lab has shown me that even at the level closest to the hardware, there are numerous ways to accomplish the same task on a microprocessor. It is also striking how seemingly small changes at compile time can drastically change the code the compiler produces for the CPU to execute.




Changing up the source

To get familiar with the process surrounding code changes and review, we were asked to look at two different open source projects and compare their change procedures. I chose to examine VICE, the Versatile Commodore Emulator, a multi-platform open source Commodore computer emulator, and PMD (“Don’t Shoot the Messenger”), a source code analyzer designed to find a wide range of coding errors.

Both projects use SourceForge to track changes to their code base. SourceForge offers a set of tools to assist the development of open source projects: tracking changes, hosting message boards, providing documentation and offering downloads.

In the first few lines of the VICE source code repository, the developers briefly outline their expectations: when you make a change, notify the person responsible for that part of the project and update the ticketing system on SourceForge. Looking at a bug fix for multi-monitor support, we see that there was a half-month lag between the patch being submitted and it being accepted, with three developers joining the conversation. This seems to be atypical for this project, as most patches were accepted quite quickly. That is probably because the project is quite mature at this point, so the changes tend to be minor.

In contrast, the PMD patch page on SourceForge showed 8 open patches, all of which were created more than a year ago, one of them 9 years ago. That is not to say that the project is not being maintained: it has 267 accepted patches, and the last release of the software was in August of last year. On its GitHub page, only 5 people are listed as members of the project.

The Linux experiment

I’ve always dabbled with Linux but never seemed able to fully commit to it as a desktop operating system, and I feel this has kept my knowledge of it fairly casual. To rectify the situation, I’m planning to make a technical New Year’s resolution to use only Linux for a month, starting January 1st.
I’ve chosen to go with Fedora 16 as my distro, mainly because I’ve spent more time with Ubuntu and want a better understanding of how Red Hat does things.
The biggest issue I’m facing is dealing with iTunes and my iPhone. I’m not sure how I’m going to handle that yet; the two main options are Wine or a virtual machine.