Optimization

GCC performs many kinds of optimization. Some are source-level transformations, such as common subexpression elimination, function inlining, and loop unrolling. Others are low-level, such as instruction scheduling, which orders instructions to make the best use of the CPU's pipelines.
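To make common subexpression elimination concrete, here is a hand-written sketch of the transformation. The function names are made up for illustration, and the second function only approximates what the optimizer may do internally; GCC applies the transformation to its intermediate representation, not to the source text.

```cpp
// As written by the programmer: the product a * b appears twice.
int before_cse(int a, int b, int c) {
    return (a * b + c) * (a * b - c);
}

// Roughly what the optimizer may turn it into: the shared product
// is computed once and reused.
int after_cse(int a, int b, int c) {
    const int ab = a * b;  // the common subexpression, held in a temporary
    return (ab + c) * (ab - c);
}
```

Both functions return the same value for the same arguments; only the amount of repeated work differs.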

Generally, the optimization techniques are grouped into three levels, selected with the GCC flag -O<level>. Level zero, -O0, means GCC performs no optimization and compiles the source code in the most straightforward way. Level -O1 includes common optimizations that do not require a speed-space tradeoff. Level -O2 adds instruction scheduling. Level -O3 includes expensive optimizations such as function inlining, which may increase speed at the cost of a larger executable and, in some circumstances, may even make the program slower. There is also an independent flag, -funroll-loops, which may unroll some or all iterations of a loop into straight-line statements. Whether this increases speed has to be examined case by case. Finally, -Os optimizes for executable size.
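As a small sketch of the flags above: GCC predefines the macro __OPTIMIZE__ in optimizing compilations and __OPTIMIZE_SIZE__ when optimizing for size, so a program can report how it was built. The block also shows a loop next to a hand-unrolled version, to illustrate what "flat statements" means for -funroll-loops. The function names and the unroll factor of 4 are assumptions for illustration; the factor GCC actually chooses may differ.

```cpp
// Reports the compilation mode using GCC's predefined macros.
const char* optimization_mode() {
#if defined(__OPTIMIZE_SIZE__)
    return "size";    // e.g. built with -Os
#elif defined(__OPTIMIZE__)
    return "speed";   // e.g. built with -O1, -O2 or -O3
#else
    return "none";    // e.g. built with -O0
#endif
}

// A simple loop: a candidate for unrolling.
int sum_rolled(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// The same loop unrolled by hand, four elements per iteration.
int sum_unrolled_by_4(const int* data, int n) {
    int total = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        total += data[i];
    }
    return total;
}
```

Compiling the same file with, say, g++ -O0 -S and g++ -O2 -funroll-loops -S and comparing the two assembly files is one way to check what the flags did in a particular case.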

Below I try to look a bit deeper into common types of optimization, with some compilation examples that show what is really going on.

  • Dead code elimination

Example: a piece of code that checks the compiled size of a C++ struct containing two pointers and an array of 256 chars. If the size is 8 bytes + 256 bytes = 264 bytes (this assumes 4-byte pointers, i.e. a 32-bit target), it prints "correct" in the message, otherwise "incorrect".

entry_type var;
int actual_size = sizeof var;
printf("The size of the struct is %sly packed.\n",
   (actual_size == 8 + 256 ? "correct" : "incorrect"));
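The fragment above does not show the definition of entry_type. A self-contained sketch might look like the following; the field names are assumptions, and the expected size is computed from sizeof(void*) instead of the hard-coded 8 so that the check also works on 64-bit targets.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical definition: the original only says the struct holds
// two pointers and an array of 256 chars.
struct entry_type {
    entry_type* prev;  // assumed field name
    entry_type* next;  // assumed field name
    char data[256];    // assumed field name
};

// Two pointers plus the char array, for whatever pointer size the target has.
const std::size_t expected_size = 2 * sizeof(void*) + 256;

// Returns "correct" or "incorrect", as in the fragment above.
const char* packing_verdict() {
    std::size_t actual_size = sizeof(entry_type);
    return actual_size == expected_size ? "correct" : "incorrect";
}

// Prints the same message as the original fragment.
void report_packing() {
    std::printf("The size of the struct is %sly packed.\n", packing_verdict());
}
```

Because sizeof yields a compile-time constant, both sides of the comparison are known before the program runs, which is what allows the compiler to discard one of the two strings.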

With optimization level -O0, the compiled code (RTL or assembly) has basically the same structure as the C++ code. With -O2 turned on, GCC can evaluate the comparison at compile time, because sizeof is a constant, and dead code elimination then removes the branch of the conditional expression that can never be taken. The result becomes equivalent to this single C/C++ statement:

printf("The size of the struct is %sly packed.\n",
   "correct");

With -O3 turned on, the lines above were compiled in much the same way (with some of the stack frame setup and restoration removed). In addition, some functions were inlined, but the original copy of each function was still kept in the assembly output so that other files could link against it. This made the file bigger.
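The reason the original copy is kept comes down to linkage, which the following sketch illustrates. The function names are hypothetical, and whether GCC actually inlines a given call depends on its version, flags, and heuristics.

```cpp
// External linkage: another source file may call this function, so the
// compiler has to emit an out-of-line copy even if every call in this
// file is inlined.
int square_external(int x) {
    return x * x;
}

// Internal linkage: no other file can refer to this function, so once all
// calls are inlined the compiler is free to drop the out-of-line copy.
static int square_internal(int x) {
    return x * x;
}

int sum_of_squares(int a, int b) {
    return square_external(a) + square_internal(b);
}
```

Marking helper functions static (or placing them in an unnamed namespace) is therefore one way to limit the size growth that inlining can cause.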
