enhance build process to allow selective -O3 optimization
the motivation for this patch is that the vast majority of libc is
code that does not benefit at all from optimizations, but that certain
components like string/memory operations can be major performance
bottlenecks.
at the same time, the old -falign-*=1 options are removed, since they
were only beneficial for avoiding bloat when global -O3 was used, and
in that case, they may have prevented some of the performance gains.
to be the most useful, this patch will need further tuning. in
particular, research is needed to determine which components should be
built with -O3 by default, and it may be desirable to remove the
hard-coded -O3 and instead allow more customization of the
optimization level used for selected modules.