some bookmarks…

a few sites i’ve found lately while searching for ways to brainstorm, optimize, streamline, and expand the low-level parts of axonlib and other code.
i haven’t digested all the knowledge and ideas in there yet, as there’s a lot of info and hints…

Chris Lomont’s Publications page
Toshi’s Project Page
be nice to your cache
bit twiddling hacks
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
creating small win32 executables
techniques for reducing executable size
tiny pe
references on coding for optimization on x86 architectures
the impossibly fast c++ delegates
member function pointers and the fastest possible c++ delegates
const illustration in c++
rant on c++’s operator new
optimizing c++ (pdf)
c++ and the linker
hidden features of c++
source code for data structures and algorithm analysis in c
c++ for c programmers



– ccernn


2 Responses

  1. definitely a good and informative list…

    i have a few comments on the “optimizing c++ (pdf)” article. mostly _not_ criticism, as it’s obviously a well-written article by someone who knows very well what he is talking about.

    [in 2.6, page 12:]
    “…The Gnu compiler often inserts built-in code instead of the most common memory and string instructions. The built-in code is not optimal. Use option -fno-builtin to get library versions instead.”

    i don’t think this is always true, since there are too many variables (compiler version, cpu architecture, even os platform) for such a “global” statement.
    it may hold for some of the standard functions, but not for all of them.
    for example, on my old amd with mingw on windows, adding “-fno-builtin” gives a small boost for “strlen” but does the opposite for “memcpy”. so the performance table in 2.6 is essentially just one example, valid for one specific combination of cpu, library, and compiler.
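
    a rough way to check this on your own machine is to time both functions and build the same file twice, once with `g++ -O2` and once with `g++ -O2 -fno-builtin`. the sketch below is only an illustration: `Timing`, `time_strlen` and `time_memcpy` are made-up names, the buffer sizes and round counts are arbitrary, and the numbers will differ per cpu, library and compiler version.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Result of one timing run (hypothetical type, for illustration only).
struct Timing {
    long long nanoseconds;  // wall-clock time for all rounds together
    std::size_t checksum;   // sum of observed results, so the calls are actually used
};

// Time `rounds` calls of strlen on the same string.
Timing time_strlen(const std::string& text, int rounds) {
    // The pointer is read through a volatile object on every iteration, which
    // keeps the compiler from computing the length once and hoisting it.
    const char* volatile p = text.c_str();
    std::size_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        sum += std::strlen(p);
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{static_cast<long long>(ns.count()), sum};
}

// Time `rounds` calls of memcpy over `bytes` bytes. Requires bytes >= 2.
Timing time_memcpy(std::size_t bytes, int rounds) {
    assert(bytes >= 2);
    std::vector<unsigned char> src(bytes, 1);
    std::vector<unsigned char> dst(bytes, 0);
    std::size_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        src[0] = static_cast<unsigned char>(i);  // change the input every round
        std::memcpy(dst.data(), src.data(), bytes);
        sum += dst[bytes - 1];                   // read the copy back (always 1)
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{static_cast<long long>(ns.count()), sum};
}
```

    call both from `main`, print `nanoseconds` for each build, and compare. the checksum is only there to give the optimizer a reason to keep the calls. a clever compiler may still see through it, so it is worth checking the generated assembly before trusting the numbers.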


    [on MAC]
    “…Can only run on Mac platform.”

    ..not entirely (or legally?) true. 🙂
    as proven by

  2. i really ought to re-read that article before replying, but…
    completely agree that a lot of other factors also play a role when optimizing, and that blanket statements like “this or that is the absolute, unquestionable, fastest way” are often valid only in very specific cases, rather than being “the truth, the real truth, and nothing but the truth”.
    an error i’m seeing everywhere is making these small test cases and timing loops over, let’s say, a million iterations. what people often forget (or don’t know) is how cpu cache lines, branch prediction, instruction pipelining, and stalls work. it doesn’t matter if you save one cycle in a tight loop if the pipeline stalls or flushes, or if you have to wait, for example, 200 cycles for a memory read/write in each round of that loop.
    one thing you (meaning the compiler) can often do is try to ‘hide’ this penalty by doing something else in parallel during the wait, essentially getting those 200 cycles for free…
    compilers can do marvellous things, but not magic. the only way to be sure, is to test, test, profile, compare…
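
    to make the cache point concrete, here is a small sketch that sums the same array twice: once walking it front to back, and once jumping `stride` elements at a time. both runs do exactly the same number of additions and return the same total, so any time difference comes from memory access order. `sum_with_stride` and `time_sum_ns` are made-up names, and the 64-byte cache line mentioned in the comments is an assumption that is common on x86 but not guaranteed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Add up every element exactly once, visiting them `stride` elements apart.
// stride == 1 is a plain sequential walk. With 4-byte elements and an assumed
// 64-byte cache line, stride == 16 lands on a different line at every step.
std::uint64_t sum_with_stride(const std::vector<std::uint32_t>& data, std::size_t stride) {
    assert(stride > 0);
    std::uint64_t total = 0;
    for (std::size_t first = 0; first < stride; ++first) {
        for (std::size_t i = first; i < data.size(); i += stride) {
            total += data[i];
        }
    }
    return total;
}

// Run one pass, store the sum in `result`, and return the elapsed nanoseconds.
long long time_sum_ns(const std::vector<std::uint32_t>& data, std::size_t stride,
                      std::uint64_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = sum_with_stride(data, stride);
    const auto stop = std::chrono::steady_clock::now();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return static_cast<long long>(ns.count());
}
```

    for the difference to show up, the array probably needs to be much larger than the last-level cache (tens of millions of elements, say), and each variant should be timed several times. hardware prefetchers can follow regular strides to some degree, so the gap may be smaller or larger than expected, which is one more reason to measure rather than guess.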

    that hackintosh link made me curious 🙂
    perhaps i’ll take a more serious look into it one day..
