[eigen] Performance tuning
I'm writing to ask about people's favorite techniques for tuning the
performance of their code, and in particular whether there exists a HOWTO (or
a desire to write one).
Here's one example: for the kinds of problems I'm studying, I've found that
one of the most helpful steps is to avoid the creation of temporaries. Because
temporaries are usually created "silently," I am sometimes surprised by which
lines are triggering temporaries. Once I discovered that this was an issue,
here's how I went about solving the problem at first:
1. Compile the program with debugging symbols and optimization
2. Profile my code in valgrind
3. See how many times posix_memalign gets called.
4. If it's too many, comment out a suspicious-looking block of code.
5. Go back to step 1 until the problem is localized, then fix it.
But after growing tired of this process, I finally realized that a better way
to discover the problematic line(s) is to set a breakpoint in
Core/Util/Memory.h::aligned_realloc. And, of course, maybe there's even a
better way I haven't yet stumbled across, like setting some compiler flag to
throw an exception upon the creation of a temporary?
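
To make the "silent" temporaries concrete, here is a toy sketch of how they can be counted from inside the program rather than in valgrind. It is not Eigen code: Vec, g_allocs, count_allocs, sum3_naive and sum3_fused are all made-up names for illustration, and the naive operator+ merely stands in for any expression that allocates behind your back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical global counter: bumped every time a Vec allocates a buffer.
static std::size_t g_allocs = 0;

// Toy dense vector (NOT an Eigen type) with a naive operator+.
struct Vec {
    std::size_t n;
    double* data;

    static double* alloc(std::size_t count) {
        ++g_allocs;
        return static_cast<double*>(std::malloc(count * sizeof(double)));
    }

    explicit Vec(std::size_t n_, double fill = 0.0) : n(n_), data(alloc(n_)) {
        for (std::size_t i = 0; i < n; ++i) data[i] = fill;
    }
    Vec(const Vec& o) : n(o.n), data(alloc(o.n)) {
        for (std::size_t i = 0; i < n; ++i) data[i] = o.data[i];
    }
    Vec(Vec&& o) noexcept : n(o.n), data(o.data) { o.n = 0; o.data = nullptr; }

    // Same-size assignment copies elements in place and never allocates.
    Vec& operator=(const Vec& o) {
        assert(n == o.n);
        for (std::size_t i = 0; i < n; ++i) data[i] = o.data[i];
        return *this;
    }
    ~Vec() { std::free(data); }
};

// Each call builds a fresh result vector: one allocation per '+'.
Vec operator+(const Vec& a, const Vec& b) {
    assert(a.n == b.n);
    Vec r(a.n);
    for (std::size_t i = 0; i < a.n; ++i) r.data[i] = a.data[i] + b.data[i];
    return r;  // elided or moved, so no further allocation here
}

// Looks innocent, but creates one temporary for a + b and one for (a + b) + c.
void sum3_naive(const Vec& a, const Vec& b, const Vec& c, Vec& out) {
    out = a + b + c;
}

// Same result written as a single loop: no temporaries at all.
void sum3_fused(const Vec& a, const Vec& b, const Vec& c, Vec& out) {
    assert(a.n == b.n && b.n == c.n && c.n == out.n);
    for (std::size_t i = 0; i < out.n; ++i)
        out.data[i] = a.data[i] + b.data[i] + c.data[i];
}

// Returns how many Vec buffers were allocated while f() ran.
template <typename F>
std::size_t count_allocs(F&& f) {
    const std::size_t before = g_allocs;
    f();
    return g_allocs - before;
}
```

With this toy type, wrapping a call such as sum3_naive(a, b, c, d) in count_allocs reports two allocations, and the fused version reports none. A real expression-template library will not behave like this naive operator+, which is exactly why it pays to measure a suspicious block rather than guess.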
But there are other issues I don't know how to solve. For example, in
valgrind, I frequently wish that the cost of inlined code could be "folded in"
to the cost computed for the individual lines of my source code. Does anyone
know of a way of doing that? Or is there a different tool that can do this?
A natural gathering place for useful tips might be
but currently that section is fairly short.