Re: [eigen] benchmarking weirdness



Gael Guennebaud wrote:
> Also, seeing the benchmark code, I don't think that any cache miss
> occurs since you only have two matrices.

"Think" is a very bad guide when it comes to performance optimization.
Only real measurements count, as it's far too easy to make things
worse by over-optimizing (especially true when it comes to manual loop
unrolling).
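
To make "measure, don't guess" concrete, here is a minimal timing sketch.
It is not taken from the benchmark under discussion; the helper name
best_time_seconds is made up for illustration, and it assumes that the
best of several runs is a reasonable proxy for the undisturbed run time.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: calls f() `reps` times and returns the shortest
// wall-clock duration of a single call, in seconds.
template <typename F>
double best_time_seconds(F&& f, int reps) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        const auto t0 = clock::now();
        f();
        const auto t1 = clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (s < best) best = s;  // keep the least disturbed run
    }
    return best;
}
```

Time every variant with the same harness, and make sure the compiler
cannot optimize the measured work away (for instance by using its result).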

You must watch not only for cache misses on the data but also for
cache misses on the instructions (that's where loop unrolling can
really bite you).

You also must have a look at register usage, which can be thought of as
a "level 0" cache. Changing between row and column major in particular
can make a huge difference here.
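
A small sketch of what the storage-order point means for access patterns.
This is plain C++, not Eigen code; the function names and the row-major
std::vector layout are assumptions for illustration. Both functions
compute the same sum, but only the first walks memory contiguously.

```cpp
#include <cstddef>
#include <vector>

// The matrix is stored row-major: element (i, j) lives at m[i * cols + j].

// Inner loop runs along a row: consecutive addresses, stride 1.
double sum_row_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            s += m[i * cols + j];
    return s;
}

// Inner loop runs down a column: each access jumps `cols` elements.
double sum_col_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            s += m[i * cols + j];
    return s;
}
```

How large the gap is depends on matrix size, cache sizes and the compiler,
so it has to be measured rather than assumed.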

Finally, a huge performance difference can be achieved by optimizing
for branch prediction (IIRC VTune can tell you about that as well).
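
As an illustration of the branch prediction point, here are two ways to
count the elements at or above a threshold. The function names are made
up for this sketch. Whether the second form is actually faster is an
assumption to be checked by measurement: a good compiler may well emit
the same machine code for both.

```cpp
#include <cstddef>
#include <vector>

// Data-dependent branch: hard to predict when the input is unsorted.
std::size_t count_branchy(const std::vector<int>& v, int threshold) {
    std::size_t n = 0;
    for (int x : v) {
        if (x >= threshold) ++n;
    }
    return n;
}

// Same result with the comparison used as a 0/1 value,
// so the loop body contains no conditional jump in the source.
std::size_t count_branchless(const std::vector<int>& v, int threshold) {
    std::size_t n = 0;
    for (int x : v) {
        n += static_cast<std::size_t>(x >= threshold);
    }
    return n;
}
```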

> I would also suggest to bench with different compilers, the results
> might be very different. However, eigen2 is currently not compatible
> with ICC.

That's sad and we should fix it ASAP. ICC is a very good compiler when
it comes to optimal performance. It's also quite good at
auto-vectorisation, which is crucial for SSE usage (unless you are doing
it by hand with intrinsics).
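
For readers who have not seen the two approaches side by side, here is a
rough sketch: a plain loop that a vectorising compiler may turn into SSE
code by itself, and a hand-written version using SSE intrinsics. This is
not Eigen's implementation; the function names and the SKETCH_HAS_SSE
macro are invented for illustration, and the intrinsics path assumes an
x86 target with SSE (otherwise it falls back to the scalar loop).

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...
#define SKETCH_HAS_SSE 1
#endif

// Plain loop: a candidate for the compiler's auto-vectoriser.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Hand-vectorised: four floats per iteration, unaligned loads and stores.
void add_sse(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Remainder (or everything, when SSE is unavailable).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

Both versions produce identical results; which one is faster with a given
compiler and set of flags is, again, something only a benchmark can tell.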



