Re: [eigen] Vectorization of complex

On 20.01.2011 23:11, David Luitz wrote:
> I then started testing the code and realized that unfortunately my
> implementation is a bit slower than the SSE2 version. Even more
> puzzling: Actually, the already existing SSE3 implementation is ALSO
> SLOWER than the SSE2 code! Does anybody have an idea, why my SSE4_1 code
> is even slower than the SSE3 code?

Just an uneducated guess:
Especially on older processors, it could be that the CPU only emulates
the SSE3 and SSE4_* instructions, which would make them slower (I once
saw something similar with SSE2 on an old AMD64). In more complex
programs they could still be faster, though, due to the smaller code size.

> By the way, we are talking about something like 1 percent run time
> difference in my tests, but still if the SSE3 and SSE4 codes are not
> really faster than SSE2, I think they should be removed...

At least this should be tested on different CPUs first ...
Maybe we should also add general recommendations such as
"Don't enable SSE3 for ..." to the vectorization documentation.
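
To make such a comparison across CPUs a bit more concrete, here is a
minimal sketch of a timing harness. It is not Eigen's own benchmark and
does not use Eigen at all: cmul_sum, best_seconds and enabled_simd are
hypothetical names, the kernel is a plain std::complex<float> loop, and
the __SSE2__ / __SSE3__ / __SSE4_1__ checks assume GCC/Clang-style
predefined macros. The idea would be to build it once per flag set
(e.g. -msse2, -msse3, -msse4.1) and compare the timings on each machine.

```cpp
#include <algorithm>
#include <chrono>
#include <complex>
#include <cstddef>
#include <string>
#include <vector>

// Report which SSE levels the compiler was allowed to use.
// Assumption: GCC/Clang-style predefined macros; other compilers may differ.
std::string enabled_simd() {
    std::string s;
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __SSE3__
    s += "SSE3 ";
#endif
#ifdef __SSE4_1__
    s += "SSE4.1 ";
#endif
    return s.empty() ? std::string("none detected") : s;
}

// Hypothetical kernel: sum of element-wise complex products.
// Assumes a and b have the same length.
std::complex<float> cmul_sum(const std::vector<std::complex<float> >& a,
                             const std::vector<std::complex<float> >& b) {
    std::complex<float> acc(0.0f, 0.0f);
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Best wall-clock time in seconds over `repeats` runs on n elements.
// Taking the minimum reduces noise from other processes.
// Returns a huge sentinel value if repeats <= 0.
double best_seconds(std::size_t n, int repeats) {
    std::vector<std::complex<float> > a(n, std::complex<float>(1.0f, 2.0f));
    std::vector<std::complex<float> > b(n, std::complex<float>(0.5f, -1.0f));
    volatile float sink = 0.0f;  // keeps the result observable
    double best = 1e300;
    for (int r = 0; r < repeats; ++r) {
        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        sink = cmul_sum(a, b).real();
        std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    (void)sink;
    return best;
}
```

A caller would print enabled_simd() next to best_seconds(n, repeats) for
each build. With differences in the 1 percent range, n and repeats need
to be large enough that the difference is clearly above timer noise.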


