Re: [eigen] Vectorization of complex
On 20.01.2011 23:11, David Luitz wrote:
> I then started testing the code and realized that unfortunately my
> implementation is a bit slower than the SSE2 version. Even more
> puzzling: Actually, the already existing SSE3 implementation is ALSO
> SLOWER than the SSE2 code! Does anybody have an idea, why my SSE4_1 code
> is even slower than the SSE3 code?
Just an uneducated guess:
Especially on older processors, the CPU may only emulate the SSE3 and
SSE4_* instructions, which would make them slower (I once saw something
similar with SSE2 on an old AMD64). In more complex programs, though,
the newer instructions could still be faster due to smaller code size.
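
To make concrete which instructions are in question, here is a rough
sketch of a packed complex product with and without SSE3. It is not
taken from the Eigen sources; the names cmul2 and check_against_scalar
are made up for illustration. The SSE3 branch uses the movsldup,
movshdup and addsubps instructions, while the SSE2 branch gets the same
result with shuffles and a sign-flip mask. The branch is selected at
compile time (e.g. via -msse3), with a scalar fallback on other targets.

```cpp
#include <cassert>
#include <complex>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 (also pulls in the SSE1 intrinsics)
#endif
#if defined(__SSE3__)
#include <pmmintrin.h>   // SSE3: addsub, moveldup, movehdup
#endif

using cf = std::complex<float>;

// Hypothetical helper: out[k] = a[k] * b[k] for k = 0, 1.
void cmul2(const cf* a, const cf* b, cf* out) {
#if defined(__SSE2__)
    // Lane layout: [a0.re, a0.im, a1.re, a1.im]
    const __m128 va = _mm_loadu_ps(reinterpret_cast<const float*>(a));
    const __m128 vb = _mm_loadu_ps(reinterpret_cast<const float*>(b));
    // [a0.im, a0.re, a1.im, a1.re]
    const __m128 a_swapped = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1));
#if defined(__SSE3__)
    const __m128 b_re = _mm_moveldup_ps(vb);  // [b0.re, b0.re, b1.re, b1.re]
    const __m128 b_im = _mm_movehdup_ps(vb);  // [b0.im, b0.im, b1.im, b1.im]
    // addsub subtracts in even lanes and adds in odd lanes.
    const __m128 prod = _mm_addsub_ps(_mm_mul_ps(va, b_re),
                                      _mm_mul_ps(a_swapped, b_im));
#else
    const __m128 b_re = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 2, 0, 0));
    const __m128 b_im = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 3, 1, 1));
    // Flip the sign of the even lanes to emulate addsub.
    const __m128 sign_even = _mm_castsi128_ps(_mm_setr_epi32(
        static_cast<int>(0x80000000u), 0, static_cast<int>(0x80000000u), 0));
    const __m128 cross = _mm_xor_ps(_mm_mul_ps(a_swapped, b_im), sign_even);
    const __m128 prod = _mm_add_ps(_mm_mul_ps(va, b_re), cross);
#endif
    _mm_storeu_ps(reinterpret_cast<float*>(out), prod);
#else
    // Scalar fallback for targets without SSE2.
    out[0] = a[0] * b[0];
    out[1] = a[1] * b[1];
#endif
}

// Hypothetical sanity check against the plain std::complex product.
void check_against_scalar(const cf* a, const cf* b, float tol = 1e-5f) {
    cf out[2];
    cmul2(a, b, out);
    assert(std::abs(out[0] - a[0] * b[0]) <= tol);
    assert(std::abs(out[1] - a[1] * b[1]) <= tol);
}
```

Both branches do the same two multiplications; SSE3 only saves the
shuffles and the xor, so a difference of around 1 percent in either
direction would not be too surprising.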
> By the way, we are talking about something like 1 percent run time
> difference in my tests, but still if the SSE3 and SSE4 codes are not
> really faster than SSE2, I think they should be removed...
At the very least, this should be tested on different CPUs first.
We could also add general recommendations such as "Don't enable SSE3
for ..." to the vectorization documentation.
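
For comparing on several machines, something like the following minimal
timing sketch might do. It is only an assumption about how one could
measure this, not an existing Eigen benchmark: the two kernels are
placeholders for the code paths under test (e.g. builds with and without
-msse3), and all names (cmul_std, cmul_manual, best_time) are
hypothetical. Taking the best of several runs reduces noise, which
matters when the difference is around 1 percent.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cf = std::complex<float>;

// Placeholder kernel A: element-wise product via std::complex operator*.
void cmul_std(const std::vector<cf>& a, const std::vector<cf>& b,
              std::vector<cf>& out) {
    assert(a.size() == b.size() && out.size() == a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
}

// Placeholder kernel B: the same product written out on real/imag parts.
void cmul_manual(const std::vector<cf>& a, const std::vector<cf>& b,
                 std::vector<cf>& out) {
    assert(a.size() == b.size() && out.size() == a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float ar = a[i].real(), ai = a[i].imag();
        const float br = b[i].real(), bi = b[i].imag();
        out[i] = cf(ar * br - ai * bi, ar * bi + ai * br);
    }
}

// Returns the smallest wall-clock time (in seconds) over `runs` calls.
template <typename Kernel>
double best_time(Kernel kernel, int runs) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        kernel();
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

Usage would be along the lines of
best_time([&] { cmul_std(a, b, out); }, 20), run once per kernel and per
CPU, after checking that both kernels produce the same output.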
Regards
Christoph
--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen
Tel: (+49) 421-218-64252
----------------------------------------------