*Subject*: Re: [eigen] Vectorization of complex
*From*: Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>
*Date*: Fri, 21 Jan 2011 12:18:16 +0100

On 20.01.2011 23:11, David Luitz wrote:
> I then started testing the code and realized that unfortunately my
> implementation is a bit slower than the SSE2 version. Even more
> puzzling: Actually, the already existing SSE3 implementation is ALSO
> SLOWER than the SSE2 code! Does anybody have an idea, why my SSE4_1 code
> is even slower than the SSE3 code?

Just an uneducated guess: Especially on older processors, it could be
that the CPU only emulates the SSE3 and SSE4_* instructions and is
therefore slower (I once saw something similar with SSE2 on an old
AMD64). In more complex programs, though, the SSE3/SSE4 code could still
be faster due to its smaller code size.

> By the way, we are talking about something like 1 percent run time
> difference in my tests, but still if the SSE3 and SSE4 codes are not
> really faster than SSE2, I think they should be removed...

At least this should be tested on different CPUs first ...
Maybe also add general suggestions such as "Don't enable SSE3 for ..."
to the vectorization documentation.

Regards
Christoph

--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252
----------------------------------------------
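
The suggestion above to test on different CPUs before removing any code path could be set up roughly as follows. This is only a minimal sketch, not Eigen code: `compiled_sse_level`, `complex_dot` and `best_seconds` are hypothetical helper names, the scalar `std::complex` loop merely stands in for whatever Eigen expression is under test, and the `__SSE2__` / `__SSE3__` / `__SSE4_1__` macros are the ones GCC and Clang predefine (other compilers may not define them).

```cpp
#include <algorithm>
#include <chrono>
#include <complex>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: names the highest SSE level this file was compiled
// for, using macros predefined by GCC/Clang (e.g. via -msse3, -msse4.1).
inline std::string compiled_sse_level() {
#if defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}

// Stand-in kernel (assumption): sum of element-wise complex products.
// In a real test this would be replaced by the Eigen expression in question.
inline std::complex<float> complex_dot(
    const std::vector<std::complex<float>>& a,
    const std::vector<std::complex<float>>& b) {
    std::complex<float> sum(0.0f, 0.0f);
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Runs `run` `repeats` times and returns the fastest wall-clock time in
// seconds. Taking the minimum reduces noise from other processes, which
// matters when the expected difference is around one percent.
// Returns +infinity if repeats <= 0.
template <typename F>
double best_seconds(F&& run, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

One would build the same source several times (for instance with `-O2 -msse2`, `-O2 -msse3` and `-O2 -msse4.1` on GCC), run each binary on every CPU of interest, and print `compiled_sse_level()` next to the value returned by `best_seconds`. The result of the kernel should be written to a `volatile` variable or printed so the compiler cannot optimize the timed loop away.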

