Re: [eigen] Issues regarding Quaternion-alignment and const Maps |
Benoit Jacob wrote:
> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>> Benoit Jacob wrote:
>>> Wow, very good work. I can confirm the 2x speed improvement, and once I moved your benchmarking code to a non-inlinable function called from main(), it got even a bit higher (GCC fails to optimize code in the main() function properly). Could you make a patch against the development branch? (We're not going to add features to 2.0 at this point.)
>>
>> I think I can do that, but most likely not before Monday/Tuesday.
>
> http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Generating_a_patch
>
>>> Also, I didn't know about that loaddup instruction in SSE3. It's great! I'll have a look at using it in ei_pset1 when SSE3 is available.
>>
>> It's actually a pity that there is no complete list of *just* the SSE instructions (not mixed with every other x86 instruction), including a short description, maybe a usage example, and the intrinsics for some common compilers. At least I didn't find any...
>
> Yes, I've been trying to see if there is a single-precision equivalent for MOVDDUP and I still don't know...
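Benoit's trick of moving the benchmark out of main() can be sketched with the `noinline` function attribute, which GCC and Clang both accept. The kernel below, `run_kernel`, is a hypothetical dot product standing in for the actual quaternion benchmark, which is not shown in the thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the benchmarked code. noinline (GCC/Clang
   extension) keeps it a separate function, so it is compiled with the
   ordinary per-function optimizations rather than inside main(). */
__attribute__((noinline))
double run_kernel(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];   /* accumulate a[i] * b[i] */
    return acc;
}
```

A timing loop in main() would then only call `run_kernel` and read the clock around the call.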
I just searched every <*mmintrin.h> for float and found this in <xmmintrin.h>:

    /* Create a vector with all four elements equal to *P.  */
    extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_load1_ps (float const *__P)
    {
      return _mm_set1_ps (*__P);
    }

But looking at _mm_set1_ps, it doesn't really look like this is actually an SSE instruction...
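The broadcast question can be made concrete with a minimal sketch, assuming GCC or Clang on x86 with at least SSE2. The helpers `broadcast_pd`, `broadcast_ps` and `broadcast_to_arrays` are hypothetical illustrations, not Eigen's `ei_pset1`. For doubles, SSE3's `_mm_loaddup_pd` maps to a single MOVDDUP with a memory operand. For floats, to the best of my knowledge no SSE level up to SSE4.2 has a one-instruction broadcast from memory, so the usual sequence is a scalar MOVSS load followed by a SHUFPS that replicates lane 0. That is roughly what `_mm_set1_ps` tends to compile to, though the exact code is up to the compiler.

```c
#include <xmmintrin.h>   /* SSE:  __m128, _mm_load_ss, _mm_shuffle_ps, _mm_storeu_ps */
#include <emmintrin.h>   /* SSE2: __m128d, _mm_load1_pd, _mm_storeu_pd */
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3: _mm_loaddup_pd (MOVDDUP) */
#endif

/* Hypothetical helper: broadcast one double into both lanes. */
static __m128d broadcast_pd(const double *p)
{
#ifdef __SSE3__
    return _mm_loaddup_pd(p);          /* MOVDDUP straight from memory */
#else
    return _mm_load1_pd(p);            /* SSE2 fallback, typically load + unpack */
#endif
}

/* Hypothetical helper: broadcast one float into all four lanes. */
static __m128 broadcast_ps(const float *p)
{
    __m128 v = _mm_load_ss(p);         /* MOVSS: lanes = {*p, 0, 0, 0} */
    return _mm_shuffle_ps(v, v, 0x00); /* SHUFPS with _MM_SHUFFLE(0,0,0,0): copy lane 0 everywhere */
}

/* Store both broadcasts to plain arrays so the lanes can be inspected. */
void broadcast_to_arrays(const double *d, double out_d[2],
                         const float *f, float out_f[4])
{
    _mm_storeu_pd(out_d, broadcast_pd(d));
    _mm_storeu_ps(out_f, broadcast_ps(f));
}
```

Compiling with `-msse3` selects the MOVDDUP path. AVX later added VBROADCASTSS (`_mm_broadcast_ss`), which does the single-precision broadcast in one instruction, but it requires an AVX-capable CPU.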
-- 
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252
----------------------------------------------