Re: [eigen] Optimization advice for a specific expression
On 2016-02-05 17:50, Alberto Luaces wrote:
> Thanks a lot, Christoph.  This is very helpful.  Regarding those
> assertions: do you have any rule of thumb to know what is getting
> vectorized and what is not?
For SSE vectorization the matrix data must be accessible in aligned
blocks of 16 bytes. I.e., the inner dimension must be even for
doubles and a multiple of 4 for floats. It's a bit more complicated for
.block() and for dynamic-sized objects. Also, in some cases it is
sufficient that only one side is aligned.
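
The size rule above can be written down as a small compile-time check. This is only a sketch: fills_whole_packets and kSseBytes are hypothetical names, not part of Eigen, and the check covers only the size condition for fixed-size objects (whether the inner dimension fills whole 16-byte packets), not the alignment of the data pointer, .block(), or dynamic sizes. It assumes the usual 8-byte double and 4-byte float.

```cpp
#include <cstddef>

// Hypothetical constant (not an Eigen name): bytes in one SSE register.
constexpr std::size_t kSseBytes = 16;

// Hypothetical helper (not an Eigen function): true if `inner_dim` scalars
// of type Scalar occupy a whole number of 16-byte SSE packets.
template <typename Scalar>
constexpr bool fills_whole_packets(std::size_t inner_dim) {
    return inner_dim > 0 && (inner_dim * sizeof(Scalar)) % kSseBytes == 0;
}

// Doubles: 2 per packet, so the inner dimension must be even.
static_assert(fills_whole_packets<double>(4), "4 doubles = 32 bytes");
static_assert(!fills_whole_packets<double>(3), "3 doubles = 24 bytes");

// Floats: 4 per packet, so the inner dimension must be a multiple of 4.
static_assert(fills_whole_packets<float>(8), "8 floats = 32 bytes");
static_assert(!fills_whole_packets<float>(6), "6 floats = 24 bytes");
```

A failing check only means the simple size rule is not met; as noted above, some expressions can still be vectorized when only one side is aligned.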
> I get quickly swamped by all of those mul... and add... SSE instructions
> in the assembler output, and cannot clearly see if they are just
> performing scalar or vector operations.  Is it maybe a matter of
> checking that "packet" ops are mostly used ("P" prefix?)?
mulPd/addPd are packet (double precision) operations, mulSd/addSd are 
scalar operations.
For float operations the instructions are called mulps/addps and 
mulss/addss. Div and sub are analogous, of course.
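
As a rough illustration of this naming scheme, the following sketch classifies a mnemonic by its last two letters. OpKind and classify are hypothetical names made up for this example, and the function deliberately only handles the five-letter mul/add/sub/div family named above (lower-case, no AVX "v" prefix); everything else is reported as Other.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch.
enum class OpKind { PackedDouble, PackedSingle, ScalarDouble, ScalarSingle, Other };

// Length of a NUL-terminated string, usable at compile time.
constexpr std::size_t length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// True if the first three characters of `s` equal the stem `stem`.
constexpr bool has_stem(const char* s, const char* stem) {
    return s[0] == stem[0] && s[1] == stem[1] && s[2] == stem[2];
}

// Hypothetical helper: classify mnemonics such as "mulpd" or "addss".
// Letter 4 is 'p' (packed) or 's' (scalar); letter 5 is 'd' (double)
// or 's' (single precision).
constexpr OpKind classify(const char* mnemonic) {
    if (length(mnemonic) != 5) return OpKind::Other;
    const bool arithmetic = has_stem(mnemonic, "mul") || has_stem(mnemonic, "add") ||
                            has_stem(mnemonic, "sub") || has_stem(mnemonic, "div");
    if (!arithmetic) return OpKind::Other;
    const char width = mnemonic[3];
    const char precision = mnemonic[4];
    if (width == 'p' && precision == 'd') return OpKind::PackedDouble;
    if (width == 'p' && precision == 's') return OpKind::PackedSingle;
    if (width == 's' && precision == 'd') return OpKind::ScalarDouble;
    if (width == 's' && precision == 's') return OpKind::ScalarSingle;
    return OpKind::Other;
}
```

With this, classify("mulpd") yields OpKind::PackedDouble and classify("addss") yields OpKind::ScalarSingle, so counting the packed results in a disassembly gives a first impression of how much of a loop was vectorized.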
Here is a nice reference for all x86 instructions:
  http://www.felixcloutier.com/x86/
If you look at SSE-assembly a lot, you'll soon notice that you only need 
to remember 20ish of these to understand what's going on.
Christoph
--
 Dipl. Inf., Dipl. Math. Christoph Hertzberg
 Universität Bremen
 FB 3 - Mathematik und Informatik
 AG Robotik
 Robert-Hooke-Straße 1
 28359 Bremen, Germany
 Switchboard: +49 421 178 45-6611
 Visiting address of the branch office:
 Robert-Hooke-Straße 5
 28359 Bremen, Germany
 Tel.:    +49 421 178 45-4021
 Reception: +49 421 178 45-6600
 Fax:     +49 421 178 45-4150
 E-Mail:  chtz@xxxxxxxxxxxxxxxxxxxxxxxx
 Further information: http://www.informatik.uni-bremen.de/robotik