Re: [eigen] Optimization advice for a specific expression

On 2016-02-05 17:50, Alberto Luaces wrote:
Thanks a lot, Christoph.  This is very helpful.  Regarding those
assertions: do you have any rule of thumb to know what is getting
vectorized and what is not?

For SSE vectorization, the matrix must be accessible in aligned blocks of 16 bytes. That is, the inner dimension must be even for doubles and a multiple of 4 for floats. It's a bit more complicated for .block() and for dynamic-sized objects. Also, in some cases it is sufficient that only one side is aligned.
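As a rough illustration of that rule of thumb, here is a minimal sketch in plain C++. It does not use Eigen and is not how Eigen decides internally; the names packet_size, fills_whole_packets, is_aligned_16 and alignment_demo are made up for this example. It only encodes the arithmetic above: a 16-byte packet holds 2 doubles or 4 floats, and a pointer is usable for aligned loads when its address is a multiple of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width of one SSE register in bytes.
constexpr std::size_t kSseBytes = 16;

// Number of scalars that fit into one 16-byte packet
// (2 for double, 4 for float).
template <typename Scalar>
constexpr std::size_t packet_size() {
    return kSseBytes / sizeof(Scalar);
}

// True if a run of `inner_dim` scalars splits into whole packets,
// i.e. the inner dimension is a multiple of the packet size.
template <typename Scalar>
constexpr bool fills_whole_packets(std::size_t inner_dim) {
    return inner_dim % packet_size<Scalar>() == 0;
}

// True if `p` lies on a 16-byte boundary.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kSseBytes == 0;
}

// Hypothetical demo: why an odd offset into a double buffer breaks alignment.
inline void alignment_demo() {
    alignas(16) double buf[4] = {0.0, 0.0, 0.0, 0.0};
    assert(is_aligned_16(buf));       // start of the aligned buffer
    assert(!is_aligned_16(buf + 1));  // 8 bytes in: not on a 16-byte boundary
    assert(is_aligned_16(buf + 2));   // 16 bytes in: aligned again
}
```

The last function hints at why .block() is more complicated: a block that starts at an odd double offset no longer begins on a 16-byte boundary, even if the underlying matrix does.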

I get quickly swamped by all of those mul... and add... SSE instructions
in the assembler output, and cannot clearly see if they are just
performing scalar or vector operations.  Is it maybe a matter of
checking that "packet" ops are mostly used ("P" prefix?)

mulpd/addpd are packet (double-precision) operations; mulsd/addsd are scalar operations, so the second-to-last letter ("p" or "s") is what tells them apart. For float operations the instructions are called mulps/addps and mulss/addss. Div and sub are analogous, of course.
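If it helps when skimming assembler output, the naming scheme can be written down as a tiny classifier. This is only a sketch under narrow assumptions: it handles just the lowercase mul/add/sub/div mnemonics mentioned here, and OpKind, classify and classify_demo are hypothetical names for this example, not part of any tool.

```cpp
#include <cassert>
#include <string>

enum class OpKind { Packet, Scalar, Other };

// Classify a lowercase SSE arithmetic mnemonic by its suffix:
//   "pd"/"ps" -> packet (packed double / packed single)
//   "sd"/"ss" -> scalar (scalar double / scalar single)
// Only the stems mul, add, sub and div are recognized; anything else
// (including VEX-prefixed forms such as "vmulpd") is reported as Other.
inline OpKind classify(const std::string& mnemonic) {
    if (mnemonic.size() != 5) return OpKind::Other;

    const std::string stem = mnemonic.substr(0, 3);
    if (stem != "mul" && stem != "add" && stem != "sub" && stem != "div")
        return OpKind::Other;

    const std::string suffix = mnemonic.substr(3);
    if (suffix == "pd" || suffix == "ps") return OpKind::Packet;
    if (suffix == "sd" || suffix == "ss") return OpKind::Scalar;
    return OpKind::Other;
}

// Hypothetical demo of the four cases named in the text.
inline void classify_demo() {
    assert(classify("mulpd") == OpKind::Packet);  // packed double
    assert(classify("addsd") == OpKind::Scalar);  // scalar double
    assert(classify("mulps") == OpKind::Packet);  // packed float
    assert(classify("addss") == OpKind::Scalar);  // scalar float
}
```

Restricting the stems is deliberate: some other mnemonics end in the same letters without following this scheme, so a suffix check alone would misclassify them.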

Here is a nice reference for all x86 instructions:
  http://www.felixcloutier.com/x86/

If you look at SSE assembly a lot, you'll soon notice that you only need to remember 20 or so of these to understand what's going on.


Christoph




--
 Dipl. Inf., Dipl. Math. Christoph Hertzberg

 Universität Bremen
 FB 3 - Mathematik und Informatik
 AG Robotik
 Robert-Hooke-Straße 1
 28359 Bremen, Germany

 Switchboard: +49 421 178 45-6611

 Visiting address of the branch office:
 Robert-Hooke-Straße 5
 28359 Bremen, Germany

 Tel.:      +49 421 178 45-4021
 Reception: +49 421 178 45-6600
 Fax:       +49 421 178 45-4150
 E-Mail:    chtz@xxxxxxxxxxxxxxxxxxxxxxxx

 Further information: http://www.informatik.uni-bremen.de/robotik


