Re: [eigen] Optimization advice for a specific expression
On 2016-02-05 17:50, Alberto Luaces wrote:
> Thanks a lot, Christoph. This is very helpful. Regarding those
> assertions: do you have any rule of thumb to know what is getting
> vectorized and what is not?
For SSE vectorization, the matrix data must be accessible in aligned
blocks of 16 bytes. That is, the inner dimension must be even for
doubles and a multiple of 4 for floats. It's a bit more complicated for
.block() expressions and for dynamic-sized objects. Also, in some cases
it is sufficient that only one side is aligned.
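
To make the rule of thumb concrete, here is a minimal sketch in plain C++ (no Eigen involved). All names in it (kSsePacketBytes, scalarsPerPacket, fillsWholePackets, isAligned16, packetsPerInnerVector) are made up for illustration and are not part of Eigen's API. It only covers the simple case of the 16-byte rule above, assuming 8-byte doubles and 4-byte floats, and says nothing about the .block() or dynamic-size cases:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// An SSE register holds 16 bytes, i.e. 2 doubles or 4 floats.
constexpr std::size_t kSsePacketBytes = 16;

// Number of scalars that fit into one 16-byte packet (hypothetical helper).
template <typename Scalar>
constexpr std::size_t scalarsPerPacket() {
    return kSsePacketBytes / sizeof(Scalar);
}

// True if an inner dimension of innerDim scalars splits into whole packets:
// even for double, a multiple of 4 for float.
template <typename Scalar>
constexpr bool fillsWholePackets(std::size_t innerDim) {
    return innerDim % scalarsPerPacket<Scalar>() == 0;
}

// True if the pointer sits on a 16-byte boundary.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kSsePacketBytes == 0;
}

// Number of packets covering one inner vector.
// Precondition: the inner dimension fills whole packets.
template <typename Scalar>
inline std::size_t packetsPerInnerVector(std::size_t innerDim) {
    assert(fillsWholePackets<Scalar>(innerDim));
    return innerDim / scalarsPerPacket<Scalar>();
}

// Compile-time checks of the rule of thumb.
static_assert(fillsWholePackets<double>(4), "4 doubles = 2 packets");
static_assert(!fillsWholePackets<double>(3), "3 doubles leave half a packet");
static_assert(fillsWholePackets<float>(8), "8 floats = 2 packets");
static_assert(!fillsWholePackets<float>(6), "6 floats leave half a packet");
```

With this, fillsWholePackets<float>(6) is false, which matches the statement that a float inner dimension should be a multiple of 4.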
> I get quickly swamped by all of those mul... and add... SSE instructions
> in the assembler output, and cannot clearly see if they are just
> performing scalar or vector operations. Is it maybe a matter of
> checking that "packet" ops are mostly used ("P" prefix?)
mulPd/addPd are packet (double precision) operations, mulSd/addSd are
scalar operations.
For float operations the instructions are called mulps/addps and
mulss/addss. Div and sub are analogous, of course.
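
The naming scheme above can be turned into a small lookup for scanning assembler output. This is only a sketch: SseKind, classifyArithmeticMnemonic and describe are hypothetical names, and the function deliberately handles just the four arithmetic stems mentioned here (add, sub, mul, div). Other mnemonics that happen to end in the same two letters are reported as Other, because the suffix alone is not a reliable indicator for them:

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class SseKind { PackedDouble, PackedFloat, ScalarDouble, ScalarFloat, Other };

// Classify add/sub/mul/div SSE mnemonics by their two-letter suffix:
//   pd = packed double, ps = packed single (float),
//   sd = scalar double, ss = scalar single (float).
inline SseKind classifyArithmeticMnemonic(std::string m) {
    // Accept upper- or mixed-case spellings such as "mulPd".
    for (char& c : m) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (m.size() != 5) return SseKind::Other;

    const std::string stem = m.substr(0, 3);
    if (stem != "add" && stem != "sub" && stem != "mul" && stem != "div") {
        return SseKind::Other;
    }

    const std::string suffix = m.substr(3);
    if (suffix == "pd") return SseKind::PackedDouble;
    if (suffix == "ps") return SseKind::PackedFloat;
    if (suffix == "sd") return SseKind::ScalarDouble;
    if (suffix == "ss") return SseKind::ScalarFloat;
    return SseKind::Other;
}

// Human-readable label for a classification result.
inline const char* describe(SseKind k) {
    switch (k) {
        case SseKind::PackedDouble: return "packed double";
        case SseKind::PackedFloat:  return "packed float";
        case SseKind::ScalarDouble: return "scalar double";
        case SseKind::ScalarFloat:  return "scalar float";
        case SseKind::Other:        return "other";
    }
    assert(false && "unhandled SseKind");
    return "";
}
```

For example, describe(classifyArithmeticMnemonic("mulpd")) gives "packed double", while "addss" is classified as a scalar float operation.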
Here is a nice reference for all x86 instructions:
http://www.felixcloutier.com/x86/
If you look at SSE-assembly a lot, you'll soon notice that you only need
to remember 20ish of these to understand what's going on.
Christoph
--
Dipl. Inf., Dipl. Math. Christoph Hertzberg
Universität Bremen
FB 3 - Mathematik und Informatik
AG Robotik
Robert-Hooke-Straße 1
28359 Bremen, Germany
Switchboard: +49 421 178 45-6611
Visiting address of the branch office:
Robert-Hooke-Straße 5
28359 Bremen, Germany
Phone: +49 421 178 45-4021
Reception: +49 421 178 45-6600
Fax: +49 421 178 45-4150
Email: chtz@xxxxxxxxxxxxxxxxxxxxxxxx
Further information: http://www.informatik.uni-bremen.de/robotik