Re: [eigen] Optimization advice for a specific expression
Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> writes:
> On 2016-02-05 17:50, Alberto Luaces wrote:
>> Thanks a lot, Christoph. This is very helpful. Regarding those
>> assertions: do you have any rule of thumb to know what is getting
>> vectorized and what is not?
> For SSE vectorization the matrix must be accessible in blocks of 16
> bytes which are aligned. I.e., the inner dimension must be even for
> doubles and a multiple of 4 for floats. It's a bit complicated for
> .block() and for dynamic-sized objects. Also, in some cases it is
> sufficient that only one side is aligned.
Yes, I should get rid of those block references: after all, I am
building them myself in the first place.
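
As a rough illustration of the 16-byte rule quoted above, here is a minimal
sketch in plain C++. The helpers fills_sse_packets and is_aligned16 are
hypothetical names made up for this example; they are not part of Eigen, and
they ignore the more complicated cases Christoph mentions (.block(),
dynamic-sized objects, only one side being aligned). The sketch assumes the
usual x86 sizes of 8 bytes for double and 4 bytes for float.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Eigen API): true if `inner_dim` scalars of
// type Scalar occupy a whole number of 16-byte SSE packets.
template <typename Scalar>
constexpr bool fills_sse_packets(std::size_t inner_dim) {
    return inner_dim > 0 && (inner_dim * sizeof(Scalar)) % 16 == 0;
}

// Hypothetical helper (not an Eigen API): true if `p` sits on a 16-byte
// boundary, which is what aligned SSE loads and stores require.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// The rule of thumb from the mail: even for doubles, multiple of 4 for floats.
static_assert(fills_sse_packets<double>(2), "2 doubles = 16 bytes");
static_assert(!fills_sse_packets<double>(3), "3 doubles = 24 bytes");
static_assert(fills_sse_packets<float>(4), "4 floats = 16 bytes");
static_assert(!fills_sse_packets<float>(6), "6 floats = 24 bytes");
```

Passing both checks is only a necessary condition in this simplified picture;
whether Eigen actually vectorizes a given expression still has to be confirmed
in the generated assembly.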
>> I get quickly swamped by all of those mul... and add... SSE instructions
>> in the assembler output, and cannot clearly see if they are just
>> performing scalar or vector operations. Is it maybe a matter of
>> checking that "packet" ops are mostly used ("P" prefix?)
> mulPd/addPd are packet (double precision) operations, mulSd/addSd are
> scalar operations.
> For float operations the instructions are called mulps/addps and
> mulss/addss. Div and sub are analogous, of course.
> Here is a nice reference over all x86 instructions:
> If you look at SSE-assembly a lot, you'll soon notice that you only
> need to remember 20ish of these to understand what's going on.
Thank you for the references, they will get me started!
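
For anyone else reading the assembler output, the packed versus scalar
distinction can be reproduced in isolation with the SSE2 intrinsics from
<emmintrin.h>. This is only a sketch under the assumption of an x86 target
with SSE2; the function names mul_packet and mul_scalar are made up for the
example. _mm_mul_pd corresponds to mulpd and _mm_mul_sd to mulsd, although the
compiler may emit the VEX-encoded forms (vmulpd/vmulsd) when AVX is enabled.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Packed multiply: both double lanes are multiplied at once (mulpd).
inline void mul_packet(const double a[2], const double b[2], double out[2]) {
    __m128d va = _mm_loadu_pd(a);  // unaligned load, so no alignment needed
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_mul_pd(va, vb));
}

// Scalar multiply: only the low lane is multiplied (mulsd); the upper lane
// of the result is copied unchanged from the first operand.
inline void mul_scalar(const double a[2], const double b[2], double out[2]) {
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_mul_sd(va, vb));
}
```

Compiling this with -S and searching for the two mnemonics gives a small
reference for what each case looks like before digging into the output of a
larger Eigen expression.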