Re: [eigen] unaligned or not unaligned vectorization ?

On Tuesday 08 July 2008 12:53:23 Gael Guennebaud wrote:
> removing unaligned stores only corresponds to the U/A cases which are
> still slower than without vectorization unless the "unaligned loads"
> are actually aligned (probability = 0.25).

OK, now I understand, thanks.

> One option would be to add a 
> member function to all xpr checking at runtime all the arguments have
> the same alignment....

Come on, that's not worth it (unless I misunderstood something again)!
In 3/4 of cases the alignment is not the same, so it's not worth
implementing complex logic that is useless in 3/4 of cases!
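
To make concrete what such a runtime check would have to do, here is a
minimal sketch. It is not Eigen code: packetOffset and sameAlignment are
hypothetical names, and I assume 16-byte packets of 4-byte floats and
pointers that are at least float-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Eigen: offset of p, counted in floats,
// from the previous packet boundary (16 bytes by default, as for SSE).
// Assumes p is at least float-aligned.
inline std::size_t packetOffset(const float* p, std::size_t packetBytes = 16)
{
    assert(packetBytes % sizeof(float) == 0);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>(addr % packetBytes) / sizeof(float);
}

// Hypothetical check: true when both operands sit at the same offset inside
// a packet, so that a few leading scalar ops would align both at once.
inline bool sameAlignment(const float* a, const float* b)
{
    return packetOffset(a) == packetOffset(b);
}
```

With a 16-byte-aligned buffer buf, sameAlignment(buf + 1, buf + 5) holds,
while sameAlignment(buf, buf + 2) does not.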

> what do you think ?

The first thing that strikes me in the numbers in your previous e-mail is that 
even U/A, i.e. unaligned loads only, is already slower than no vectorization. 
So we want to get rid not only of ei_pstoreu but also of ei_ploadu.

AFAIU, the consequence is that we give up on any idea of first doing a 
few scalar ops until we reach aligned data, then proceeding by packets. Unless, 
as you suggest, the offsets happen to be the same, but as you say this is 
only the case in 1/4 of cases. (Moreover, in an assignment like "a = b + c" 
the equality of offsets only happens in 1/16 of cases!!)
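
The 1/4 and 1/16 figures can be checked by brute force. The sketch below
assumes 4 possible offsets per packet (4 floats in 16 bytes) and that each
array's offset is independent and equally likely; Fraction and
equalOffsetFraction are hypothetical names, not Eigen code.

```cpp
#include <cassert>
#include <cstddef>

// Number of offset combinations where all arrays agree, out of all of them.
struct Fraction {
    std::size_t matching;
    std::size_t total;
};

// Enumerates every combination of offsets for numArrays arrays and counts
// the combinations in which all arrays share the same offset.
inline Fraction equalOffsetFraction(std::size_t numArrays,
                                    std::size_t offsetsPerPacket = 4)
{
    assert(numArrays >= 1 && offsetsPerPacket >= 1);

    // total = offsetsPerPacket ^ numArrays
    std::size_t total = 1;
    for (std::size_t i = 0; i < numArrays; ++i)
        total *= offsetsPerPacket;

    std::size_t matching = 0;
    for (std::size_t code = 0; code < total; ++code) {
        // Read code as numArrays digits in base offsetsPerPacket,
        // one digit per array offset.
        std::size_t rest = code;
        const std::size_t first = rest % offsetsPerPacket;
        bool allEqual = true;
        for (std::size_t i = 0; i < numArrays; ++i) {
            if (rest % offsetsPerPacket != first)
                allEqual = false;
            rest /= offsetsPerPacket;
        }
        if (allEqual)
            ++matching;
    }
    return Fraction{matching, total};
}
```

Under those assumptions, two arrays (e.g. "a = b") give 4 matching
combinations out of 16, and three arrays ("a = b + c") give 4 out of 64.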

AFAIU, this means that we give up on slice vectorization and thus on 
vectorizing dynamic blocks.

The consolations are:
1) this makes source code simpler, e.g. the packet methods shed their template 
2) we can still vectorize certain fixed-size blocks, which would be useful 
e.g. for the 4x4 matrix inverse.


