Re: [eigen] Vectorized quaternion multiplication.


When you put it in, please tell me. :)

I posted this for feedback here and you may be interested.

On Sat, Mar 7, 2009 at 5:16 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> hi,
> thanks a lot,
> at first glance I was not sure about the perf, because it needs a
> lot of shuffle instructions, which are quite costly, so I benchmarked
> it, and on my Core 2 your version is 1.5 times faster :) Then I
> replaced shuffle_ps with the simpler PSHUFD instruction, and now it is
> almost 2x faster, so really worth it :)
> FYI I only changed vec4f_swizzle like this:
> #define vec4f_swizzle(v,p,q,r,s) \
>   (_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
>   ((s)<<6|(r)<<4|(q)<<2|(p)))))

Now I see. PSHUFD takes a single source operand, so it is perhaps
faster (i.e., has higher throughput).
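
For anyone who wants to check that the two forms are interchangeable,
here is a minimal sketch. The shufps version is only my assumption of
what the original macro looked like (it is not shown in this message),
and the names vec4f_swizzle_shufps, vec4f_swizzle_pshufd and
swizzles_agree are hypothetical, made up for this comparison. It checks
results only and says nothing about speed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* Assumed original form: SHUFPS with the same register as both sources. */
#define vec4f_swizzle_shufps(v,p,q,r,s) \
    (_mm_shuffle_ps((v), (v), ((s)<<6|(r)<<4|(q)<<2|(p))))

/* PSHUFD form from the quoted message: one source plus an immediate.
 * The casts only reinterpret the register and generate no instructions. */
#define vec4f_swizzle_pshufd(v,p,q,r,s) \
    (_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
    ((s)<<6|(r)<<4|(q)<<2|(p)))))

/* Hypothetical helper: returns 1 if both macros give the same lanes
 * for one sample selector, and those lanes are the expected ones. */
static int swizzles_agree(void)
{
    float a[4], b[4];
    int i;
    __m128 v = _mm_setr_ps(10.0f, 11.0f, 12.0f, 13.0f); /* lane 0 = 10 */

    /* Selector (3,0,2,1): result lanes should be v[3], v[0], v[2], v[1]. */
    _mm_storeu_ps(a, vec4f_swizzle_shufps(v, 3, 0, 2, 1));
    _mm_storeu_ps(b, vec4f_swizzle_pshufd(v, 3, 0, 2, 1));

    for (i = 0; i < 4; ++i)
        if (a[i] != b[i])
            return 0;
    return a[0] == 13.0f && a[1] == 10.0f && a[2] == 12.0f && a[3] == 11.0f;
}
```

The selector arguments have to be compile-time constants, since both
instructions encode them as an immediate. On 32-bit x86 with GCC this
likely needs -msse2; on x86-64 SSE2 is part of the baseline.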

Rohit Garg

Senior Undergraduate
Department of Physics
Indian Institute of Technology
