Re: [eigen] Vectorized quaternion multiplication.
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Vectorized quaternion multiplication.
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Sat, 7 Mar 2009 14:53:16 +0100
committed :)
On Sat, Mar 7, 2009 at 1:19 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> I think that the scalar version loses out because the multiplies are
> not pipelined there.
>
> On Sat, Mar 7, 2009 at 5:30 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>> When you put it in, please tell me. :)
>>
>> I posted this for feedback here and you may be interested.
>>
>> http://forum.beyond3d.com/showthread.php?t=52840
>>
>> On Sat, Mar 7, 2009 at 5:16 PM, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx> wrote:
>>> hi,
>>>
>>> thanks a lot,
>>>
>>> at first glance I was not sure about the performance, because it needs a
>>> lot of shuffle instructions, which are quite costly. So I benchmarked it,
>>> and on my Core 2 your version is 1.5 times faster :) Then I replaced
>>> _mm_shuffle_ps with the simpler PSHUFD instruction, and now it is almost 2x
>>> faster, so really worth it :)
>>>
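To make the shuffle cost mentioned above concrete, here is a minimal sketch of a 4-wide SSE quaternion product checked against a scalar reference. It is not the patch discussed in this thread: the (x, y, z, w) storage order, the SWZ_PS macro, and the names quat_mul_scalar, quat_mul_sse and check_quat_mul are all assumptions made for illustration.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_shuffle_ps, ...

// Hypothetical helper (not from the thread): result lanes are v[p], v[q], v[r], v[s].
#define SWZ_PS(v, p, q, r, s) \
  _mm_shuffle_ps((v), (v), ((s) << 6 | (r) << 4 | (q) << 2 | (p)))

// Scalar reference. Assumed layout: q[0..3] = (x, y, z, w).
inline void quat_mul_scalar(const float a[4], const float b[4], float out[4]) {
  out[0] = a[3] * b[0] + a[0] * b[3] + a[1] * b[2] - a[2] * b[1];  // x
  out[1] = a[3] * b[1] - a[0] * b[2] + a[1] * b[3] + a[2] * b[0];  // y
  out[2] = a[3] * b[2] + a[0] * b[1] - a[1] * b[0] + a[2] * b[3];  // z
  out[3] = a[3] * b[3] - a[0] * b[0] - a[1] * b[1] - a[2] * b[2];  // w
}

// SSE sketch: 4 multiplies, but 7 shuffles to line the operands up.
inline __m128 quat_mul_sse(__m128 a, __m128 b) {
  // Sign bit set in lane 3 (w) only; XOR with it negates that lane.
  const __m128 flip_w = _mm_setr_ps(0.0f, 0.0f, 0.0f, -0.0f);

  const __m128 t0 = _mm_mul_ps(SWZ_PS(a, 3, 3, 3, 3), b);                      // a.wwww * b.xyzw
  const __m128 t1 = _mm_mul_ps(SWZ_PS(a, 0, 1, 2, 0), SWZ_PS(b, 3, 3, 3, 0));  // a.xyzx * b.wwwx
  const __m128 t2 = _mm_mul_ps(SWZ_PS(a, 1, 2, 0, 1), SWZ_PS(b, 2, 0, 1, 1));  // a.yzxy * b.zxyy
  const __m128 t3 = _mm_mul_ps(SWZ_PS(a, 2, 0, 1, 2), SWZ_PS(b, 1, 2, 0, 2));  // a.zxyz * b.yzxz

  // t1 and t2 enter with + in x, y, z and with - in w; t3 is subtracted everywhere.
  const __m128 t12 = _mm_xor_ps(_mm_add_ps(t1, t2), flip_w);
  return _mm_add_ps(_mm_sub_ps(t0, t3), t12);
}

// Compares both versions on small integer inputs, which are exact in float.
inline void check_quat_mul() {
  const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
  float ref[4], simd[4];
  quat_mul_scalar(a, b, ref);
  _mm_storeu_ps(simd, quat_mul_sse(_mm_loadu_ps(a), _mm_loadu_ps(b)));
  for (int i = 0; i < 4; ++i) assert(ref[i] == simd[i]);
}
```

Calling check_quat_mul() from a test or from main() exercises both paths. The sketch says nothing about relative speed; that depends on the CPU and compiler, which is why benchmarking as described above is the right way to decide.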
>>> FYI I only changed vec4f_swizzle like this:
>>>
>>> #define vec4f_swizzle(v,p,q,r,s) (_mm_castsi128_ps(_mm_shuffle_epi32(
>>> _mm_castps_si128(v), \
>>> ((s)<<6|(r)<<4|(q)<<2|(p)))))
>>
>> Now I see. This instruction takes only one source operand, so it is perhaps
>> faster (i.e. higher throughput).
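As a rough illustration of the two swizzle forms compared here, the sketch below defines both and checks that they select the same lanes. The macro names vec4f_swizzle_ps and vec4f_swizzle_epi32 and the helper functions are hypothetical; only the PSHUFD-based body is taken from the macro quoted in this thread. One known difference: in its legacy SSE encoding SHUFPS overwrites its first source register, while PSHUFD writes to a separate destination, so a compiler may need an extra register copy around SHUFPS when the input is still live. Whether that accounts for the measured speedup is an assumption, not something the thread confirms.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: _mm_shuffle_epi32 and the cast intrinsics

// Hypothetical name: swizzle via SHUFPS, passing the same register twice.
#define vec4f_swizzle_ps(v, p, q, r, s) \
  (_mm_shuffle_ps((v), (v), ((s) << 6 | (r) << 4 | (q) << 2 | (p))))

// Hypothetical name: swizzle via PSHUFD, same body as the macro in the thread.
// The casts only reinterpret the register; they do not convert values.
#define vec4f_swizzle_epi32(v, p, q, r, s)                 \
  (_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
                                      ((s) << 6 | (r) << 4 | (q) << 2 | (p)))))

// Applies the (3, 0, 2, 1) swizzle with each variant and stores both results.
inline void swizzle_demo(const float in[4], float out_ps[4], float out_epi32[4]) {
  const __m128 v = _mm_loadu_ps(in);
  _mm_storeu_ps(out_ps, vec4f_swizzle_ps(v, 3, 0, 2, 1));
  _mm_storeu_ps(out_epi32, vec4f_swizzle_epi32(v, 3, 0, 2, 1));
}

// Both variants should pick lanes (in[3], in[0], in[2], in[1]).
inline void check_swizzles() {
  const float in[4] = {10.0f, 20.0f, 30.0f, 40.0f};
  float a[4], b[4];
  swizzle_demo(in, a, b);
  for (int i = 0; i < 4; ++i) assert(a[i] == b[i]);
  assert(a[0] == 40.0f && a[1] == 10.0f && a[2] == 30.0f && a[3] == 20.0f);
}
```

Because the selector is an immediate operand, the p, q, r, s arguments must be compile-time constants in both variants.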
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>
>> Senior Undergraduate
>> Department of Physics
>> Indian Institute of Technology
>> Bombay
>>