Re: [eigen] Vectorized quaternion multiplication.
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Vectorized quaternion multiplication.
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Sat, 7 Mar 2009 15:13:38 +0100
perhaps this will work for you:
svn co svn://websvn.kde.org:443/home/kde/trunk/kdesupport/eigen2
gael
On Sat, Mar 7, 2009 at 3:01 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> On Sat, Mar 7, 2009 at 7:23 PM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx> wrote:
>> committed :)
> Thanks a lot for that. :). The quat add and sub are vectorized now as
> well, I presume. {Damn the svn block. :( }
>
>>
>> On Sat, Mar 7, 2009 at 1:19 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>>> I think that the scalar version loses out because the multiplies are
>>> not pipelined there.
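For reference, the scalar baseline in this comparison can be sketched as the textbook Hamilton product, which performs 16 multiplies and 12 additions or subtractions. This is only an illustration and not Eigen's actual code; the `Quat` struct and the `mul` function are hypothetical names introduced here.

```cpp
// Illustrative quaternion type (not Eigen's Quaternion class): w + xi + yj + zk.
struct Quat {
  float w, x, y, z;
};

// Textbook Hamilton product a * b, written term by term with scalar multiplies.
inline Quat mul(const Quat& a, const Quat& b) {
  return Quat{
      a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,  // w component
      a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,  // x component
      a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,  // y component
      a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w   // z component
  };
}
```

A vectorized version computes the same four components with packed multiplies on shuffled copies of the operands, which is where the shuffle cost discussed elsewhere in the thread comes from.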
>>>
>>> On Sat, Mar 7, 2009 at 5:30 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>>>> When you put it in, please tell me. :)
>>>>
>>>> I posted this for feedback here and you may be interested.
>>>>
>>>> http://forum.beyond3d.com/showthread.php?t=52840
>>>>
>>>> On Sat, Mar 7, 2009 at 5:16 PM, Gael Guennebaud
>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>> hi,
>>>>>
>>>>> thanks a lot,
>>>>>
>>>>> at first glance I was not sure about the performance, because it needs a
>>>>> lot of shuffle instructions, which are quite costly. So I benchmarked it,
>>>>> and on my Core 2 your version is 1.5 times faster :) Then I replaced
>>>>> shuffle_ps with the simpler PSHUFD instruction, and now it is almost 2x
>>>>> faster, so really worth it :)
>>>>>
>>>>> FYI I only changed vec4f_swizzle like this:
>>>>>
>>>>> #define vec4f_swizzle(v,p,q,r,s) \
>>>>>   (_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
>>>>>     ((s)<<6|(r)<<4|(q)<<2|(p)))))
>>>>
>>>> Now I see. This instruction takes only one source operand, so it is
>>>> perhaps faster (i.e. higher throughput).
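The two swizzle forms compared above can be placed side by side as a minimal sketch. It assumes an x86 target with SSE2; the names `swizzle_shufps`, `swizzle_pshufd`, `lanes_equal` and `swizzles_agree` are hypothetical and exist only for this illustration. It shows that both forms pick the same lanes, and says nothing about their relative speed.

```cpp
#include <emmintrin.h>  // SSE2: _mm_shuffle_epi32 (PSHUFD) and the cast intrinsics

// Two-operand form: SHUFPS reads two sources, here the same register twice.
// The selector must be a compile-time constant, hence the macros.
#define swizzle_shufps(v, p, q, r, s) \
  _mm_shuffle_ps((v), (v), ((s) << 6 | (r) << 4 | (q) << 2 | (p)))

// One-operand form as in the message above: PSHUFD on the same 128 bits.
#define swizzle_pshufd(v, p, q, r, s) \
  _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
                                     ((s) << 6 | (r) << 4 | (q) << 2 | (p))))

// Hypothetical helper: true if all four lanes of a and b compare equal.
inline bool lanes_equal(__m128 a, __m128 b) {
  return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}

// Hypothetical check: both macros reverse the lanes (0,1,2,3) into (3,2,1,0).
inline bool swizzles_agree() {
  const __m128 v = _mm_setr_ps(0.f, 1.f, 2.f, 3.f);
  const __m128 expected = _mm_setr_ps(3.f, 2.f, 1.f, 0.f);
  return lanes_equal(swizzle_shufps(v, 3, 2, 1, 0), expected) &&
         lanes_equal(swizzle_pshufd(v, 3, 2, 1, 0), expected);
}
```

The casts between `__m128` and `__m128i` only reinterpret the register and generate no instructions, so the PSHUFD form is a drop-in replacement whenever both shuffle sources are the same vector.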
>>>>
>>>> --
>>>> Rohit Garg
>>>>
>>>> http://rpg-314.blogspot.com/
>>>>
>>>> Senior Undergraduate
>>>> Department of Physics
>>>> Indian Institute of Technology
>>>> Bombay
>>>>