Re: [eigen] Vectorized quaternion multiplication.
- To: eigen@xxxxxxxxxxxxxxxxxxx
 
- Subject: Re: [eigen] Vectorized quaternion multiplication.
 
- From: Rohit Garg <rpg.314@xxxxxxxxx>
 
- Date: Sat, 7 Mar 2009 17:49:59 +0530
 
I think the scalar version loses out because its multiplies are not
pipelined.
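
For reference, here is a minimal sketch of what a scalar quaternion product looks like. The `Quat` struct and the name `quat_mul_scalar` are hypothetical, chosen for illustration only, and the x, y, z, w coefficient order is an assumption; this is not the code that was benchmarked in the thread. It shows the 16 scalar multiplies that a 4-wide SSE version can instead issue as 4 packed multiplies.

```cpp
// Hypothetical plain struct for illustration; coefficient order x, y, z, w (assumed).
struct Quat {
    float x, y, z, w;
};

// Scalar Hamilton product a * b: 16 multiplies and 12 additions/subtractions.
inline Quat quat_mul_scalar(const Quat& a, const Quat& b) {
    Quat r;
    r.x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
    r.y = a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x;
    r.z = a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w;
    r.w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
    return r;
}
```

As a sanity check, multiplying the unit quaternions i = (1,0,0,0) and j = (0,1,0,0) in this layout gives k = (0,0,1,0), and swapping the operands flips the sign.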
On Sat, Mar 7, 2009 at 5:30 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> When you put it in, please tell me. :)
>
> I posted this for feedback here and you may be interested.
>
> http://forum.beyond3d.com/showthread.php?t=52840
>
> On Sat, Mar 7, 2009 at 5:16 PM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx> wrote:
>> hi,
>>
>> thanks a lot,
>>
>> At first glance I was not sure about the performance, because it needs
>> a lot of shuffle instructions, which are quite costly, so I benchmarked
>> it, and on my Core 2 your version is 1.5 times faster :) Then I replaced
>> the shuffle_ps with the simpler PSHUFD instruction, and now it is almost
>> 2x faster, so really worth it :)
>>
>> FYI I only changed vec4f_swizzle like this:
>>
>> #define vec4f_swizzle(v,p,q,r,s) \
>>   (_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
>>     ((s)<<6|(r)<<4|(q)<<2|(p)))))
>
> Now I see. This instruction takes only one source operand, so it is
> perhaps faster (i.e. has higher throughput).
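
To make the one-operand versus two-operand difference concrete, here is a small sketch that puts both swizzle forms side by side. The macro names `vec4f_swizzle_shufps` and `vec4f_swizzle_pshufd` are hypothetical, and the `_mm_shuffle_ps` form is only an assumption about what the original `vec4f_swizzle` looked like, since that version is not quoted in this thread. The PSHUFD form is the macro given above. Both should select the same lanes; SHUFPS reads two source registers (here the same one twice), while PSHUFD reads a single source.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; also pulls in the SSE header for _mm_shuffle_ps

// Assumed original form: SHUFPS takes two source registers, so the same
// vector is passed twice. Lanes 0-1 come from the first source, 2-3 from the second.
#define vec4f_swizzle_shufps(v, p, q, r, s) \
    (_mm_shuffle_ps((v), (v), ((s) << 6 | (r) << 4 | (q) << 2 | (p))))

// Form from the thread: PSHUFD takes a single source register. The casts only
// reinterpret the bits between float and integer vector types.
#define vec4f_swizzle_pshufd(v, p, q, r, s) \
    (_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v), \
        ((s) << 6 | (r) << 4 | (q) << 2 | (p)))))

// Hypothetical helper: true when all four lanes of a and b compare equal.
inline bool vec4f_all_equal(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Note that the lane indices must be compile-time constants in both forms, because the shuffle selector is encoded as an immediate. Whether the PSHUFD form is actually faster depends on the CPU, so it is worth benchmarking on the target hardware.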
-- 
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay