Re: [eigen] Re: 4x4 matrix inverse
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Re: 4x4 matrix inverse
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 15 Dec 2009 08:38:42 -0500
2009/12/15 mmoll <Markus.Moll@xxxxxxxxxxxxxxxx>:
> Hi
>
> Quoting Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> Actually, these lines were not equivalent to loads!
>>
>> When you look at this,
>>
>> tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)),
>> (__m64*)(src+ 4));
>>
>> The second half is loaded from src+4, not src+2.
>>
>> What is being loaded here is the top-left 2x2 corner of the matrix.
>
> Ah, I was wondering what the purpose was. But can't the same be achieved
> by a combination of
>
> 1. aligned loads of the matrix rows into, say, a, b, c, d (a=[a4,a3,a2,a1]
> and so on)
> 2. unpack quad word pairs: _mm_unpackhi_epi64(b, a) apparently yields
> [a4,a3,b4,b3] (upper right) and _mm_unpacklo_epi64(b, a) yields
> [a2,a1,b2,b1] (upper left)? (this is SSE2, though)
>
> I have no idea how the performance compares, though. (or whether it
> works at all)
You know this much better than I do (honestly), so why don't you try it?
If it's faster, we'll use it. SSE2 is OK; we require it anyway for any
SSE code.
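
For anyone who wants to try it, here is a minimal sketch of the two variants side by side. It is not the code in Eigen: the function and struct names are made up for illustration, and it assumes the matrix is 16 contiguous floats with src 16-byte aligned, a "row" meaning four consecutive floats. Lanes are listed lowest first in the comments, so [a4,a3,a2,a1] above is {a1,a2,a3,a4} here.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2; also provides the SSE float intrinsics

// Variant 1: the two 64-bit half loads quoted above.
// Result (lowest lane first): {src[0], src[1], src[4], src[5]}.
inline __m128 top_left_half_loads(const float* src)
{
    __m128 tmp = _mm_setzero_ps();  // defined start value; both halves are overwritten
    tmp = _mm_loadl_pi(tmp, reinterpret_cast<const __m64*>(src));      // low half
    tmp = _mm_loadh_pi(tmp, reinterpret_cast<const __m64*>(src + 4));  // high half
    return tmp;
}

// Hypothetical helper type: the two 2x2 blocks taken from rows 0 and 1.
struct TopBlocks {
    __m128 first_cols;  // elements 0,1 of each row
    __m128 last_cols;   // elements 2,3 of each row
};

// Variant 2: two aligned row loads, then 64-bit unpacks (SSE2).
inline TopBlocks top_blocks_unpack(const float* src)
{
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);  // _mm_load_ps needs alignment

    const __m128i a = _mm_castps_si128(_mm_load_ps(src));      // row 0
    const __m128i b = _mm_castps_si128(_mm_load_ps(src + 4));  // row 1

    TopBlocks r;
    // unpacklo(b, a): low half of b, then low half of a -> {b1, b2, a1, a2}
    r.first_cols = _mm_castsi128_ps(_mm_unpacklo_epi64(b, a));
    // unpackhi(b, a): high half of b, then high half of a -> {b3, b4, a3, a4}
    r.last_cols = _mm_castsi128_ps(_mm_unpackhi_epi64(b, a));
    return r;
}
```

Note that with the operands in the order (b, a) the two rows come out swapped compared to the half-load version; _mm_unpacklo_epi64(a, b) would give the same lane order as variant 1. The sketch says nothing about which variant is faster; that still has to be measured.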
Benoit