Re: [eigen] Re: 4x4 matrix inverse


- *To*: eigen@xxxxxxxxxxxxxxxxxxx
- *Subject*: Re: [eigen] Re: 4x4 matrix inverse
- *From*: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- *Date*: Tue, 15 Dec 2009 07:45:26 -0500

2009/12/15 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>
> On Tue, Dec 15, 2009 at 12:52 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> 2009/12/15 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >
>> > On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> To summarize recent commits: all this is now done in the development
>> >> branch, it only remains to consider backporting.
>> >>
>> >> The SSE code is 4.5x faster than my plain scalar path! I guess that's
>> >> explained not only by SSE intrinsics but also by better ordering of
>> >> instructions...
>> >>
>> >> There is one thing where I didn't follow Intel's code: they use a
>> >> RCPSS instruction to compute 1/det approximately, then followed by a
>> >> Newton-Raphson iteration. This sacrifices up to 2 bits of precision in
>> >> the mantissa, which already is a bit nontrivial for us (4x4 matrix
>> >> inversion is a basic operation on which people will rely very
>> >> heavily). To help solve that dilemma (performance vs precision) I
>> >> benchmarked it, and it turns out that on my Core i7, DIVSS is
>> >> slightly faster!! Intel's paper was written for the Pentium 3 so
>> >> that's perhaps not surprising, but I saw forum posts mentioning that
>> >> the RCPSS trick is still faster on the Core2. If you want to test, see
>> >> lines 128-130 in Inverse_SSE.h.
>> >>
>> >> I have a question. I currently get warnings in this code (taken
>> >> straight from Intel):
>> >>
>> >>     __m128 tmp1;
>> >>     tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src+ 4));
>> >
>> > why don't you use tmp1 = ei_pload(src); since we know src will be
>> > aligned?
>>
>> You're right, I didn't look carefully at this line from Intel, and
>> just below, there are these lines:
>>
>>     tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src+ 2)), (__m64*)(src+ 6));
>>     row3 = _mm_loadh_pi(_mm_loadl_pi(row3, (__m64*)(src+10)), (__m64*)(src+14));
>>
>> which cannot use an ei_pload since they use non-multiple-of-16-bytes
>> offsets, and I was confused from there.
>>
>> By the way, is this trick (to load from 64-bit aligned addresses)
>> worth abstracting into a "ei_pload8" function? It's probably faster
>> than completely unaligned loads...
>
> this is more or less what we do in ei_ploadu, but using a pair of
> movsd/movhps and inline assembly to avoid GCC messing up. But currently for
> MSVC we use raw unaligned loads, so we probably should switch to a pair of
> intrinsics.
>
> I chose a pair of movsd/movhps because it appeared to be the fastest option
> for my CPU.
>
> Finally, to be clear I think you can change these 2 lines with ei_ploadu and
> the perf. should be the same.

Actually, these lines were not equivalent to loads! When you look at this,

    tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src+ 4));

The second half is loaded from src+4, not src+2. What is being loaded
here is the top-left 2x2 corner of the matrix.

Benoit

**Follow-Ups**:
- **Re: [eigen] Re: 4x4 matrix inverse** *From:* mmoll

**References**:
- **[eigen] 4x4 matrix inverse** *From:* Benoit Jacob
- **[eigen] Re: 4x4 matrix inverse** *From:* Benoit Jacob
- **Re: [eigen] Re: 4x4 matrix inverse** *From:* Gael Guennebaud
- **Re: [eigen] Re: 4x4 matrix inverse** *From:* Benoit Jacob
- **Re: [eigen] Re: 4x4 matrix inverse** *From:* Gael Guennebaud
