Re: [eigen] request for help: 4x4 matrix inverse

[ Thread Index | Date Index | More Archives ]

On Monday 14 April 2008, Benoît Jacob wrote:
> Hi List,

Hi all, this is my first msg to the list!

> I have started coding the LU module and committed already this:
> - optimized determinants for sizes <=4
>   (Thanks to Geoff for pointing me to the 30-mul technique for size
> 4). - general matrix inversion (good for sizes >= 5)
> - bruteforce matrix inversion using cofactors for sizes <=3
> - unrolled version of the general method for size 4

> Since the latter unrolled version is clumsy and is only used for
> size 4, since after all it seems that a bruteforce (cofactor)
> technique would be faster, and since size 4 is a common case that
> we want to optimize, I am asking:
> do you have good 4x4 matrix inversion code that you could
> contribute? Or can you point me to a good "algorithm" ? I put
> quotation marks because we know the algorithm is just brutefore,
> the question is how to implement it to minimize the number of ops
> and perhaps to take advantage of vectorization...
> Markos: if I remember correctly, you mentioned you had AltiVec code
> for that. When you mentioned it I was still hoping that the
> unrolled gaussian elimination would be good but now I realize that
> it's not, so I'm very much interested in your code if you want to
> contribute it :) Extra points if it also works with SSE.

Might I suggest choosing the matrix partitioning/matrix blocks/Liebniz 
formula for determinants, etc (don't remember the exact name, I got 
it from Numerical Recipes in C).

Anyway it goes like this:

           | a1 a2 a3 a4 |
Assume M = | b1 b2 b3 b4 |
           | c1 c2 c3 c4 |
           | d1 d2 d3 d4 |

We partition the matrix in 4 2x2 matrices:

M= | P Q |
   | R S |

where P = | a1 a2 |, Q = | a3 a4 |, R = | c1 c2 |, S = | c3 c4 |
          | b1 b2 |      | b3 b4 |      | d1 d2 |      | d3 d4 |

(actually the method is generic for any NxM matrix, but it becomes 
quite more complicated in these cases).
So, M's determinant is calculated thus:

detM = detP * det(S - R*P1*Q)

( P1 = P^(-1) )

(so if detP = 0, detM = 0 and M has no inverse, but detP is calculated 
in just one step!)

And M^(-1) is calculated thus:

M^(-1) = | P' Q' |
         | R' S' |


P' = P1 + (P1*Q) * (S - R*P1*Q)*(R*P1)
Q' = -(P1*Q)*(S - R*P1*Q)
R' = -(S - R*P1*Q)*(R*P1)
S' = (S - R*P1*Q)^(-1)

Notice the repeated expression (S - R*P1*Q). This is much easier to 
vectorize than LU decomposition or Gaussian or the coefficients 
methods. In fact I'm writing exactly this version for AltiVec right 
this moment for another project I'm contributing to (SIMDx86 math 
library, this is just for 3D stuff and is not intented to be as 
generic as Eigen). I'll post the code when I'm finished but it should 
not be hard for someone with good SSE experience to come up with this 

> I am also -- and most importantly -- looking for an optimized
> non-vectorized path!
> As you'll see in src/LU/Inverse.h, we have a template parameter
> bool CheckExistence. If true, we must carefully check that the
> inverse exists. Ideally, your implementation would do that in an
> optimized way when CheckExistence==true, and we would not pay any
> cost when
> CheckExistence==false.

With this method, CheckExistence should just check the detP and 
det(S - R*P1*Q), which should be quite fast to compute :)


> The API is:
> Matrix4d m;
> m.inverse(); // CheckExistence==true
> m.quickInverse(); // CheckExistence==false
> Inverse<Matrix4d> inverse_of_m(m);
>   // or, if you prefer, Inverse<Matrix4d> inverse_of_m =
> m.inverse(); if(inverse_of_m.exists())
> {
> 	cout << inverse_of_m << endl;
> }
> Cheers,
> Benoit

Mail converted by MHonArc 2.6.19+