Re: [eigen] Implementation of M.transpose()*M & M*M.transpose()



Hi Aron,

Are you saying that it's better to represent the data of a Matrix by a 1D array (or a single pointer) instead of a 2D array (or pointer-to-pointer)? If so, then every time one wants to fetch an element A(i,j), one has to compute A[i*num_cols()+j]. Is this still more efficient than the double dereference of a pointer-to-pointer?
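For reference, a minimal sketch of the 1D, row-major layout being discussed (the DenseMatrix name and accessors here are purely illustrative, not Eigen's API). The point is that the whole matrix is one contiguous block, so the index arithmetic replaces a second pointer load and the data stays friendly to the cache and the vectorizer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix with contiguous row-major storage:
// element (i,j) lives at data[i * cols + j]. The multiply-add in
// operator() is cheap; the win over double** is that rows are
// adjacent in one contiguous allocation, so there is no second
// dependent load and loops over the data can be vectorized.
struct DenseMatrix {
  std::size_t rows, cols;
  std::vector<double> data;
  DenseMatrix(std::size_t r, std::size_t c)
      : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) {
    return data[i * cols + j];
  }
  double operator()(std::size_t i, std::size_t j) const {
    return data[i * cols + j];
  }
};
```

With a ptr-to-ptr layout, fetching A(i,j) requires loading the row pointer first and then the element, two dependent memory accesses; with the 1D layout the address is computed from values already in registers.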

Cheers,
Jinglei

----- Original Message ----- From: "Aron Ahmadia" <aja2111@xxxxxxxxxxxx>
To: <eigen@xxxxxxxxxxxxxxxxxxx>
Sent: 21 February 2010 13:41
Subject: Re: [eigen] Implementation of M.transpose()*M & M*M.transpose()


Before doing experiments, I do not know whether it would be terribly slow.

Jinglei, once N exceeds the L2 cache size you will be hitting main
memory for every reference to v2d[i][j].  Furthermore,
double-dereferencing of pointers makes it almost impossible for the
compiler to optimize your code in any meaningful way.  I guarantee
your code example would be terribly slow, as Gael describes.  Writing
efficient linear algebra code is very hard; a lot of care and
attention has gone into details like this in Eigen so that you don't
have to worry about them.  As Gael mentioned earlier, you can use the
following code from the developer's branch of Eigen to do what you want:

A.setZero();
A.selfadjointView<Upper>().rankUpdate(M);  // A += M.transpose()*M

and you can compute A * B as follows:

A.selfadjointView<Upper>() * B


Warm Regards,
Aron

On Sat, Feb 20, 2010 at 6:55 PM, Jinglei Hu <jingleihu@xxxxxxxxx> wrote:
Before doing experiments, I do not know whether it would be terribly slow.

I doubt there would be much difference, but I believe using different
compilers with different optimization options does lead to a big difference
for the same code.

----- Original Message -----

From: Gael Guennebaud
To: eigen@xxxxxxxxxxxxxxxxxxx
Sent: 20 February 2010 18:25
Subject: Re: [eigen] Implementation of M.transpose()*M & M*M.transpose()


On Sat, Feb 20, 2010 at 5:38 PM, Jinglei Hu <jingleihu@xxxxxxxxx> wrote:

Well, by trivial I mean one can write something as follows (not using
Eigen):

Matrix M_Mult_MT(const Matrix& Rhs) {
  // [m][n] * [n][m] --> [m][m]
  const int N = Rhs.num_rows();
  std::vector<double> v(N * N);    // heap storage; a C-style double v[N][N]
                                   // would be a non-standard variable-length array
  double** v2d = Rhs;              // assumes Matrix converts to its row pointers
  for (int i = 0; i < N; ++i) {
    double* pi = v2d[i];
    for (int j = i; j < N; ++j) {
      double sum = 0.0;
      double* pj = v2d[j];
      for (int k = 0; k < Rhs.num_cols(); ++k)
        sum += pi[k] * pj[k];
      v[i * N + j] = v[j * N + i] = sum;  // fill both halves by symmetry
    }
  }
  return Matrix(N, N, v.data());
}

yes, but that would be terribly slow, because here you do not take care of
caches, vectorization, etc.
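To make Gael's point concrete, here is a sketch of one standard cache-aware trick (this is not Eigen's actual kernel, which lives in Eigen/src/Core/products/SelfadjointProduct.h, and the mtm_upper name and flat-array interface are illustrative): compute C = M^T * M as a sum of rank-1 updates, streaming one contiguous row of M per outer iteration so the working set per step is tiny, and touching only the upper triangle:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = M^T * M, upper triangle only, accumulated as rank-1 updates.
// M is rows x cols, row-major; the returned C is cols x cols, row-major,
// with the strictly lower triangle left at zero.
// Each outer iteration reads exactly one contiguous row of M, so the
// per-step working set stays cache-resident no matter how tall M is.
std::vector<double> mtm_upper(const std::vector<double>& M,
                              std::size_t rows, std::size_t cols) {
  std::vector<double> C(cols * cols, 0.0);
  for (std::size_t k = 0; k < rows; ++k) {
    const double* row = &M[k * cols];         // contiguous row of M
    for (std::size_t i = 0; i < cols; ++i) {
      const double mi = row[i];
      for (std::size_t j = i; j < cols; ++j)  // upper triangle only
        C[i * cols + j] += mi * row[j];
    }
  }
  return C;
}
```

A production kernel additionally blocks the i/j loops into cache-sized panels and vectorizes the innermost loop, which is exactly the machinery Eigen hides behind rankUpdate.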

gael




----- Original Message -----
From: Gael Guennebaud
To: eigen@xxxxxxxxxxxxxxxxxxx
Sent: 20 February 2010 16:27
Subject: Re: [eigen] Implementation of M.transpose()*M & M*M.transpose()


2010/2/20 Jinglei Hu <jingleihu@xxxxxxxxx>

Hi,

I'm curious how M.transpose()*M and M*M.transpose() are implemented in
Eigen using expression templates. Or are they just the same as the usual
multiplication of two matrices?

Currently A = M.transpose()*M involves a full matrix multiplication. If
you want to compute only one half, with the devel branch you can do:

A.setZero();
A.selfadjointView<Upper>().rankUpdate(M);  // A += M.transpose()*M

and you can compute A * B as follows:

A.selfadjointView<Upper>() * B

This is the preferred way if you only need one half (in most cases that
is enough). If you really need the full result, then check that you really,
really need it ;) OK, if you really do, then:
A.triangularView<Lower>() = A.transpose();

Actually, it is quite easy to detect that case in the matrix product, so
in the future you can expect A = M.transpose()*M to be automatically
optimized, as in Atlas for instance.


gael


The fact that the resultant matrix is symmetric can be used to save
computation cost, especially for large matrices. For example, the Hessian
matrix, often used in nonlinear optimization/minimization, is actually
obtained as Jacobian.transpose() * Jacobian. It is trivial to write an
efficient function to do this by exploiting the symmetry.


not that trivial when you consider cache performance as we do... have a
look at the file Eigen/src/Core/products/SelfadjointProduct.h.


But anyway it is less elegant in the sense that the code does not look
nice, e.g. Matrix hessian = MT_Mult_M(Jacobian).

Any idea how to do this efficiently while sticking to expression templates? A
different but somewhat similar case is the multiplication of a normal matrix
by a diagonal matrix.

this one is already optimized.
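The diagonal case is optimizable precisely because D * M is just a per-row scaling, which an expression-template library can detect and lower to the following O(n*m) loop instead of a dense O(n^2 * m) product (the function name and flat row-major layout here are illustrative, not Eigen's implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place D * M where D is diagonal with entries d:
// scale row i of the row-major matrix M (d.size() rows, `cols` columns)
// by d[i]. One pass over the data, no temporary matrix.
void diag_times(std::vector<double>& M, const std::vector<double>& d,
                std::size_t cols) {
  for (std::size_t i = 0; i < d.size(); ++i)
    for (std::size_t j = 0; j < cols; ++j)
      M[i * cols + j] *= d[i];
}
```

(Similarly, M * D scales the columns.) In Eigen this shortcut is what you get when the diagonal operand is expressed as such, e.g. via an expression like d.asDiagonal() * M, rather than as a dense matrix of mostly zeros.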



Cheers,
Jinglei Hu








Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/