Re: [eigen] Blas performance on mapped matrices
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Blas performance on mapped matrices
- From: Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
- Date: Mon, 9 Jan 2012 11:16:03 -0800
- Cc: Keir Mierle <keir@xxxxxxxxxx>
Thank you for your quick and helpful replies. I will try to address
all of them in one email rather than individual replies.
@tim: I checked and there are no temporaries being created.
@jitse, @gael and @benoit:
Setting EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD to 16 increases
performance significantly, and we are now within striking distance of the
eigen2 implementation. We are still about 5% slower than the eigen2
implementation, but this is something we can live with.
I am already using noalias and it does help. As for static sizing, as I
mentioned in my follow-up email, the constructor syntax that I am
looking for is
block<rowsize, colsize>(r, c, rowsize1, colsize1)
very much along the lines of Matrix<rowsize, colsize>(rowsize1, colsize1)
where the template parameters override the dynamic parameters when they
are not set to Eigen::Dynamic. This allows me to do some template
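To make the requested semantics concrete, here is a minimal sketch in plain C++ with no Eigen dependency. The names kDynamic, resolve_size and BlockShape are hypothetical stand-ins invented for illustration, not Eigen API. It only shows the rule "a fixed template size wins, a dynamic one falls back to the run-time argument".

```cpp
#include <cassert>

// Hypothetical stand-in for Eigen::Dynamic; not the Eigen symbol itself.
constexpr int kDynamic = -1;

// Returns the compile-time size when it is fixed, otherwise the run-time one.
template <int StaticSize>
constexpr int resolve_size(int runtime_size) {
  return StaticSize == kDynamic ? runtime_size : StaticSize;
}

// Hypothetical block descriptor: each template parameter overrides the
// matching run-time argument unless it is kDynamic.
template <int Rows, int Cols>
struct BlockShape {
  int rows;
  int cols;
  BlockShape(int runtime_rows, int runtime_cols)
      : rows(resolve_size<Rows>(runtime_rows)),
        cols(resolve_size<Cols>(runtime_cols)) {
    assert(rows >= 0 && cols >= 0);  // sizes must be non-negative
  }
};
```

With this sketch, BlockShape<9, kDynamic>(4, 7) ends up with 9 rows (taken from the template parameter) and 7 columns (taken from the run-time argument).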
While I have your attention on the subject, am I correct in assuming that
assumes that there is no aliasing between A and B? and for small
matrices is there something along the lines of
EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD that I need to worry about?
Along the same lines, if I have a symmetric matrix A and matrices B and
C whose product I know will be symmetric, then what is the best way to do
A.block(r,c, rsize, csize).selfadjointView<Eigen::Upper>().noalias() += B * C;
because that does not work.
I could just take the upper triangular part of the product
A.block(r,c, rsize, csize).triangularView<Eigen::Upper>().noalias() += B * C;
but that complains about there being no noalias() defined for triangular views.
So what I end up doing is either
A.block(r,c, rsize, csize).noalias() += B * C;
or
A.block(r,c, rsize, csize).triangularView<Eigen::Upper>() += B * C;
The slightly better performing solution at this stage seems to be
A.block(r,c, rsize, csize).noalias() += B * C;
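
For reference, the arithmetic behind the triangular variant can be written as a plain loop. This is only a sketch in stdlib C++ with no Eigen dependency: the function name add_product_upper and its parameters are hypothetical, the matrices are assumed to be row-major arrays as in the rest of this thread, and it says nothing about how Eigen implements the update or how fast it is.

```cpp
#include <cassert>
#include <vector>

// Adds the upper triangle (diagonal included) of B * C into the n x n block
// of A whose top-left corner is (r, c). All matrices are row-major:
// A has a_cols columns, B is n x k, C is k x n.
inline void add_product_upper(std::vector<double>& A, int a_cols, int r, int c,
                              const std::vector<double>& B,
                              const std::vector<double>& C, int n, int k) {
  assert(static_cast<int>(B.size()) == n * k);
  assert(static_cast<int>(C.size()) == k * n);
  for (int i = 0; i < n; ++i) {
    for (int j = i; j < n; ++j) {  // j >= i: upper triangle only
      double sum = 0.0;
      for (int p = 0; p < k; ++p) {
        sum += B[i * k + p] * C[p * n + j];
      }
      A[(r + i) * a_cols + (c + j)] += sum;
    }
  }
}
```

Entries below the diagonal of the block are never touched, so when the product is known to be symmetric this does roughly half the multiply-adds of the full update.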
On Mon, Jan 9, 2012 at 8:09 AM, Gael Guennebaud wrote:
> I really cannot reproduce, on my system all the variants using Eigen3
> are faster than the best I can get out of Eigen2 (I used double).
> The results of my quick experiments also showed that
> A.block<size1, size2>(r, c).noalias() -= B * C;
> is indeed the best you can do, and both the "noalias" and static sized
> block are useful. It is in particular faster than:
> A.block<size1, size2>(r, c) -= B.lazyProduct(C);
> which uses an expression based product algorithm (tailored for very
> small products).
> #include <iostream>
> #include <Eigen/Dense>
> #include <bench/BenchTimer.h>
> using namespace Eigen;
> typedef double Scalar;
> typedef Matrix<Scalar,Dynamic,Dynamic, RowMajor> Mat;
>
> // Subtract a mapped (9x3) * (3x9) product from the fixed-size 9x9 block of A at (i,j).
> EIGEN_DONT_INLINE void foo1(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
> {
>   Block<Mat,9,9>(A,i,j).noalias() -=
>       (Map< Matrix<Scalar,9,3,RowMajor> >(dat1) * Map< Matrix<Scalar,3,9,RowMajor> >(dat2));
> }
>
> int main (int argc, char** argv)
> {
>   Matrix<Scalar,27,1> data1, data2;
>   Mat A(100,100);
>   BenchTimer t1;
>   int tries = 10;
>   int rep = 10000;
>   BENCH(t1, tries, rep, foo1(data1.data(), data2.data(), A, 2,3););
>   std::cerr << t1.best() << "s\n";
>   return (0);
> }
> On Mon, Jan 9, 2012 at 2:40 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> 2012/1/9 Sameer Agarwal <sameeragarwal@xxxxxxxxxx>:
>>> Hi Guys,
>>> We are in the process of a significant code migration from eigen2 to
>>> eigen3. The code uses Eigen::Map to map chunks of memory into RowMajor
>>> matrices and operates on them. The primary operation is of the form
>>> A.block(r, c, size1, size2) -= B * C;
>>> A is a mapped matrix.
>>> C is a mapped matrix.
>>> B is an actual Eigen matrix.
>>> All matrices are RowMajor. For the example being considered, size1 =
>>> size2 = 9. B is 9x3, and C is 3x9.
>>> C and B are statically sized.
>>> Moving from eigen2 to eigen3 has resulted in a 30% performance
>>> regression. Has something changed significantly in the way Eigen3
>>> handles mapped matrices, or about the structure of matrix-matrix
>>> multiplication in Eigen3 that would cause this?
>>> The compiler flags are all the same between our use of eigen2 and
>>> eigen3. Profiling indicates that much of the time is being spent
>>> inside Eigen::internal::gebp_kernel::operator().
>>> I understand that this is not sufficient information to reproduce this
>>> problem, so I am going to try and create a minimal case which can
>>> reproduce this performance regression. In the meanwhile any insight
>>> into this would be useful. Also is it possible to statically size
>>> blocks like matrices?
>> Yes, as explained on http://eigen.tuxfamily.org/dox/TutorialBlockOperations.html
>> (also see Jitse's email, using that syntax).
>> I agree with Jitse's suggestion of playing with .noalias() and with
>> EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD. Given your very special size,
>> where only one of the two dimensions is greater than the default
>> threshold, it's very tempting to suspect that's the cause of
>> your regression.
>> Regarding noalias(), see this page:
>>> Thank you,