Re: [eigen] Blas performance on mapped matrices


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] Blas performance on mapped matrices
*From*: Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
*Date*: Mon, 9 Jan 2012 11:16:03 -0800
*Cc*: Keir Mierle <keir@xxxxxxxxxx>

Gentlemen,

Thank you for your quick and helpful replies. I will try to address all of them in one email rather than in individual replies.

@tim: I checked, and there are no temporaries being created.

@jitse, @gael and @benoit: setting EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD to 16 increases performance significantly, and we are now within striking distance of the eigen2 implementation. We are still about ~5% slower than eigen2, but that is something we can live with. I am already using noalias() and it does help.

As for static sizing, as I mentioned in my follow-up email, the constructor syntax I am looking for is block<rowsize, colsize>(r, c, rowsize1, colsize1), very much along the lines of Matrix<rowsize, colsize>(rowsize1, colsize1), where the template parameters override the dynamic parameters when they are not set to Eigen::Dynamic. This would allow me to do some template-specialization optimizations.

While I have your attention on the subject: am I correct in assuming that A.rankUpdate(B.transpose(), 1.0) assumes there is no aliasing between A and B? And for small matrices, is there something along the lines of EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD that I need to worry about?

Along the same lines: if I have a symmetric matrix A, and matrices B and C whose product I know will be symmetric, what is the best way to compute

    A.block(r, c, rsize, csize).selfadjointView<Eigen::Upper>().noalias() += B * C;

because that does not work. I could just take the upper-triangular part of the product,

    A.block(r, c, rsize, csize).triangularView<Eigen::Upper>().noalias() += B * C;

but that complains about there being no noalias() defined for triangular views.
So what I end up doing is either

    A.block(r, c, rsize, csize).noalias() += B * C;

or

    A.block(r, c, rsize, csize).triangularView<Eigen::Upper>() += B * C;

The slightly better-performing solution at this stage seems to be the noalias() version.

Sameer

On Mon, Jan 9, 2012 at 8:09 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> I really cannot reproduce: on my system, all the variants using Eigen3
> are faster than the best I can get out of Eigen2 (I used double).
>
> My quick experiments also showed that
>
>     A.block<size1, size2>(r, c).noalias() -= B * C;
>
> is indeed the best you can do; both the noalias() and the statically
> sized block are useful. It is in particular faster than
>
>     A.block<size1, size2>(r, c) -= B.lazyProduct(C);
>
> which uses an expression-based product algorithm (tailored for very
> small products).
>
> gael.
>
> -----------------------------------------
>
> #include <iostream>
> #include <Eigen/Dense>
> #include <bench/BenchTimer.h>
> using namespace Eigen;
>
> typedef double Scalar;
> typedef Matrix<Scalar,Dynamic,Dynamic,RowMajor> Mat;
>
> EIGEN_DONT_INLINE void foo1(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
> {
>   Block<Mat,9,9>(A,i,j).noalias() -= Map< Matrix<Scalar,9,3,RowMajor> >(dat1)
>                                    * Map< Matrix<Scalar,3,9,RowMajor> >(dat2);
> }
>
> int main(int argc, char** argv)
> {
>   Matrix<Scalar,27,1> data1, data2;
>   data1.setRandom();
>   data2.setRandom();
>
>   Mat A(100,100);
>
>   BenchTimer t1;
>   int tries = 10;
>   int rep = 10000;
>
>   BENCH(t1, tries, rep, foo1(data1.data(), data2.data(), A, 2, 3););
>   std::cerr << t1.best() << "s\n";
>
>   return 0;
> }
>
> On Mon, Jan 9, 2012 at 2:40 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> 2012/1/9 Sameer Agarwal <sameeragarwal@xxxxxxxxxx>:
>>> Hi Guys,
>>> We are in the process of a significant code migration from eigen2 to
>>> eigen3. The code uses Eigen::Map to map chunks of memory into RowMajor
>>> matrices and operates on them.
>>> The primary operation is of the form
>>>
>>>     A.block(r, c, size1, size2) -= B * C;
>>>
>>> A is a mapped matrix.
>>> C is a mapped matrix.
>>> B is an actual Eigen matrix.
>>>
>>> All matrices are RowMajor. For the example being considered,
>>> size1 = size2 = 9. B is 9x3, and C is 3x9.
>>> B and C are statically sized.
>>>
>>> Moving from eigen2 to eigen3 has resulted in a 30% performance
>>> regression. Has something changed significantly in the way Eigen3
>>> handles mapped matrices, or in the structure of matrix-matrix
>>> multiplication in Eigen3, that would cause this?
>>>
>>> The compiler flags are all the same between our use of eigen2 and
>>> eigen3. Profiling indicates that much of the time is being spent
>>> inside Eigen::internal::gebp_kernel::operator().
>>>
>>> I understand that this is not sufficient information to reproduce the
>>> problem, so I am going to try to create a minimal case that can
>>> reproduce this performance regression. In the meanwhile, any insight
>>> into this would be useful. Also, is it possible to statically size
>>> blocks, like matrices?
>>
>> Yes, as explained on http://eigen.tuxfamily.org/dox/TutorialBlockOperations.html
>> (also see Jitse's email, using that syntax).
>>
>> I agree with Jitse's suggestion of playing with .noalias() and with
>> EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD. Especially given your very
>> special sizes, where only one of the two dimensions is greater than
>> the default threshold, it is very tempting to suspect that as the
>> cause of your regression.
>>
>> Regarding noalias(), see this page:
>> http://eigen.tuxfamily.org/dox/TopicWritingEfficientProductExpression.html
>>
>> Cheers,
>> Benoit
>>
>>> Thank you,
>>> Sameer

**Follow-Ups**:

- **Re: [eigen] Blas performance on mapped matrices**, *From:* Gael Guennebaud

**References**:

- **[eigen] Blas performance on mapped matrices**, *From:* Sameer Agarwal
- **Re: [eigen] Blas performance on mapped matrices**, *From:* Benoit Jacob
- **Re: [eigen] Blas performance on mapped matrices**, *From:* Gael Guennebaud


Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/