Re: [eigen] Blas performance on mapped matrices
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Blas performance on mapped matrices
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Mon, 9 Jan 2012 17:09:42 +0100
- Cc: Keir Mierle <keir@xxxxxxxxxx>
I really cannot reproduce this: on my system, all the variants using Eigen3
are faster than the best I can get out of Eigen2 (I used double).
My quick experiments also showed that:
A.block<size1, size2>(r, c).noalias() -= B * C;
is indeed the best you can do, and both the "noalias" and the statically
sized block are useful. It is in particular faster than:
A.block<size1, size2>(r, c) -= B.lazyProduct(C);
which uses an expression based product algorithm (tailored for very
small products).
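
To make the two ingredients concrete, here is a minimal sketch in plain C++ of what the update computes on row-major raw buffers. It is not Eigen's implementation: the names sub_product_in_place, sub_product_via_temporary and max_difference_9x3x9 are hypothetical, and the loops only illustrate the idea. The sizes are template parameters (the analogue of a statically sized block), and the first variant writes straight into the destination, which is what noalias() permits when the destination does not overlap B or C.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an Eigen function: A.block<M,N>(row0,col0) -= B * C
// on row-major buffers. M, K, N are compile-time constants, so the compiler
// is free to unroll the loops.
template <int M, int K, int N>
void sub_product_in_place(double* a, std::size_t a_cols,
                          std::size_t row0, std::size_t col0,
                          const double* b, const double* c)
{
    assert(col0 + static_cast<std::size_t>(N) <= a_cols);  // block fits in a row
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            double acc = 0.0;
            for (int k = 0; k < K; ++k)
                acc += b[i * K + k] * c[k * N + j];
            // Written directly into the destination; safe only because the
            // destination does not overlap b or c.
            a[(row0 + i) * a_cols + (col0 + j)] -= acc;
        }
}

// Same update, but the product is first stored in a temporary, which is the
// conservative strategy when aliasing cannot be ruled out.
template <int M, int K, int N>
void sub_product_via_temporary(double* a, std::size_t a_cols,
                               std::size_t row0, std::size_t col0,
                               const double* b, const double* c)
{
    assert(col0 + static_cast<std::size_t>(N) <= a_cols);
    double tmp[M * N];
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            double acc = 0.0;
            for (int k = 0; k < K; ++k)
                acc += b[i * K + k] * c[k * N + j];
            tmp[i * N + j] = acc;
        }
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            a[(row0 + i) * a_cols + (col0 + j)] -= tmp[i * N + j];
}

// Runs both variants on a 9x3 * 3x9 case and returns the largest absolute
// difference between the two results.
double max_difference_9x3x9()
{
    const std::size_t rows = 20, cols = 20;
    std::vector<double> a1(rows * cols), b(9 * 3), c(3 * 9);
    for (std::size_t i = 0; i < a1.size(); ++i) a1[i] = 0.5 * static_cast<double>(i);
    for (std::size_t i = 0; i < b.size(); ++i) b[i] = 1.0 + static_cast<double>(i);
    for (std::size_t i = 0; i < c.size(); ++i) c[i] = 2.0 - 0.25 * static_cast<double>(i);
    std::vector<double> a2 = a1;
    sub_product_in_place<9, 3, 9>(a1.data(), cols, 2, 3, b.data(), c.data());
    sub_product_via_temporary<9, 3, 9>(a2.data(), cols, 2, 3, b.data(), c.data());
    double worst = 0.0;
    for (std::size_t i = 0; i < a1.size(); ++i)
        worst = std::max(worst, std::fabs(a1[i] - a2[i]));
    return worst;
}
```

Both variants produce the same values when nothing overlaps; the in-place one simply skips the temporary buffer and the extra pass over it.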
gael.
-----------------------------------------
#include <iostream>
#include <Eigen/Dense>
#include <bench/BenchTimer.h>  // shipped in the Eigen source tree

using namespace Eigen;
typedef double Scalar;
typedef Matrix<Scalar,Dynamic,Dynamic,RowMajor> Mat;

// Subtract the 9x3 * 3x9 product of two mapped buffers from a 9x9 block of A.
EIGEN_DONT_INLINE void foo1(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
{
  Block<Mat,9,9>(A,i,j).noalias() -=
      (Map< Matrix<Scalar,9,3,RowMajor> >(dat1) * Map< Matrix<Scalar,3,9,RowMajor> >(dat2));
}

int main (int argc, char** argv)
{
  Matrix<Scalar,27,1> data1, data2;
  data1.setRandom();
  data2.setRandom();
  Mat A(100,100);
  A.setRandom();  // coefficients are otherwise left uninitialized
  BenchTimer t1;
  int tries = 10;
  int rep = 10000;
  // Time `tries` runs of `rep` calls each and keep the fastest run.
  BENCH(t1, tries, rep, foo1(data1.data(), data2.data(), A, 2,3););
  std::cerr << t1.best() << "s\n";
  return (0);
}
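
For readers without the Eigen bench directory at hand, the timing scheme used above can be approximated with the standard library. This is a sketch under assumptions: best_of is a hypothetical name, and it mirrors what BENCH and BenchTimer::best() do as far as I understand them (run the body rep times per try, keep the fastest try), not their actual code.

```cpp
#include <chrono>
#include <limits>

// Hypothetical stand-in for BENCH + BenchTimer::best(): returns the fastest
// wall-clock time, in seconds, of `tries` runs that each call `body` `rep` times.
template <typename F>
double best_of(int tries, int rep, F&& body)
{
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int t = 0; t < tries; ++t) {
        const auto start = clock::now();
        for (int r = 0; r < rep; ++r)
            body();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        if (elapsed.count() < best)
            best = elapsed.count();
    }
    return best;
}
```

The reported number is therefore the time for rep calls, not for a single call. Taking the minimum over several tries filters out runs disturbed by other activity on the machine.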
On Mon, Jan 9, 2012 at 2:40 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2012/1/9 Sameer Agarwal <sameeragarwal@xxxxxxxxxx>:
>> Hi Guys,
>> We are in the process of a significant code migration from eigen2 to
>> eigen3. The code uses Eigen::Map to map chunks of memory into RowMajor
>> matrices and operates on them. The primary operation is of the form
>>
>> A.block(r, c, size1, size2) -= B * C;
>>
>> A is a mapped matrix.
>> C is a mapped matrix.
>> B is an actual Eigen matrix.
>>
>> All matrices are RowMajor. For the example being considered, size1 =
>> size2 = 9. B is 9x3, and C is 3x9.
>> C and B are statically sized.
>>
>> Moving from eigen2 to eigen3 has resulted in a 30% performance
>> regression. Has something changed significantly in the way Eigen3
>> handles mapped matrices, or about the structure of matrix-matrix
>> multiplication in Eigen3 that would cause this?
>>
>> The compiler flags are all the same between our use of eigen2 and
>> eigen3. Profiling indicates that much of the time is being spent
>> inside Eigen::internal::gebp_kernel::operator.
>>
>> I understand that this is not sufficient information to reproduce this
>> problem, so I am going to try and create a minimal case which can
>> reproduce this performance regression. In the meanwhile any insight
>> into this would be useful. Also is it possible to statically size
>> blocks like matrices?
>
> Yes, as explained on http://eigen.tuxfamily.org/dox/TutorialBlockOperations.html
> (also see Jitse's email, using that syntax).
>
> I agree with Jitse's suggestion of playing with .noalias() and with
> EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD. Given your very special sizes,
> where only one of the two dimensions is greater than the default
> threshold, it's very tempting to suspect that this is the cause of
> your regression.
> Regarding noalias(), see this page:
> http://eigen.tuxfamily.org/dox/TopicWritingEfficientProductExpression.html
> Cheers,
> Benoit
>
>>
>> Thank you,
>> Sameer
>>
>>
>
>