Re: [eigen] 3.3-beta2 released!





On Wed, Jul 27, 2016 at 11:42 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
It is quite surprising that unaligned vectorization is slowing down the execution, though. What's your CPU? What if you completely disable explicit vectorization? (-DEIGEN_DONT_VECTORIZE)


I found a possible explanation with a reproducible scenario.
For instance, let's consider the following set of matrix-vector products:

Matrix<float,5,5> a, b;
Matrix<float,5,1> r = a * b.col(0);
for(int i=1;i<5;++i)
  r += a*b.col(i);

for which I also observe a performance regression. The problem is that the expression r += a*b.col(i); is decomposed into two evaluations:

(1) tmp = a*b.col(i);
(2) r+=tmp;

In both cases (with and without unaligned vectorization), expression (1) is not vectorized and the same evaluation path is used. However, with unaligned vectorization, Eigen vectorizes the second expression, thus forcing the compiler to store the content of tmp in memory instead of keeping it in registers. Adding .noalias():

r.noalias()+=a*b.col(i);

fixes the issue, as this removes the temporary, and no vectorization occurs at all. So checking your matrix products for missing noalias() calls might help improve the overall performance.

That being said, this product should be vectorized in the first place, thus making expressions (1) and (2) consistent. I'll investigate...

cheers,
gael
 


On Wed, Jul 27, 2016 at 7:35 PM, <Daniel.Vollmer@xxxxxx> wrote:
Hello,

a small update: The slowdown from 3.2.9 to 3.3-beta2 in my case seems to be entirely down to the usage of unaligned vectorisation. If I turn that off with -DEIGEN_UNALIGNED_VECTORIZE=0, then 3.3 performs the same as (or very slightly faster than) 3.2.9. Compile times for 3.3 did increase noticeably, though (e.g. our code takes 1m52.533s to build with 3.2.9 and 2m17.825s with 3.3).


Best regards

Daniel Vollmer

--------------------------
Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
German Aerospace Center
Institute of Aerodynamics and Flow Technology | Lilienthalplatz 7 | 38108 Braunschweig | Germany

Daniel Vollmer | AS C²A²S²E
www.DLR.de

________________________________________
From: Vollmer, Daniel
Sent: Wednesday, 27 July 2016 17:48
To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: RE: [eigen] 3.3-beta2 released!

Hi,

thanks for everyone's efforts. The detailed changelog and release notes are very helpful.

I've tried out our code with Eigen 3.3-beta2, and (with some fixes to unsupported/AutoDiffScalar and some massaging around clang) it now compiles. :)

Using Eigen-3.3-beta2 versus 3.2.9 results in a slow-down of about 15% with g++-6.1 and a slow-down of about 10% using clang Apple LLVM version 7.3.0 (clang-703.0.31). This was compiling with -Ofast and -DNDEBUG.

We don't do anything fancy in our CFD code: mainly small, fixed-size (e.g. 5x5 / 5x1) matrix and vector products, occasionally hard-coded specific matrix decompositions, and a fair amount of direct element access (either single coeff, or row / col / segment based).
Unfortunately, I find it quite difficult to extract helpful (or actionable) profiles to see what sort of changes may be causing the differences for us. Our code (like many C++ codes) is quite sensitive to inlining decisions by the compiler.


Best regards

Daniel Vollmer





