[eigen] about .lazy()



Hi all,

some of you already noticed that the current devel branch might look
broken because, e.g.:

D = C + (A*B).lazy();

compiles or fails to compile depending on the size of the matrices. I
know this is not a very nice situation, and my suggestion to solve
this mess is to remove the .lazy() function. Here are some arguments
against ".lazy()":

a - it is a generic concept, but it only makes sense for product expressions

b - it is quite difficult to fully understand, and so it's difficult
to use it well

c - it covers two different features at once:

  c1 - it means that the result does not alias with the operands of
the product, but for that purpose it makes more sense to control that
via a special operator=, like res.noalias() = ...

  c2 - it also means that the product should not be evaluated
immediately, but evaluated as a standard expression. However, in
practice this is (almost) never a good idea, and in the rare cases
where it is, the speed difference is negligible.
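
To make point c1 concrete, here is a minimal sketch in plain C++ (no
Eigen involved) of why aliasing matters for a product. The Mat2 alias
and the two function names are made up for this illustration: writing
a product coefficient by coefficient into one of its own operands
reads values that were already overwritten, whereas going through a
temporary is always safe.

```cpp
#include <array>
#include <cstddef>

using Mat2 = std::array<double, 4>;  // hypothetical row-major 2x2 matrix

// Writes a*b straight into res, coefficient by coefficient.
// Only correct when res is a different object from a and b ("no-alias").
void product_noalias(const Mat2& a, const Mat2& b, Mat2& res) {
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            res[2 * i + j] = a[2 * i] * b[j] + a[2 * i + 1] * b[2 + j];
}

// Evaluates into a temporary first, then copies the result.
// Correct even when res is the same object as a or b.
void product_eval(const Mat2& a, const Mat2& b, Mat2& res) {
    Mat2 tmp;
    product_noalias(a, b, tmp);
    res = tmp;
}
```

With A = [1 2; 3 4] and B = [0 1; 1 0], product_eval(A, B, A) leaves
A*B = [2 1; 4 3] in A, while product_noalias(A, B, A) overwrites
A(0,0) before it is read again and ends up with [2 2; 4 4].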

For large matrices, my last statement is obviously true. So if you
wonder what happens for small matrices, here is a benchmark for small
fixed-size and dynamic-size matrices which evaluates D = C + A*B; using
three different strategies:

"Eval" : D = C + (A*B).eval();

"Lazy" : D = C + (A*B).lazy();   // here lazy means both "eval as an
expression" and "no-alias"

"Optimal": (D = C) += (A*B).lazy(); // here lazy is only used to mean
"no-alias"
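
For readers who want to see what the three strategies amount to in
terms of loops, here is a rough sketch in plain C++. It is not what
Eigen generates; the Mat alias and the function names are assumptions
made for the illustration, and the loops are deliberately naive.

```cpp
#include <array>
#include <cstddef>

template <std::size_t N>
using Mat = std::array<double, N * N>;  // hypothetical row-major N x N matrix

// "Eval": materialize A*B in a temporary, then do a plain coefficient-wise sum.
template <std::size_t N>
void eval_strategy(const Mat<N>& A, const Mat<N>& B, const Mat<N>& C, Mat<N>& D) {
    Mat<N> tmp{};  // zero-initialized temporary, lives on the stack
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < N; ++j)
                tmp[i * N + j] += A[i * N + k] * B[k * N + j];
    for (std::size_t idx = 0; idx < N * N; ++idx)
        D[idx] = C[idx] + tmp[idx];
}

// "Lazy": no temporary; each product coefficient is computed inside the sum.
template <std::size_t N>
void lazy_strategy(const Mat<N>& A, const Mat<N>& B, const Mat<N>& C, Mat<N>& D) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < N; ++k)
                acc += A[i * N + k] * B[k * N + j];
            D[i * N + j] = C[i * N + j] + acc;
        }
}

// "Optimal": copy C into D, then accumulate the product directly into D.
// Assumes D is a different object from A and B (the "no-alias" part).
template <std::size_t N>
void optimal_strategy(const Mat<N>& A, const Mat<N>& B, const Mat<N>& C, Mat<N>& D) {
    D = C;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < N; ++j)
                D[i * N + j] += A[i * N + k] * B[k * N + j];
}
```

Since N cannot be deduced from N * N, the size has to be given
explicitly, e.g. eval_strategy<2>(A, B, C, D). All three compute the
same D; they only differ in where the product coefficients are stored
while they are being computed.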

Here are the results with Eigen 2.0 (in MFlops, higher is better;
fix/dyn = fixed/dynamic size, e/l/o = Eval/Lazy/Optimal):

size    fix+e   fix+l   fix+o   dyn+e   dyn+l   dyn+o
2       1134    1501    1415    137     250     131
3       2442    1672    1469    283     401     267
4       5473    3495    5033    652     945     630
5       2359    1763    1567    586     697     580
6       1889    1765    1772    836     977     828
7       2110    1821    1643    815     881     792
8       3143    3286    3140    1247    1366    1213
9       2412    1827    1715    874     881     859
10      1931    1850    1832    1159    1198    1137
11      2451    1859    1792    1040    1035    1030
12      2876    3082    2943    1494    1431    1464
13      2453    1825    1759    1153    1130    1136
14      1903    1789    1813    1388    1380    1398
15      2422    1787    1717    1236    1226    1213
16      3055    3126    3077    3709    1574    4077
17      2319    1735    1710    2316    1258    2408


and with the devel branch:

size    fix+e   fix+l   fix+o   dyn+e   dyn+l   dyn+o
2       1073    0       1541    71      0       105
3       2457    0       1532    179     0       250
4       4849    0       4141    452     0       626
5       2149    0       1508    580     0       781
6       2423    0       1676    796     0       1031
7       1860    0       1778    919     0       1169
8       2283    0       2407    1708    0       2291
9       1938    0       2155    1819    0       2248
10      2247    0       2343    1923    0       2254
11      1775    0       2084    1888    0       2144
12      3341    0       3625    3047    0       3580
13      2718    0       3245    2827    0       3305
14      3202    0       3306    2885    0       3184
15      2672    0       3096    2794    0       2748
16      4870    0       5135    4342    0       4890
17      3560    0       4264    3914    0       4428

Let's recall that with the devel branch the "lazy" solution does not
compile, whence the zeros...

As you can see, overall the devel branch is faster, which is good news
because we did not put any effort into optimizing small matrices further.
The second remark is that the "lazy" solution performs poorly, even
for very small matrices. The reasons are twofold. First, evaluating
the product into a temporary allows the addition to be vectorized.
Second, for small fixed-size objects temporaries are put on the stack,
and therefore they cost nothing.

So to summarize, I'd be in favor of removing .lazy(), replacing the
EvalBeforeAssignBit flag with a MightAliasBit flag, and adding a no-alias
mechanism on the result side.
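
To give an idea of what a no-alias mechanism on the result side could
look like, here is a toy sketch in plain C++. Mat2, Product2 and
NoAlias2 are hypothetical classes invented for this example, not
Eigen's implementation: the default operator= assumes the product
might alias and goes through a temporary, while res.noalias() returns
a small proxy whose operator= and operator+= write directly into res.

```cpp
#include <array>
#include <cstddef>

struct Mat2;
struct NoAlias2;

// Unevaluated product expression: only holds references to its operands.
struct Product2 {
    const Mat2& lhs;
    const Mat2& rhs;
    double coeff(std::size_t i, std::size_t j) const;
};

// Hypothetical 2x2 row-major matrix.
struct Mat2 {
    std::array<double, 4> d;
    Mat2() : d{} {}
    Mat2(double a, double b, double c, double e) : d{{a, b, c, e}} {}
    double operator()(std::size_t i, std::size_t j) const { return d[2 * i + j]; }
    double& operator()(std::size_t i, std::size_t j) { return d[2 * i + j]; }

    // Default path: the product might alias *this, so evaluate to a temporary.
    Mat2& operator=(const Product2& p) {
        Mat2 tmp;
        for (std::size_t i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < 2; ++j)
                tmp(i, j) = p.coeff(i, j);
        d = tmp.d;
        return *this;
    }

    NoAlias2 noalias();  // opt-in: caller promises there is no aliasing
};

inline double Product2::coeff(std::size_t i, std::size_t j) const {
    return lhs(i, 0) * rhs(0, j) + lhs(i, 1) * rhs(1, j);
}

inline Product2 operator*(const Mat2& a, const Mat2& b) { return Product2{a, b}; }

// Proxy returned by noalias(): writes product coefficients straight into dst.
struct NoAlias2 {
    Mat2& dst;
    Mat2& operator=(const Product2& p) {
        for (std::size_t i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < 2; ++j)
                dst(i, j) = p.coeff(i, j);
        return dst;
    }
    Mat2& operator+=(const Product2& p) {
        for (std::size_t i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < 2; ++j)
                dst(i, j) += p.coeff(i, j);
        return dst;
    }
};

inline NoAlias2 Mat2::noalias() { return NoAlias2{*this}; }
```

With this, A = A * B; stays correct because of the temporary, while
D = C; D.noalias() += A * B; corresponds to the "Optimal" strategy
and skips the temporary.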


Finally, there is also the question of whether operator+= and -= should
be "no-alias" by default. This is because I think that in 99% of
cases, when you write:

m += <product>

it's very unlikely that m is one of the operands of the product. The
drawback is that this might be confusing for the user (because operator=
and operator+= would behave differently wrt aliasing).


any opinions or better solutions ?

cheers,
Gael.


