[eigen] Re: about .lazy()



I forgot to say that, to keep compatibility with the current
solution, it is enough to change .lazy() so that it only removes the
"MayAliasBit" flag.

gael.

On Fri, Aug 14, 2009 at 11:36 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> Hi all,
>
> Some of you have already noticed that the current devel branch might
> look broken because, e.g.:
>
> D = C + (A*B).lazy();
>
> compiles or fails to compile depending on the size of the matrices. I
> know this is not a very nice situation, and my suggestion to solve
> this mess is to remove the .lazy() function. Here are some arguments
> against ".lazy()":
>
> a - it is a generic concept, but it only makes sense for product expressions
>
> b - it is quite difficult to fully understand, and so it's difficult
> to use it well
>
> c - it covers two different features at once:
>
>  c1 - it means that the result does not alias with the operands of
> the product, but for that purpose it makes more sense to control that
> via a special operator=, like res.noalias() = ...
>
>  c2 - it also means that the product should not be evaluated
> immediately, but evaluated as a standard expression. However, in
> practice this is (almost) never a good idea, and in the rare cases
> where it is, the speed difference is negligible.
>
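
To make point c1 concrete, here is a minimal sketch of why aliasing matters for a product. It is a toy in plain C++, not Eigen code: the `Mat2` type and the functions `product_noalias`, `product_with_temporary` and `demo_aliasing` are hypothetical names made up for this illustration. Writing A = A*B coefficient by coefficient overwrites entries of A that are still needed, whereas going through a temporary does not.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2x2 row-major matrix, used only for this illustration.
using Mat2 = std::array<std::array<double, 2>, 2>;

// Writes a*b straight into res, one coefficient at a time, with no temporary.
// This is only correct when res is a different object from a and b.
void product_noalias(Mat2& res, const Mat2& a, const Mat2& b) {
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            res[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
}

// Evaluates a*b into a temporary first, then copies it into res.
// This stays correct even when res is the same object as a or b.
void product_with_temporary(Mat2& res, const Mat2& a, const Mat2& b) {
    Mat2 tmp{};
    product_noalias(tmp, a, b);
    res = tmp;
}

// Computes A = A*B both ways and checks that only the temporary version is right.
void demo_aliasing() {
    const Mat2 b{{{0.0, 1.0}, {1.0, 0.0}}};         // swaps the two columns
    const Mat2 expected{{{2.0, 1.0}, {4.0, 3.0}}};  // true value of A*B

    Mat2 safe{{{1.0, 2.0}, {3.0, 4.0}}};
    product_with_temporary(safe, safe, b);
    assert(safe == expected);

    Mat2 aliased{{{1.0, 2.0}, {3.0, 4.0}}};
    product_noalias(aliased, aliased, b);  // reads coefficients it already overwrote
    assert(aliased != expected);
}
```

The temporary is what makes the aliased call safe; a "no-alias" marker on the result side is a promise from the caller that the temporary can be skipped.
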
> For large matrices, my last statement is obviously true. If you
> wonder what happens for small matrices, here is a benchmark for small
> fixed-size and dynamic-size matrices which evaluates D = C + A*B using
> three different strategies:
>
> "Eval" : D = C + (A*B).eval();
>
> "Lazy" : D = C + (A*B).lazy();   // here lazy means both "eval as an
> expression" and "no-alias"
>
> "Optimal": (D = C) += (A*B).lazy(); // here lazy is only used to mean
> "no-alias"
>
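
As a rough picture of what the three strategies compute, here is a sketch written as plain loops. It is a toy in standard C++, not Eigen's implementation: the `Mat` type, the size `N` and all function names are assumptions made for this illustration. It only shows the order of evaluation; it says nothing about vectorization or about the timings in the benchmark.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 3;
// Hypothetical small fixed-size matrix, used only for this illustration.
using Mat = std::array<std::array<double, N>, N>;

// Coefficient (i, j) of a*b, computed on demand.
double product_coeff(const Mat& a, const Mat& b, std::size_t i, std::size_t j) {
    double s = 0.0;
    for (std::size_t k = 0; k < N; ++k) s += a[i][k] * b[k][j];
    return s;
}

// "Eval": evaluate A*B into a temporary, then run a plain addition loop.
Mat eval_strategy(const Mat& c, const Mat& a, const Mat& b) {
    Mat tmp{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) tmp[i][j] = product_coeff(a, b, i, j);
    Mat d{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) d[i][j] = c[i][j] + tmp[i][j];
    return d;
}

// "Lazy": a single fused loop; each product coefficient is computed inside the sum.
Mat lazy_strategy(const Mat& c, const Mat& a, const Mat& b) {
    Mat d{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) d[i][j] = c[i][j] + product_coeff(a, b, i, j);
    return d;
}

// "Optimal": copy C into D, then accumulate the product directly into D.
Mat optimal_strategy(const Mat& c, const Mat& a, const Mat& b) {
    Mat d = c;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) d[i][j] += product_coeff(a, b, i, j);
    return d;
}

// Checks that the three strategies agree on a small example.
void demo_strategies() {
    const Mat a{{{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}, {7.0, 8.0, 9.0}}};
    const Mat b{{{2.0, 0.0, 0.0}, {0.0, 2.0, 0.0}, {0.0, 0.0, 2.0}}};  // 2 * identity
    const Mat c{{{1.0, 1.0, 1.0}, {1.0, 1.0, 1.0}, {1.0, 1.0, 1.0}}};
    const Mat d = eval_strategy(c, a, b);
    assert(d == lazy_strategy(c, a, b));
    assert(d == optimal_strategy(c, a, b));
}
```

All three produce the same D; they differ only in whether a temporary holds the product and in which loop the product coefficients are computed.
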
> Here are the results with Eigen 2.0 (in MFlops, higher is better):
>
> size    fix+e   fix+l   fix+o   dyn+e   dyn+l   dyn+o
> 2       1134    1501    1415    137     250     131
> 3       2442    1672    1469    283     401     267
> 4       5473    3495    5033    652     945     630
> 5       2359    1763    1567    586     697     580
> 6       1889    1765    1772    836     977     828
> 7       2110    1821    1643    815     881     792
> 8       3143    3286    3140    1247    1366    1213
> 9       2412    1827    1715    874     881     859
> 10      1931    1850    1832    1159    1198    1137
> 11      2451    1859    1792    1040    1035    1030
> 12      2876    3082    2943    1494    1431    1464
> 13      2453    1825    1759    1153    1130    1136
> 14      1903    1789    1813    1388    1380    1398
> 15      2422    1787    1717    1236    1226    1213
> 16      3055    3126    3077    3709    1574    4077
> 17      2319    1735    1710    2316    1258    2408
>
>
> and with the devel branch:
>
> size    fix+e   fix+l   fix+o   dyn+e   dyn+l   dyn+o
> 2       1073    0       1541    71      0       105
> 3       2457    0       1532    179     0       250
> 4       4849    0       4141    452     0       626
> 5       2149    0       1508    580     0       781
> 6       2423    0       1676    796     0       1031
> 7       1860    0       1778    919     0       1169
> 8       2283    0       2407    1708    0       2291
> 9       1938    0       2155    1819    0       2248
> 10      2247    0       2343    1923    0       2254
> 11      1775    0       2084    1888    0       2144
> 12      3341    0       3625    3047    0       3580
> 13      2718    0       3245    2827    0       3305
> 14      3202    0       3306    2885    0       3184
> 15      2672    0       3096    2794    0       2748
> 16      4870    0       5135    4342    0       4890
> 17      3560    0       4264    3914    0       4428
>
> Recall that with the devel branch the "lazy" solution does not
> compile, hence the zeros.
>
> As you can see, the devel branch is faster overall, which is good news
> because we have not put any effort into optimizing small matrices further.
> The second remark is that the "lazy" solution performs poorly, even
> for very small matrices. There are two reasons. First, evaluating
> the product into a temporary allows the addition to be vectorized.
> Second, for small fixed-size objects, temporaries are put on the stack,
> and therefore they cost nothing.
>
> So to summarize, I'd be in favor of removing .lazy(), replacing the
> EvalBeforeAssignBit flag with a MightAliasBit flag, and adding a no-alias
> mechanism on the result side.
>
>
> Finally, there is also the question of whether operator+= and operator-=
> should be "no-alias" by default. I think that in 99% of cases, when
> you write:
>
> m += <product>
>
> it is very unlikely that m is one of the operands of the product. The
> drawback is that this might be confusing for the user, because operator=
> and operator+= would then behave differently with respect to aliasing.
>
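
The trade-off behind a "no-alias by default" operator+= can be sketched as follows. This is a toy in plain C++, not Eigen code, and `Mat2`, `add_product_noalias`, `add_product_with_temporary` and `demo_plus_equal` are hypothetical names made up for this illustration. In the common case, where m is not an operand, accumulating directly gives the same answer as going through a temporary; in the rare case m += m*B, it silently gives a different one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2x2 row-major matrix, used only for this illustration.
using Mat2 = std::array<std::array<double, 2>, 2>;

// m += a*b, accumulating straight into m (assumes m is neither a nor b).
void add_product_noalias(Mat2& m, const Mat2& a, const Mat2& b) {
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            m[i][j] += a[i][0] * b[0][j] + a[i][1] * b[1][j];
}

// m += a*b through a temporary; correct even when m is a or b.
void add_product_with_temporary(Mat2& m, const Mat2& a, const Mat2& b) {
    Mat2 tmp{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            tmp[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            m[i][j] += tmp[i][j];
}

// Compares both versions in the common case and in the aliasing case.
void demo_plus_equal() {
    const Mat2 a{{{1.0, 2.0}, {3.0, 4.0}}};
    const Mat2 b{{{0.0, 1.0}, {1.0, 0.0}}};

    // Common case: m is not an operand, so both versions agree.
    Mat2 m1{{{1.0, 1.0}, {1.0, 1.0}}};
    Mat2 m2 = m1;
    add_product_noalias(m1, a, b);
    add_product_with_temporary(m2, a, b);
    assert(m1 == m2);

    // Rare case: m += m*b, where the direct version reads updated coefficients.
    Mat2 m3 = a;
    Mat2 m4 = a;
    add_product_noalias(m3, m3, b);
    add_product_with_temporary(m4, m4, b);
    assert(m3 != m4);
}
```
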
>
> Any opinions or better solutions?
>
> cheers,
> Gael.
>



-- 
Gaël Guennebaud
Iparla - INRIA Bordeaux
(+33)5 40 00 37 95


