- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] about .lazy()
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Fri, 14 Aug 2009 23:36:29 +0200
Hi all,
Some of you have already noticed that the current devel branch might
look broken because, e.g.:
D = C + (A*B).lazy();
does or does not compile depending on the size of the matrices... I
know this is not a very nice situation, and my suggestion to solve
this mess is to remove the .lazy() function. Here are some arguments
against ".lazy()":
a - it is a generic concept, but it only makes sense for product expressions
b - it is quite difficult to fully understand, and so it's difficult
to use it well
c - it covers two different features at once:
c1 - it means that the result does not alias with the operands of
the product, but for that purpose it makes more sense to control that
via a special operator=, like res.noalias() = ...
c2 - it also means that the product should not be evaluated
immediately into a temporary, but evaluated as a standard expression.
However, in practice this is (almost) never a good idea, and in the
rare cases where it is, the speed difference is negligible.
For large matrices, my last statement is obviously true. So if you
wonder what happens for small matrices, here is a benchmark for small
fixed-size and dynamic-size matrices which evaluates D = C + A*B using
three different strategies:
"Eval" : D = C + (A*B).eval();
"Lazy" : D = C + (A*B).lazy(); // here lazy means both "eval as an
expression" and "no-alias"
"Optimal": (D = C) += (A*B).lazy(); // here lazy is only used to mean
"no-alias"
Here are the results with Eigen 2.0 (in MFlops, higher is better;
fix/dyn = fixed/dynamic size, +e/+l/+o = Eval/Lazy/Optimal):
size  fix+e  fix+l  fix+o  dyn+e  dyn+l  dyn+o
   2   1134   1501   1415    137    250    131
   3   2442   1672   1469    283    401    267
   4   5473   3495   5033    652    945    630
   5   2359   1763   1567    586    697    580
   6   1889   1765   1772    836    977    828
   7   2110   1821   1643    815    881    792
   8   3143   3286   3140   1247   1366   1213
   9   2412   1827   1715    874    881    859
  10   1931   1850   1832   1159   1198   1137
  11   2451   1859   1792   1040   1035   1030
  12   2876   3082   2943   1494   1431   1464
  13   2453   1825   1759   1153   1130   1136
  14   1903   1789   1813   1388   1380   1398
  15   2422   1787   1717   1236   1226   1213
  16   3055   3126   3077   3709   1574   4077
  17   2319   1735   1710   2316   1258   2408
and with the devel branch:
size  fix+e  fix+l  fix+o  dyn+e  dyn+l  dyn+o
   2   1073      0   1541     71      0    105
   3   2457      0   1532    179      0    250
   4   4849      0   4141    452      0    626
   5   2149      0   1508    580      0    781
   6   2423      0   1676    796      0   1031
   7   1860      0   1778    919      0   1169
   8   2283      0   2407   1708      0   2291
   9   1938      0   2155   1819      0   2248
  10   2247      0   2343   1923      0   2254
  11   1775      0   2084   1888      0   2144
  12   3341      0   3625   3047      0   3580
  13   2718      0   3245   2827      0   3305
  14   3202      0   3306   2885      0   3184
  15   2672      0   3096   2794      0   2748
  16   4870      0   5135   4342      0   4890
  17   3560      0   4264   3914      0   4428
Recall that with the devel branch the "lazy" solution does not
compile, hence the zeros...
As you can see, the devel branch is faster overall, which is good news
because we did not put any effort into optimizing small matrices further.
The second remark is that the "lazy" solution performs poorly, even
for very small matrices. There are two reasons. First, evaluating
the product into a temporary allows the addition to be vectorized.
Second, for small fixed-size objects, temporaries are put on the stack,
and therefore they cost nothing.
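To make the difference between the three strategies concrete, here is
a minimal sketch in plain C++. It is not Eigen code: Mat and the three
function names are hypothetical stand-ins, and the loops are naive
scalar loops, so it only illustrates where the temporary appears and
what each strategy assumes about aliasing, not the vectorization
effect measured in the benchmark.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a small fixed-size matrix (not an Eigen type).
constexpr std::size_t N = 2;
using Mat = std::array<std::array<double, N>, N>;

// "Eval": evaluate A*B into a temporary T first, then add C in a second pass.
// T is complete before D is written, so D may even be the same object as A or B.
void eval_strategy(const Mat& A, const Mat& B, const Mat& C, Mat& D) {
    Mat T{};  // zero-initialized temporary holding the product
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                T[i][j] += A[i][k] * B[k][j];
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            D[i][j] = C[i][j] + T[i][j];
}

// "Lazy": no temporary; each coefficient of D computes its own dot product.
// D is written while A and B are still being read, so the caller must
// guarantee that D is not one of the operands of the product.
void lazy_strategy(const Mat& A, const Mat& B, const Mat& C, Mat& D) {
    assert(&D != &A && &D != &B);  // "no-alias" promise
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < N; ++k)
                acc += A[i][k] * B[k][j];
            D[i][j] = C[i][j] + acc;
        }
}

// "Optimal": copy C into D, then accumulate the product directly into D.
// Again no temporary, and again only valid if D is not an operand.
void optimal_strategy(const Mat& A, const Mat& B, const Mat& C, Mat& D) {
    assert(&D != &A && &D != &B);  // "no-alias" promise
    D = C;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                D[i][j] += A[i][k] * B[k][j];
}
```

Called on the same inputs, the three functions produce the same D;
they differ only in whether a temporary is created and in what they
require from the caller regarding aliasing.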
So to summarize, I'd be in favor of removing .lazy(), replacing the
EvalBeforeAssignBit flag with a MightAliasBit flag, and adding a
no-alias mechanism on the result side.
Finally, there is also the question of whether operator+= and -= should
be "no-alias" by default. I think that in 99% of cases, when you
write:
m += <product>
it's very unlikely that m is one of the operands of the product. The
drawback is that this might be confusing for the user (because operator=
and operator+= would behave differently wrt aliasing).
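To illustrate what is at stake with a "no-alias" operator+=, here is
a small plain C++ sketch (again not Eigen code; Mat,
add_product_via_temporary and add_product_noalias are hypothetical
names). It assumes a naive in-place accumulation for the no-alias
variant, which is one possible way such an operator could be
implemented.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a small fixed-size matrix (not an Eigen type).
constexpr std::size_t N = 2;
using Mat = std::array<std::array<double, N>, N>;

// m += A*B through a temporary: safe even when m is also A or B,
// because the product is fully computed before m is modified.
void add_product_via_temporary(Mat& m, const Mat& A, const Mat& B) {
    Mat T{};  // zero-initialized temporary holding the product
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                T[i][j] += A[i][k] * B[k][j];
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            m[i][j] += T[i][j];
}

// m += A*B accumulated directly into m, with no temporary.
// Correct only if m is not an operand: otherwise later coefficients
// read entries of m that have already been overwritten.
void add_product_noalias(Mat& m, const Mat& A, const Mat& B) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                m[i][j] += A[i][k] * B[k][j];
}
```

When m is distinct from A and B, both functions give the same result.
When m is passed as one of the operands, only the version with the
temporary computes m + m*B; this is the case a "no-alias by default"
operator+= would silently get wrong.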
Any opinions or better solutions?
cheers,
Gael.