Re: [eigen] again msvc inlining...



Turns out the poor GCC performance was due to a single call to std::pow amid thousands of Eigen-based floating-point operations: when the base is very close to 1, glibc's pow on x64 seems to be very slow.  That single call was taking almost half the execution time (a profiler would have made it much easier to find).
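
If the pow call itself cannot be removed, one possible workaround (not necessarily what was done here) is to rewrite base^exponent as exp(exponent * log1p(base - 1)), which only uses exp and log1p. This is a minimal sketch assuming base > 0. pow_near_one is a hypothetical name, and whether it actually beats the C library's pow depends on that library, so it would have to be measured.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not the fix used in this thread.
// Computes base^exponent as exp(exponent * log1p(base - 1)).
// Requires base > 0. For base in [0.5, 2] the subtraction base - 1.0 is exact,
// so log1p receives the true distance from 1 without cancellation error.
inline double pow_near_one(double base, double exponent) {
  assert(base > 0.0);
  return std::exp(exponent * std::log1p(base - 1.0));
}
```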

In any case, after fixing that, GCC's output is indeed slightly faster than MSC's, consistent with the microbenchmarks.  On a slightly larger test set than before, the new timings are:

Before inlining:
LvqBench3 on MSC: 1.81s; 130KB
LvqBench3v on MSC: 1.04s; 136KB
LvqBench3 on GCC: 1.22s; 895KB
LvqBench3v on GCC: 1.02s; 939KB

After extra inlining:
LvqBench3 on MSC: 1.29s; 137KB
LvqBench3v on MSC: 0.964s; 150KB
LvqBench3 on GCC: 1.09s; 915KB
LvqBench3v on GCC: 0.926s; 957KB
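The effect of the extra inlining is easier to compare as a relative change. The helper below is a trivial sketch (percent_change is a hypothetical name) meant to be applied to the before/after pairs in the lists above.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent from `before` to `after`.
// Negative values mean a reduction (e.g. a faster run time).
inline double percent_change(double before, double after) {
  assert(before != 0.0);
  return 100.0 * (after - before) / before;
}
```

Applied to the numbers above, run time drops by about 29% (LvqBench3, MSC), 7% (LvqBench3v, MSC), 11% (LvqBench3, GCC) and 9% (LvqBench3v, GCC), while executable size grows by roughly 2% to 10%.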



--eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163


On Fri, Mar 19, 2010 at 11:51, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:
After some poking with mq and hg qfold, I think I have a working patch ;-)...





On Fri, Mar 19, 2010 at 10:04, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:
GCC: 4.4.3 (the 64-bit build at equation.com)
MSC: VS.NET 2010 RC

I'll gladly send a patch once I've figured out how to generate a clean one; I'm still learning the hg ropes (right now it's a mess of many revisions and merges).




On Fri, Mar 19, 2010 at 09:32, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx> wrote:
Hi Eamon,

these findings sound interesting. Can you show me a patch containing
your changes, so that I can at least get an idea of which parts affect
your run-times?

I just want to clarify once more: the methods I have changed so far are
not low-level function calls. They are only those functions that now and
then return heap objects, which prevents inlining on MSVC even when you
explicitly ask for forced inlining.
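
A minimal sketch of the distinction, assuming "heap objects" means objects such as std::vector that own heap memory and have a non-trivial destructor. MSVC's documentation for __forceinline lists returning an unwindable object by value, with exception handling enabled, among the cases where the request can be ignored (reported as warning C4714 at level 4). SKETCH_FORCE_INLINE, scaled_copy and scale_into are hypothetical names, not Eigen code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical force-inline macro (not Eigen's definition).
#if defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Returns a heap-owning object by value. With exception handling enabled,
// MSVC may decline to inline this despite __forceinline.
SKETCH_FORCE_INLINE std::vector<double> scaled_copy(const std::vector<double>& v, double s) {
  std::vector<double> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) out[i] = s * v[i];
  return out;
}

// Alternative shape: write into caller-provided storage, so nothing with a
// destructor is returned by value.
SKETCH_FORCE_INLINE void scale_into(const std::vector<double>& v, double s,
                                    std::vector<double>& out) {
  out.resize(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) out[i] = s * v[i];
}
```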

A final question - which compiler version are you using?

- Hauke

On Thu, Mar 18, 2010 at 3:27 PM, Eamon Nerbonne wrote:
> OK, I played around with inlining on a "real" program (it's an
> implementation of learning vector quantization).
>
> These are the results:
>
> Before any updates:
> LvqBench2 on GCC: 1.93s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.88s; 774KB
> LvqBench3v on GCC: 1.72s; 779KB
> LvqBench2 on MSC: 2.02s; 124KB
> LvqBench2v on MSC: 1.24s; 129KB
> LvqBench3 on MSC: 1.64s; 131KB
> LvqBench3v on MSC: 0.993s; 138KB
>
> Post-patch; EIGEN_MORE_INLINE off
> LvqBench2 on GCC: 1.93s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.89s; 777KB
> LvqBench3v on GCC: 1.7s; 778KB
> LvqBench2 on MSC: 2.02s; 124KB
> LvqBench2v on MSC: 1.24s; 129KB
> LvqBench3 on MSC: 1.63s; 131KB
> LvqBench3v on MSC: 0.988s; 141KB
>
> Post-patch; EIGEN_MORE_INLINE on
> LvqBench2 on GCC: 1.92s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.72s; 777KB
> LvqBench3v on GCC: 1.55s; 782KB
> LvqBench2 on MSC: 2.01s; 124KB
> LvqBench2v on MSC: 1.25s; 129KB
> LvqBench3 on MSC: 1.16s; 138KB
> LvqBench3v on MSC: 0.937s; 151KB
>
> The 2/2v/3/3v suffix indicates the Eigen version (2 or 3) and whether
> vectorization is enabled (v).  The timings are the best of 10 runs.  Although
> the Eigen2 variants weren't changed, I left their timings in to give a feel
> for the run-to-run variance.
>
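The thread does not show the timing harness, so the following is only a sketch of "best of N runs" timing using C++11 <chrono>; best_of is a hypothetical helper. Taking the minimum rather than the mean makes the result less sensitive to interference from other processes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs `f` `runs` times and returns the fastest wall-clock time in seconds.
template <typename F>
double best_of(int runs, F&& f) {
  assert(runs > 0);
  double best = std::numeric_limits<double>::infinity();
  for (int i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}
```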
> The patch removes a few EIGEN_DONT_INLINEs and adds a few inline keywords.
> When EIGEN_MORE_INLINE is on, those extra inlines become strong inlines
> (EIGEN_STRONG_INLINE) instead, and on GCC strong inlines additionally get
> EIGEN_ALWAYS_INLINE_ATTRIB.
>
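The switch described above can be sketched with the preprocessor. SKETCH_MORE_INLINE, SKETCH_STRONG_INLINE and SKETCH_EXTRA_INLINE are hypothetical stand-ins, not Eigen's actual definitions; the sketch only assumes that a strong inline maps to __forceinline on MSVC and to inline plus __attribute__((always_inline)) on GCC.

```cpp
#include <cassert>

// Hypothetical stand-in for a strong inline.
#if defined(_MSC_VER)
#define SKETCH_STRONG_INLINE __forceinline
#elif defined(__GNUC__)
// On GCC the strong inline also carries the always_inline attribute.
#define SKETCH_STRONG_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_STRONG_INLINE inline
#endif

// Extra inlines added by a patch: plain `inline` by default,
// upgraded to strong inlines when SKETCH_MORE_INLINE is defined.
#ifdef SKETCH_MORE_INLINE
#define SKETCH_EXTRA_INLINE SKETCH_STRONG_INLINE
#else
#define SKETCH_EXTRA_INLINE inline
#endif

// A cheap helper of the kind that might previously have been marked never-inline.
SKETCH_EXTRA_INLINE double axpy_scalar(double a, double x, double y) {
  return a * x + y;
}
```

Compiling with -DSKETCH_MORE_INLINE (GCC) or /DSKETCH_MORE_INLINE (MSVC) flips every extra inline at once, which keeps an on/off comparison like the one above to a single rebuild.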
> Most interesting are the 3v timings:
> GCC 1.72s; 779KB changes to 1.55s; 782KB.
> MSC 0.993s; 138KB changes to 0.937s; 151KB
>
> which, for this application anyhow, is an obvious improvement.
>
> With that said, not all strong inlines are useful: I initially just added
> EIGEN_STRONG_INLINE everywhere, and that caused excessive compile times and
> larger executables.  So, on the second attempt, I tried to add inlines only
> where functions were otherwise cheap, particularly where an Eigen function
> was implemented in terms of another Eigen function (i.e. where the
> Eigen-internal call stack was more than one level deep), or where several
> versions of an algorithm differed merely by template arguments and some
> versions already had EIGEN_STRONG_INLINE.
>
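A sketch of that heuristic, with hypothetical names throughout: squared_norm is a cheap wrapper implemented via another library-style function (dot), so the internal call chain is two levels deep, and because both are templates, every variant that differs only in Scalar or N receives the same strong inline. This illustrates the pattern, not Eigen's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical strong-inline macro (not Eigen's definition).
#if defined(_MSC_VER)
#define SKETCH_STRONG_INLINE __forceinline
#elif defined(__GNUC__)
#define SKETCH_STRONG_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_STRONG_INLINE inline
#endif

// Innermost kernel: the bottom of the internal call stack.
template <typename Scalar, std::size_t N>
SKETCH_STRONG_INLINE Scalar dot(const Scalar (&a)[N], const Scalar (&b)[N]) {
  Scalar acc = Scalar(0);
  for (std::size_t i = 0; i < N; ++i) acc += a[i] * b[i];
  return acc;
}

// Cheap wrapper that only forwards to dot: marking it strong inline lets the
// two-level call chain collapse, for every Scalar/N instantiation alike.
template <typename Scalar, std::size_t N>
SKETCH_STRONG_INLINE Scalar squared_norm(const Scalar (&a)[N]) {
  return dot(a, a);
}
```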
> Also noteworthy is the relatively poor GCC performance; I'm not sure what's
> going on there.  Most of my micro-benchmarks end up with GCC in a very solid
> lead, but here it's slower.  I tried using gprof (which is available on
> Windows), but the resulting executable immediately crashed with bad_alloc.
>
