Re: [eigen] again msvc inlining...

On Fri, Mar 19, 2010 at 11:51, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:

After some poking around with mq and hg qfold, I think I have a working patch ;-)...

--eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163

On Fri, Mar 19, 2010 at 10:04, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:

GCC: 4.4.3 (the 64-bit build at equation.com)
MSC: VS.NET 2010 RC

I'll gladly send a patch, once I've figured out how to generate a clean one; I'm still learning the hg ropes (right now it's a mess of lots of revisions+merges).

--eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163

On Fri, Mar 19, 2010 at 09:32, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx> wrote:

Hi Eamon,

these findings sound interesting. Can you show me a patch containing
your changes so that I can get at least an idea of which parts affect
your run-times!?

I just want to clarify once more: the methods I have changed so far
are not low-level function calls. They are only those functions that
occasionally return heap objects, which prevents inlining on MSVC
even when you explicitly ask for forced inlines.

A final question - which compiler version are you using?

- Hauke

On Thu, Mar 18, 2010 at 3:27 PM, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:

> OK, I played around with inlining on a "real" program (it's an
> implementation of learning vector quantization).
>
> These are the results:
>
> Before any updates:
> LvqBench2 on GCC: 1.93s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.88s; 774KB
> LvqBench3v on GCC: 1.72s; 779KB
> LvqBench2 on MSC: 2.02s; 124KB
> LvqBench2v on MSC: 1.24s; 129KB
> LvqBench3 on MSC: 1.64s; 131KB
> LvqBench3v on MSC: 0.993s; 138KB
>
> Post-patch; EIGEN_MORE_INLINE off
> LvqBench2 on GCC: 1.93s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.89s; 777KB
> LvqBench3v on GCC: 1.7s; 778KB
> LvqBench2 on MSC: 2.02s; 124KB
> LvqBench2v on MSC: 1.24s; 129KB
> LvqBench3 on MSC: 1.63s; 131KB
> LvqBench3v on MSC: 0.988s; 141KB
>
> Post-patch; EIGEN_MORE_INLINE on
> LvqBench2 on GCC: 1.92s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.72s; 777KB
> LvqBench3v on GCC: 1.55s; 782KB
> LvqBench2 on MSC: 2.01s; 124KB
> LvqBench2v on MSC: 1.25s; 129KB
> LvqBench3 on MSC: 1.16s; 138KB
> LvqBench3v on MSC: 0.937s; 151KB
>
> The 2/2v/3/3v suffix corresponds to the version of eigen and whether
> vectorization is on. The timings are best of 10 runs. Although the eigen2
> variants weren't changed, I left in their timings to give a feel for the
> variance of the timings.
>
> The update removed a few EIGEN_DONT_INLINE's and added a few inlines. When
> EIGEN_MORE_INLINE is on, those extra inlines are instead strong inlines, and
> strong inlines also get the EIGEN_ALWAYS_INLINE_ATTRIB on gcc.
>
> Most interesting are the 3v timings:
> GCC 1.72s; 779KB changes to 1.55s; 782KB.
> MSC 0.993s; 138KB changes to 0.937s; 151KB
>
> which, for this application anyhow, is an obvious improvement.
>
> With that said - not all strong inlines are useful; I initially just added
> EIGEN_STRONG_INLINE everywhere, and that caused excessive compile times and
> larger executables. So, on the second attempt I tried to add inlines where
> functions were otherwise cheap, particularly when a call to an
> eigen-function was implemented with another eigen function (i.e. where the
> eigen-internal call stack was more than 1 deep), or where several versions
> of an algorithm differed merely by template arguments and a few versions
> already had EIGEN_STRONG_INLINE.
>
> Also noteworthy is the relatively poor GCC performance; I'm not sure what's
> going on there. Most of my micro-benchmarks end up with GCC in a very solid
> lead, but here it's slower. I tried using gprof (which is available under
> Windows), but the resultant executable immediately crashed with bad_alloc.
>
> --eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163
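
The macro scheme described in the quoted message (a few plain inlines that turn into strong inlines when EIGEN_MORE_INLINE is on, with strong inlines also getting an always-inline attribute on GCC) can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch itself and not Eigen's actual definitions: every SKETCH_* macro, the Vec3 type and the dot/squaredNorm functions are hypothetical names invented for the example. Only the compiler keywords (__forceinline, __declspec(noinline), __attribute__((always_inline)), __attribute__((noinline))) are real.

```cpp
// Hypothetical stand-ins for the kind of macros discussed in the thread.
// The SKETCH_ prefix marks them as illustrative, not Eigen's own macros.
#if defined(_MSC_VER)
  #define SKETCH_STRONG_INLINE __forceinline
  #define SKETCH_DONT_INLINE   __declspec(noinline)
#elif defined(__GNUC__)
  // On GCC a "strong" inline also carries the always_inline attribute.
  #define SKETCH_STRONG_INLINE inline __attribute__((always_inline))
  #define SKETCH_DONT_INLINE   __attribute__((noinline))
#else
  #define SKETCH_STRONG_INLINE inline
  #define SKETCH_DONT_INLINE
#endif

// Opt-in switch: with SKETCH_MORE_INLINE defined, the "extra" inlines
// are promoted to strong inlines; otherwise they stay ordinary inlines.
#ifdef SKETCH_MORE_INLINE
  #define SKETCH_EXTRA_INLINE SKETCH_STRONG_INLINE
#else
  #define SKETCH_EXTRA_INLINE inline
#endif

struct Vec3 { double x, y, z; };

// Cheap leaf function: always a strong inline.
SKETCH_STRONG_INLINE double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cheap wrapper implemented in terms of another library function, i.e. the
// "internal call stack more than 1 deep" case: a candidate for the switch.
SKETCH_EXTRA_INLINE double squaredNorm(const Vec3& v) {
    return dot(v, v);
}
```

Building the same benchmark once with and once without -DSKETCH_MORE_INLINE (or /DSKETCH_MORE_INLINE on MSVC) gives the kind of on/off comparison reported in the timings, while leaving the default behaviour unchanged for users who do not define the switch.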