> OK, I played around with inlining on a "real" program (it's an
> implementation of learning vector quantization).
>
> These are the results:
>
> Before any updates:
> LvqBench2 on GCC: 1.93s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.88s; 774KB
> LvqBench3v on GCC: 1.72s; 779KB
> LvqBench2 on MSC: 2.02s; 124KB
> LvqBench2v on MSC: 1.24s; 129KB
> LvqBench3 on MSC: 1.64s; 131KB
> LvqBench3v on MSC: 0.993s; 138KB
>
> Post-patch; EIGEN_MORE_INLINE off
> LvqBench2 on GCC: 1.93s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.89s; 777KB
> LvqBench3v on GCC: 1.7s; 778KB
> LvqBench2 on MSC: 2.02s; 124KB
> LvqBench2v on MSC: 1.24s; 129KB
> LvqBench3 on MSC: 1.63s; 131KB
> LvqBench3v on MSC: 0.988s; 141KB
>
> Post-patch; EIGEN_MORE_INLINE on
> LvqBench2 on GCC: 1.92s; 766KB
> LvqBench2v on GCC: 1.69s; 770KB
> LvqBench3 on GCC: 1.72s; 777KB
> LvqBench3v on GCC: 1.55s; 782KB
> LvqBench2 on MSC: 2.01s; 124KB
> LvqBench2v on MSC: 1.25s; 129KB
> LvqBench3 on MSC: 1.16s; 138KB
> LvqBench3v on MSC: 0.937s; 151KB
>
> The 2/2v/3/3v suffix gives the Eigen version (2 or 3) and whether
> vectorization is on (v). The timings are the best of 10 runs. Although the
> Eigen2 variants weren't changed, I left in their timings to give a feel for
> the variance of the timings.
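For readers who want to reproduce a "best of 10 runs" measurement, the following is a minimal sketch of one way to do it. The helper name `best_of` is hypothetical and is not taken from the benchmark above; the actual harness behind these numbers is not shown in this message.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: call `work` `runs` times and return the fastest
// wall-clock time in seconds. Taking the minimum filters out runs that
// were slowed down by unrelated system activity.
template <typename F>
double best_of(int runs, F&& work) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double seconds =
            std::chrono::duration<double>(stop - start).count();
        best = std::min(best, seconds);
    }
    return best;
}
```

A call such as `best_of(10, [&] { run_benchmark(); })` would then yield one figure per configuration, where `run_benchmark` stands in for whatever the program under test does.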
>
> The update removed a few EIGEN_DONT_INLINE's and added a few inlines. When
> EIGEN_MORE_INLINE is on, those extra inlines are instead strong inlines, and
> strong inlines also get the EIGEN_ALWAYS_INLINE_ATTRIB on gcc.
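The switch described above could look roughly like the sketch below. All `MY_*` names are hypothetical stand-ins chosen for illustration; Eigen's real macro definitions may differ, and the patch itself is not reproduced here.

```cpp
// Hypothetical "strong inline": force inlining where the compiler offers
// a way to do so, otherwise fall back to a plain inline hint.
#if defined(_MSC_VER)
  #define MY_STRONG_INLINE __forceinline
#elif defined(__GNUC__)
  #define MY_STRONG_INLINE inline __attribute__((always_inline))
#else
  #define MY_STRONG_INLINE inline
#endif

// Hypothetical switch: with MY_MORE_INLINE defined, the extra inlines
// are promoted to strong inlines; otherwise they stay ordinary inlines.
#ifdef MY_MORE_INLINE
  #define MY_EXTRA_INLINE MY_STRONG_INLINE
#else
  #define MY_EXTRA_INLINE inline
#endif

// A cheap leaf function that is always strongly inlined.
MY_STRONG_INLINE double dot2(const double* a, const double* b) {
    return a[0] * b[0] + a[1] * b[1];
}

// A thin wrapper implemented with another library function: the kind of
// two-level call chain where an extra inline is most likely to pay off.
MY_EXTRA_INLINE double squared_norm2(const double* a) {
    return dot2(a, a);
}
```

Building once with and once without `-DMY_MORE_INLINE` (or `/DMY_MORE_INLINE` for MSC) gives the two post-patch configurations to compare.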
>
> Most interesting are the 3v timings:
> GCC 1.72s; 779KB changes to 1.55s; 782KB.
> MSC 0.993s; 138KB changes to 0.937s; 151KB
>
> which, for this application anyhow, is an obvious improvement.
>
> With that said, not all strong inlines are useful; I initially just added
> EIGEN_STRONG_INLINE everywhere, and that caused excessive compile times and
> larger executables. So, on the second attempt I tried to add inlines only
> where functions were otherwise cheap, particularly when a call to an
> Eigen function was implemented with another Eigen function (i.e., where the
> Eigen-internal call stack was more than 1 deep), or where several versions
> of an algorithm differed merely by template arguments and a few versions
> already had EIGEN_STRONG_INLINE.
>
> Also noteworthy is the relatively poor GCC performance; I'm not sure what's
> going on there. Most of my micro-benchmarks end up with GCC in a very solid
> lead, but here it's slower. I tried using gprof (which is available under
> Windows), but the resulting executable immediately crashed with bad_alloc.
>
> --
eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163