I played with this a little.  Indeed, for ei_assign_impl< , , LinearTraversal, NoUnrolling> I see the same slowdown.  The inlining of the other ei_assign_impl specializations looks rather haphazard: why would ei_assign_impl< , , LinearTraversal, NoUnrolling> be strong-inlined, but ei_assign_impl< , , LinearTraversal, NoUnrolling> be merely inlined?
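
To make the question concrete, here is a minimal sketch of the pattern I mean. It is not the actual Eigen source: SKETCH_STRONG_INLINE, the two tag types and sketch_assign_impl are hypothetical names made up for illustration, and the macro only assumes the usual compiler spellings for forced inlining.

```cpp
#include <cstddef>

// Hypothetical stand-in for a "strong inline" macro (not Eigen's definition).
#if defined(_MSC_VER)
  #define SKETCH_STRONG_INLINE __forceinline
#elif defined(__GNUC__)
  #define SKETCH_STRONG_INLINE __attribute__((always_inline)) inline
#else
  #define SKETCH_STRONG_INLINE inline
#endif

// Hypothetical traversal tags, standing in for the real traversal constants.
struct SketchLinearTag {};
struct SketchVectorizedTag {};

template <typename Traversal>
struct sketch_assign_impl;

// One specialization asks the compiler to inline unconditionally...
template <>
struct sketch_assign_impl<SketchVectorizedTag> {
  static SKETCH_STRONG_INLINE void run(double* dst, const double* a,
                                       const double* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] - b[i];
  }
};

// ...while a sibling with the same loop body only gives the weaker
// `inline` hint, which the compiler is free to ignore.
template <>
struct sketch_assign_impl<SketchLinearTag> {
  static inline void run(double* dst, const double* a,
                         const double* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] - b[i];
  }
};
```

Both specializations compute the same thing; the only difference is how hard the compiler is pushed to inline run() into the caller, which is where I would expect the timing differences to come from.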

Is there some kind of benchmark or systematic guideline here?  I could imagine that for large matrices, strong-inline is a little overkill; but otherwise, why not just strong-inline the bunch?  Obviously, small test cases favor more inlining; but chances are that whatever inner loop a larger program has will benefit from more inlining rather than less...
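
For what it's worth, the kind of benchmark I have in mind is nothing fancy. A rough sketch, assuming a C++11 compiler for std::chrono; seconds_per_call and SketchSubtractKernel are hypothetical names and not part of Eigen:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel standing in for an assignment loop such as a = b - c.
struct SketchSubtractKernel {
  void operator()(double* dst, const double* a, const double* b,
                  std::size_t n) const {
    for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] - b[i];
  }
};

// Returns the average wall-clock time in seconds of one kernel call on
// vectors of length `size` (must be > 0), averaged over `repetitions` calls.
template <typename Kernel>
double seconds_per_call(Kernel kernel, std::size_t size, int repetitions) {
  std::vector<double> a(size, 3.0), b(size, 1.0), dst(size, 0.0);
  volatile double sink = 0.0;  // keeps the optimizer from dropping the loop
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repetitions; ++r) {
    kernel(dst.data(), a.data(), b.data(), size);
    sink = dst[size / 2];
  }
  const auto stop = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double>(stop - start).count() / repetitions;
}
```

Running this over a range of sizes (say 2, 4, 16, 1000) with the inline and the strong-inline variant of the same kernel would at least show where the crossover is, if there is one.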

Another funny thing is that .noalias() changes things (for the statement a=b-c; vs. a.noalias()=b-c;).  The performance of those statements can vary quite a bit, again probably due to inconsistent inlining in MSVC; the difference is also visible in gcc builds, though it is less significant there.

On Fri, Feb 26, 2010 at 20:43, Benoit Jacob wrote:
Should we then do the same in other places? I mean, this applies to
LinearVectorizedTraversal, but how about the other traversals? They
all have similar code.

I'll let the MSVC guys investigate it if they feel like it ;)


