I played with this a little.  Indeed, for ei_assign_impl< , , LinearTraversal, NoUnrolling> I see the same slowdown.  The inlining of the other ei_assign_impl specializations looks rather haphazard: why would ei_assign_impl< , , LinearTraversal, NoUnrolling> be strong-inlined, but ei_assign_impl< , , LinearTraversal, NoUnrolling> be merely inlined?

Is there some kind of benchmark or systematic guideline here?  I could imagine that for large matrices, strong-inline is a little overkill; but otherwise, why not just strong-inline the whole bunch?  Obviously, small test cases favor more inlining; but chances are that in whatever inner loop a larger program has, performance is going to benefit from more inlining rather than less...
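
To make the inline vs. strong-inline distinction concrete, here is a minimal sketch, not Eigen's actual code.  SKETCH_STRONG_INLINE, sketch_assign_impl and sketch_subtract are hypothetical names I made up for illustration; the macro only mimics the idea behind EIGEN_STRONG_INLINE (force the inlining on compilers that support it, fall back to plain inline elsewhere).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a "strong inline" macro (an assumption for
// illustration, not Eigen's definition): force inlining where the compiler
// offers a way to do so, otherwise fall back to a plain inline hint.
#if defined(_MSC_VER)
#  define SKETCH_STRONG_INLINE __forceinline
#elif defined(__GNUC__)
#  define SKETCH_STRONG_INLINE inline __attribute__((always_inline))
#else
#  define SKETCH_STRONG_INLINE inline
#endif

// Hypothetical miniature of an assignment kernel: dst[i] = a[i] - b[i].
// The per-element work is so cheap that a real function call would be
// a noticeable share of the total cost.
struct sketch_assign_impl {
    static SKETCH_STRONG_INLINE void run(std::vector<double>& dst,
                                         const std::vector<double>& a,
                                         const std::vector<double>& b) {
        assert(a.size() == b.size());
        dst.resize(a.size());
        for (std::size_t i = 0; i < a.size(); ++i)
            dst[i] = a[i] - b[i];
    }
};

// Caller: with the macro above, the loop is expanded at this call site
// instead of being left to the compiler's inlining heuristics.
inline std::vector<double> sketch_subtract(const std::vector<double>& a,
                                           const std::vector<double>& b) {
    std::vector<double> dst;
    sketch_assign_impl::run(dst, a, b);
    return dst;
}
```

Calling sketch_subtract({5.0, 3.0}, {2.0, 1.0}) gives {3.0, 2.0} either way; the macro changes only whether the call to run() survives in the generated code, which is exactly the part that would need benchmarking per compiler.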

Another funny thing is that .noalias() changes things (for the statement a=b-c; vs. a.noalias()=b-c;).  The performance of those statements can vary quite a bit, again probably due to inconsistent inlining in MSC; the difference is also visible in gcc builds, though it is less significant there.

On Fri, Feb 26, 2010 at 20:43, Benoit Jacob <jacob..benoit.1@xxxxxxxxx> wrote:
Should we then do the same in other places? I mean, this applies to
LinearVectorizedTraversal, but how about the other traversals? They
all have similar code.

I'll let the MSVC guys investigate it if they feel like it ;)


2010/2/26 Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>:
> Applied and thank you a lot for your help - it's highly appreciated!
> - Hauke
> On Fri, Feb 26, 2010 at 6:27 PM, Eamon Nerbonne <emn13@xxxxxxxxxxxx> wrote:
>> As mentioned in the forums,
>> on Microsoft's compiler there's a performance regression in eigen3
>> concerning subtraction of VectorXd's (and probably other matrices with other
>> cheap operations too).  The appropriate ei_assign_impl isn't always inlined,
>> and since the operation is otherwise cheap, the function call overhead is
>> quite significant.  Gcc seems to inline the function; but MSC does not when
>> vectorization is on (EIGEN_DONT_VECTORIZE not defined).  Replacing the
>> "inline" keyword with the EIGEN_STRONG_INLINE macro resolves the problem.
>> Attached: patch.
