Re: [eigen] array functionality...
• To: eigen@xxxxxxxxxxxxxxxxxxx
• Subject: Re: [eigen] array functionality...
• From: Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
• Date: Tue, 9 Mar 2010 20:42:21 +0100

On Tue, Mar 9, 2010 at 3:26 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> Just did that and the Eigen-fied version
>>
>> norms = (x.replicate(1,y.cols()) - y).matrix().squaredNorm()
>>
>> is way slower...
>
> How about using a colwise() here?

Which is what I actually did - it was just a typo. I also know now
why this is so much slower: the final reduction does not see that
the expression is vectorizable, so an unvectorized path is chosen.

> (Don't remember for sure if squaredNorm is available in partial
> reductions, but if it's not then it's easy to add, or you can replace
> by this:
>
> norms = (x-y).abs2().colwise().sum()

That was quite a good hint, since I am now getting vectorization.

I attached an example of computing the column-wise squared norm of a
matrix. I tried out four possibilities.

1) manual (0.163722 secs)
2) semi-manual, loop+abs2().sum() (0.360112 secs)
3) semi-manual, loop+matrix().squaredNorm() (0.358127 secs)
4) full-automatic (1.1833 secs)

On MSVC, 1) is the clear winner - probably, and hopefully, with GCC
1), 2) and 3) will be on par.
2) and 3) perform nearly identically.
4) is losing since a non-vectorized path is chosen.

I don't want to cause more work than you already have right now - so
letting this topic rest is fine with me.

There is only one thing I would like to bring up for the future. Eigen
offers many ways to solve one and the same problem. In general, it is
clear that not all of them offer, or even can offer, the
same performance -- nonetheless I think we might consider making