Re: [eigen] stability of internal "packet math" interfaces

A fair amount of work has gone into Eigen recently to add support for the functions most commonly used in neural networks. For example, tanh and erf are now part of the public API. I would encourage you to use these directly in your implementation, and to contribute back any performance improvements that you make.

We've been using the internal packet APIs for about 2 years now, and they've been very stable. The only change that I can remember is that they've been extended with new functions.

We often use the EIGEN_DECLARE_CONST_Packet4f macros to instantiate constant packets. That seems to work well. Maybe you could try to use them and see if that helps?
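For illustration, here is a minimal self-contained sketch of that pattern. It does not use Eigen at all: Packet4, pset1, pmul, padd, DECLARE_CONST_PACKET4 and AffineOp are hypothetical stand-ins that only mimic the general shape of the internal packet helpers, so that the example compiles without any SIMD headers. The idea it shows is simply declaring each broadcast constant once, by name, at the top of packetOp.

```cpp
// Hypothetical stand-in for a 4-wide SIMD packet. This is NOT Eigen's Packet4f.
struct Packet4 {
  float v[4];
};

// Simplified stand-ins for broadcast, multiply and add on packets
// (assumed shapes, written as plain scalar loops for portability).
inline Packet4 pset1(float x) { return Packet4{{x, x, x, x}}; }

inline Packet4 pmul(const Packet4& a, const Packet4& b) {
  Packet4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
  return r;
}

inline Packet4 padd(const Packet4& a, const Packet4& b) {
  Packet4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

// Hypothetical macro: declares a named constant packet with all lanes set to X.
#define DECLARE_CONST_PACKET4(NAME, X) const Packet4 p4_##NAME = pset1(X)

// Example functor computing y = 2*x + 0.5, with a scalar and a packet path.
struct AffineOp {
  float operator()(float x) const { return 2.0f * x + 0.5f; }

  Packet4 packetOp(const Packet4& x) const {
    // Constants are declared once, up front, rather than inline in the expression.
    DECLARE_CONST_PACKET4(two, 2.0f);
    DECLARE_CONST_PACKET4(half, 0.5f);
    return padd(pmul(p4_two, x), p4_half);
  }
};
```

With real SIMD types the same layout gives the compiler named, loop-invariant values to hoist, though whether it actually does so would need to be checked in the generated assembly.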

Benoit

https://bitbucket.org/eigen/eigen/src/98fcfcb99dee04c9ffc8014cf8d56050692fe231/Eigen/src/Core/arch/SSE/MathFunctions.h?at=default&fileviewer=file-view-default#MathFunctions.h-26

On Wed, Jan 13, 2016 at 6:28 PM, Chris Dyer <cdyer@xxxxxxxxxx> wrote:

Hi all,
I'm curious about the stability of Eigen's internal "packet math" interfaces. For some custom functors that we use in a neural network library, we've found it helpful to provide SIMD implementations to get a bit more performance on the CPU (see e.g. https://github.com/clab/cnn/blob/master/cnn/simd-functors.h#L172). However, putting lots of "using namespace Eigen::internal" in the implementations is a little disconcerting. Are these interfaces likely to remain stable? Might it be possible to make packet math a part of the public interface?

A second, related question concerns knowing the Packet type in the functor's constructor. For some of the more complex functor implementations, we need several constant "pset" values that could be created once in the functor's constructor and reused in each packetOp. Looking through Eigen's implementations, this is done inside packetOp (e.g. https://bitbucket.org/eigen/eigen/src/98fcfcb99dee04c9ffc8014cf8d56050692fe231/Eigen/src/Core/functors/UnaryFunctors.h?at=default&fileviewer=file-view-default#UnaryFunctors.h-564). For simple calls like pset<Packet>(1), the compiler seems good at pulling out the relevant constant bits. Unfortunately, with our more complex operations, we're not always getting the best optimization from the compiler, and we end up with SIMD implementations that are slower than the regular scalar ones. I can't see an obvious, easy workaround with the current structure of the library, but I was wondering if this had come up before.

Thanks!
Chris