Re: [eigen] stability of internal "packet math" interfaces


Generally, we don't guarantee any kind of API stability for anything in the internal namespace, but of course we try to make as few changes as possible to widely used parts (such as the packet math). In the long term, we may refactor it to provide meta-packets, but that is unlikely to happen before 3.4. Other than that, as Benoit already suggested: try using the public API where possible. E.g., if T is an Eigen::Array, you can directly write
  exp(T - logz) * d;  or  (T-logz).exp()*d;
and you should get essentially the same result as with your scalar_nlsoftmax_backward_op (I guess this works for Tensors as well). If functionality is missing that you think would be useful in general, file a bug or make a pull request.

Btw: I'd advise against writing `using namespace Eigen::internal`; you'll likely pollute your scope with lots of unwanted functions. Instead, you can alias the namespace with
  namespace EI = Eigen::internal;
and then use EI::pset<....>, etc.
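
The aliasing idea is a plain C++ feature and can be sketched without Eigen. In the sketch below, demo_lib::internal, padd, pmul and madd are hypothetical stand-ins invented for the example and are not Eigen's actual code. The alias brings a single short name into scope, and every call stays explicitly qualified.

```cpp
#include <cassert>

// Stand-in for a library's internal namespace (hypothetical, not Eigen).
namespace demo_lib {
namespace internal {
template <typename T> inline T padd(const T& a, const T& b) { return a + b; }
template <typename T> inline T pmul(const T& a, const T& b) { return a * b; }
}  // namespace internal
}  // namespace demo_lib

// One alias enters this scope instead of every name in the namespace.
namespace DI = demo_lib::internal;

// Hypothetical user function: a * b + c built from qualified helper calls.
template <typename T>
inline T madd(const T& a, const T& b, const T& c) {
  return DI::padd(DI::pmul(a, b), c);
}

// Small usage check: 2 * 3 + 4 == 10.
inline void self_check() { assert(madd(2, 3, 4) == 10); }
```

With a using-directive, padd and pmul would also be visible unqualified and could silently take part in overload resolution with user functions of the same name. With the alias, each call site shows where the function comes from.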


On 2016-01-14 03:28, Chris Dyer wrote:
Hi all,
I'm curious about the stability of Eigen's internal "packet math"
interfaces. For some custom functors that we use in a neural network
library, we've found it helpful to provide SIMD implementations to get a
bit more performance on the CPU. However,
putting lots of "using namespace Eigen::internal" in the implementations is
a little disconcerting. Are these likely to remain stable? Might it be
possible to make packet math a part of the public interface?

A second related question concerns knowing the Packet type in the functor's
constructor. For some of the more complex functor implementations, we need
to create several constant "pset" values that could be created in the
functor's constructor and reused in each packetOp. Looking through Eigen's
implementations, this is done inside packetOp.
For simple calls like pset<Packet>(1), the compiler seems good at pulling
out the relevant constant bits. Unfortunately, with our more complex
operations, we're not always getting the best optimization from the
compiler, and we end up with slower SIMD implementations than regular
scalar implementations. I can't see an obvious easy workaround with the
current structure of the library, but I was wondering if this had come up


 Dipl. Inf., Dipl. Math. Christoph Hertzberg

 Universität Bremen
 FB 3 - Mathematik und Informatik
 AG Robotik
 Robert-Hooke-Straße 1
 28359 Bremen, Germany

 Switchboard: +49 421 178 45-6611

 Visiting address of the branch office:
 Robert-Hooke-Straße 5
 28359 Bremen, Germany

 Phone:     +49 421 178 45-4021
 Reception: +49 421 178 45-6600
 Fax:       +49 421 178 45-4150
 Email:     chtz@xxxxxxxxxxxxxxxxxxxxxxxx
