Re: [eigen] stability of internal "packet math" interfaces
Generally, we don't guarantee any kind of API stability for stuff in the
internal namespace, but of course we try to make as few changes as
possible to widely used stuff (such as the packet math).
In the long term, we may refactor it to provide meta-packets
(http://eigen.tuxfamily.org/bz/show_bug.cgi?id=692), but that is
unlikely to happen before 3.4.
Other than that, as Benoit already suggested: try to use the public API
where possible. E.g., if T is an Eigen::Array, you can directly write
  exp(T - logz) * d;   or   (T - logz).exp() * d;
and you should get essentially the same result as with your
scalar_nlsoftmax_backward_op (I guess this works for Tensors as well).
If there is functionality missing which you think would be useful in
general, file a bug or make a pull request.
Btw: I'd advise against writing `using namespace Eigen::internal`;
you'll likely pollute your scope with lots of unwanted functions.
Instead, you can abbreviate/alias the namespace with
  namespace EI = Eigen::internal;
and then use EI::pset<....>, etc.
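A minimal sketch of the alias approach. So that it compiles without Eigen, it uses a made-up `Lib::internal` namespace (hypothetical, not Eigen's real contents) containing a generically named `exp`, just as Eigen's internal namespace contains many short generic names. With `using namespace Lib::internal;` an unqualified `exp(x)` could become ambiguous with the one from `<cmath>` or silently resolve to the library's version; with an alias, every call states which function it means.

```cpp
#include <cmath>

// Hypothetical stand-in for a library "internal" namespace.
namespace Lib {
namespace internal {
inline double padd(double a, double b) { return a + b; }
// Same name as std::exp, but a crude first-order approximation.
inline double exp(double x) { return 1.0 + x; }
}  // namespace internal
}  // namespace Lib

// Alias only: nothing from Lib::internal is injected into this scope.
namespace LI = Lib::internal;

// Explicitly the standard exponential of the sum.
double exact_exp_of_sum(double a, double b) {
  return std::exp(LI::padd(a, b));
}

// Explicitly the library's approximation of the same quantity.
double approx_exp_of_sum(double a, double b) {
  return LI::exp(LI::padd(a, b));
}
```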
On 2016-01-14 03:28, Chris Dyer wrote:
I'm curious about the stability of Eigen's internal "packet math"
interfaces. For some custom functors that we use in a neural network
library, we've found it helpful to provide SIMD implementations to get a
bit more performance on the CPU (see eg
putting lots of "using namespace Eigen::internal" in the implementations is
a little disconcerting. Are these likely to remain stable? Might it be
possible to make packet math a part of the public interface?
A second related question concerns knowing the Packet type in the functor's
constructor. For some of the more complex functor implementations, we need
to create several constant "pset" values that could be created in the
functor's constructor and reused in each packetOp. Looking through Eigen's
implementations, this is done inside packetOp (eg
For simple calls like pset<Packet>(1), the compiler seems good at pulling
out the relevant constant bits. Unfortunately, with our more complex
operations, we're not always getting the best optimization from the
compiler, and we end up with slower SIMD implementations than regular
scalar implementations. I can't see an obvious easy workaround with the
current structure of the library, but I was wondering if this had come up
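To make the constructor question concrete, here is a minimal sketch of the two patterns described above. It uses a hypothetical 4-lane `Packet4` (a `std::array<float, 4>`) and hypothetical `pset`/`padd`/`pmul` helpers instead of real SIMD types, so it compiles anywhere. The first functor broadcasts its constants inside each `packetOp`, the way the message describes Eigen's own functors; the second builds them once in the constructor. Note that the second version only works because the packet type is fixed here; in Eigen, `packetOp` is templated on `Packet`, so that type is not known when the functor is constructed, which is exactly the difficulty raised.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane stand-in for a SIMD register type (not an Eigen type).
using Packet4 = std::array<float, 4>;

// Broadcast one scalar to all lanes.
inline Packet4 pset(float v) { Packet4 p; p.fill(v); return p; }

// Lane-wise addition and multiplication.
inline Packet4 padd(const Packet4& a, const Packet4& b) {
  Packet4 r;
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
  return r;
}
inline Packet4 pmul(const Packet4& a, const Packet4& b) {
  Packet4 r;
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] * b[i];
  return r;
}

// Pattern 1: constants rebuilt on every packetOp call; relies on the
// compiler to hoist the broadcasts out of the surrounding loop.
struct AffinePerCall {
  float a, b;
  Packet4 packetOp(const Packet4& x) const {
    const Packet4 pa = pset(a);
    const Packet4 pb = pset(b);
    return padd(pmul(pa, x), pb);
  }
};

// Pattern 2: constants broadcast once in the constructor and reused.
struct AffineHoisted {
  Packet4 pa, pb;
  AffineHoisted(float a, float b) : pa(pset(a)), pb(pset(b)) {}
  Packet4 packetOp(const Packet4& x) const { return padd(pmul(pa, x), pb); }
};
```

Both compute `a * x + b` lane by lane; the difference is only where the broadcast constants are created.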
Dipl. Inf., Dipl. Math. Christoph Hertzberg
FB 3 - Mathematik und Informatik
28359 Bremen, Germany
Zentrale: +49 421 178 45-6611
Besuchsadresse der Nebengeschäftsstelle:
28359 Bremen, Germany
Tel.: +49 421 178 45-4021
Empfang: +49 421 178 45-6600
Fax: +49 421 178 45-4150
Weitere Informationen: http://www.informatik.uni-bremen.de/robotik