|[eigen] stability of internal "packet math" interfaces|
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] stability of internal "packet math" interfaces
- From: Chris Dyer <cdyer@xxxxxxxxxx>
- Date: Wed, 13 Jan 2016 21:28:03 -0500
I'm curious about the stability of Eigen's internal "packet math" interfaces. For some custom functors that we use in a neural network library, we've found it helpful to provide SIMD implementations to get a bit more performance on the CPU (see e.g. https://github.com/clab/cnn/blob/master/cnn/simd-functors.h#L172). However, putting lots of "using namespace Eigen::internal" in the implementations is a little disconcerting. Are these interfaces likely to remain stable? Might it be possible to make packet math part of the public interface?
A second, related question concerns knowing the Packet type in the functor's constructor. For some of the more complex functor implementations, we need several constant packets (built with pset1) that could be created once in the functor's constructor and reused in each packetOp call. Looking through Eigen's implementations, this is done inside packetOp (e.g. https://bitbucket.org/eigen/eigen/src/98fcfcb99dee04c9ffc8014cf8d56050692fe231/Eigen/src/Core/functors/UnaryFunctors.h?at=default&fileviewer=file-view-default#UnaryFunctors.h-564). For simple calls like pset1<Packet>(1), the compiler seems good at hoisting the constants out of the loop. Unfortunately, with our more complex operations we don't always get that optimization, and we end up with SIMD implementations that are slower than the regular scalar ones. I can't see an obvious workaround with the current structure of the library, but I was wondering whether this has come up before.