[eigen] Intermediate Packet Storage

Hi Folks,

I'm currently working on a computer vision algorithm for ARM NEON using Eigen with GCC 5, and I have a large number of patches queued up to enable complete packet-math support, some of which have already been submitted.

In this algorithm, I'm using an Array-of-Structs-of-Arrays approach, i.e. the image processing is done in terms of a Packet defined as follows:

static const constexpr size_t PacketSize =
template<typename T>
using Packet = Eigen::Array<T, 1, PacketSize>;

In other words, my intention is to allow my algorithm to process PacketSize pixels in a single loop using NEON SIMD registers.

But, I have a problem in a case like this:

Packet<uint32_t> a, b, c, d;

const auto E = a * b; // Some expensive calculation
const Packet<uint32_t> x = E + c;
const Packet<uint32_t> y = E + d;

The problem is that if I use auto as the type for E (i.e. delay the evaluation), E is calculated twice: once for x and once for y. If the expression inside E is trivial, there is a chance the optimizer will de-duplicate the calculation, but in my use case E is too complex for that.
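To make the duplicated work concrete, here is a minimal sketch of the effect using a toy expression template. It does not use Eigen at all: ToyPacket, LazyProduct, add and the lane count kLanes are hypothetical stand-ins invented for illustration, and the counter only exists to show how often the product is computed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy stand-ins, NOT Eigen types. kLanes is an arbitrary illustrative size.
constexpr std::size_t kLanes = 4;
using ToyPacket = std::array<std::uint32_t, kLanes>;

// Lazy product node: holds references to its operands and computes a lane
// only when that lane is read. Every read bumps the counter.
struct LazyProduct {
    const ToyPacket& lhs;
    const ToyPacket& rhs;
    int* evals;

    std::uint32_t operator[](std::size_t i) const {
        ++*evals;
        return lhs[i] * rhs[i];
    }
};

// Evaluates "expr + addend" lane by lane. Expr may be a lazy node or a
// stored packet; both provide operator[].
template <typename Expr>
ToyPacket add(const Expr& expr, const ToyPacket& addend) {
    ToyPacket out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = expr[i] + addend[i];
    return out;
}

// Reuses the lazy node in two sums, like `const auto E = a * b;`.
// Returns the number of lane products that were computed.
int evals_when_reused_lazily() {
    const ToyPacket a{1, 2, 3, 4}, b{5, 6, 7, 8}, c{1, 1, 1, 1}, d{2, 2, 2, 2};
    int evals = 0;
    const LazyProduct E{a, b, &evals};
    const ToyPacket x = add(E, c);
    const ToyPacket y = add(E, d);
    assert(x[0] == 6 && y[0] == 7);
    return evals;
}

// Materialises the product once, like `const Packet<uint32_t> E = a * b;`.
// Returns the number of lane products that were computed.
int evals_when_materialised() {
    const ToyPacket a{1, 2, 3, 4}, b{5, 6, 7, 8}, c{1, 1, 1, 1}, d{2, 2, 2, 2};
    int evals = 0;
    const LazyProduct lazy{a, b, &evals};
    ToyPacket E{};
    for (std::size_t i = 0; i < kLanes; ++i) E[i] = lazy[i];
    const ToyPacket x = add(E, c);
    const ToyPacket y = add(E, d);
    assert(x[0] == 6 && y[0] == 7);
    return evals;
}
```

In this toy model the lazy version computes each lane product once per consumer, while the materialised version computes it once in total, at the cost of having a named intermediate that the compiler may or may not keep in registers.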

So what about storing E in an Array/Packet?

const Packet<uint32_t> E = a * b; // Some expensive calculation
const Packet<uint32_t> x = E + c;
const Packet<uint32_t> y = E + d;

This results in the value of E being stored on the stack and then reloaded twice to calculate x and y.

But of course, I just want E to stay in a register.

Does anyone have any comments about how this might be possible?

I don't mind getting my hands dirty patching Eigen - I've already got a large number of patches to fix all kinds of NEON packet-math issues and add features needed for my algorithm. But I'm not sure what the correct approach would be to get this case working optimally.

I was wondering whether there needs to be a new intermediate-friendly variant of Eigen::Array where the data is stored in arrays of SIMD types instead of plain-old-C arrays.
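As a rough illustration of what "arrays of SIMD types" could mean, here is a sketch built on the GCC/Clang vector_size extension rather than on Eigen or NEON intrinsics. SimdPacket, U32x4, lane and two_sums are hypothetical names made up for this example and are not part of Eigen. Whether the intermediate actually stays in a register still depends on the compiler, the target and the optimisation level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GCC/Clang vector extension (an assumption about the toolchain):
// four uint32_t lanes in one 128-bit value.
typedef std::uint32_t U32x4 __attribute__((vector_size(16)));

// Hypothetical intermediate type: the storage is an array of SIMD values
// instead of a plain array of scalars.
template <std::size_t NumVectors>
struct SimdPacket {
    U32x4 v[NumVectors];
};

// Lane-wise product, one SIMD multiply per stored vector.
template <std::size_t N>
inline SimdPacket<N> operator*(const SimdPacket<N>& a, const SimdPacket<N>& b) {
    SimdPacket<N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Lane-wise sum, one SIMD add per stored vector.
template <std::size_t N>
inline SimdPacket<N> operator+(const SimdPacket<N>& a, const SimdPacket<N>& b) {
    SimdPacket<N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Reads a single scalar lane, for inspection only.
template <std::size_t N>
inline std::uint32_t lane(const SimdPacket<N>& p, std::size_t i) {
    assert(i < 4 * N);
    return p.v[i / 4][i % 4];
}

// Forms the product once and reuses it for both sums:
// x = a * b + c and y = a * b + d.
template <std::size_t N>
inline void two_sums(const SimdPacket<N>& a, const SimdPacket<N>& b,
                     const SimdPacket<N>& c, const SimdPacket<N>& d,
                     SimdPacket<N>& x, SimdPacket<N>& y) {
    const SimdPacket<N> E = a * b;  // intermediate held as SIMD values
    x = E + c;
    y = E + d;
}
```

The idea is only that an intermediate whose members are already vector-typed gives the compiler no scalar array to spill to and reload from; it is not a claim about how such a type would have to be integrated into Eigen's evaluators.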

Or is there a way to de-duplicate the evaluation in the case where a lazy evaluation is going to be repeated?

Best Regards
Joel Holdsworth
