[eigen] Intermediate Packet Storage
Hi Folks,
I'm currently working on a computer vision algorithm for ARM NEON using
Eigen with GCCv5, and I've got a large number of patches queued up to
enable complete packet-math - some of which are already submitted.
In this algorithm, I'm using an Array-of-Structs-of-Arrays approach, i.e.
the image processing is done in terms of a Packet defined as follows:
    static constexpr size_t PacketSize =
        Eigen::internal::packet_traits<int32_t>::size;

    template<typename T>
    using Packet = Eigen::Array<T, 1, PacketSize>;
In other words, my intention is to allow my algorithm to process
PacketSize pixels in a single loop using NEON SIMD registers.
But, I have a problem in a case like this:
    Packet<uint32_t> a, b, c, d;
    a.setRandom();
    b.setRandom();
    c.setRandom();
    d.setRandom();

    const auto E = a * b; // Some expensive calculation
    const Packet<uint32_t> x = E + c;
    const Packet<uint32_t> y = E + d;
The problem is that if I use auto as the type for E (i.e. delay the
evaluation), this results in E being calculated twice for x and y. If
the expression inside E is trivial, there is a chance the optimizer will
de-duplicate the calculation, but for my use-case E is too complex.
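To make the double evaluation concrete, here is a minimal sketch of the mechanism using a toy expression template. It is not Eigen's implementation: Vec, Product, add, evaluate and the counter g_multiplies are all hypothetical names invented for illustration. The first function keeps E as a lazy product node, so every consumer redoes the multiplications; the second materialises the product once, for comparison.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a fixed-size array type. NOT Eigen's real implementation.
constexpr std::size_t N = 4;
static int g_multiplies = 0;  // counts element-wise multiplications performed

struct Vec {
    std::array<std::uint32_t, N> data;
    std::uint32_t operator[](std::size_t i) const { return data[i]; }
};

// Lazy product node: stores references only and multiplies on each access.
struct Product {
    const Vec& lhs;
    const Vec& rhs;
    std::uint32_t operator[](std::size_t i) const {
        ++g_multiplies;
        return lhs[i] * rhs[i];
    }
};

inline Product operator*(const Vec& a, const Vec& b) { return Product{a, b}; }

// Evaluates "e + c" element by element; e may be a Vec or a lazy Product.
template <typename Expr>
Vec add(const Expr& e, const Vec& c) {
    Vec out{};
    for (std::size_t i = 0; i < N; ++i) out.data[i] = e[i] + c[i];
    return out;
}

// Forces a lazy product into concrete storage.
inline Vec evaluate(const Product& p) {
    Vec out{};
    for (std::size_t i = 0; i < N; ++i) out.data[i] = p[i];
    return out;
}

// E stays lazy: both consumers walk the product node again.
inline int multiplies_with_lazy_E() {
    const Vec a{{1, 2, 3, 4}}, b{{5, 6, 7, 8}}, c{{1, 1, 1, 1}}, d{{2, 2, 2, 2}};
    g_multiplies = 0;
    const auto E = a * b;  // a Product node; nothing is computed here
    const Vec x = add(E, c);
    const Vec y = add(E, d);
    (void)x; (void)y;
    return g_multiplies;  // 2 consumers * N lanes = 8
}

// E is materialised once: the consumers only read stored values.
inline int multiplies_with_stored_E() {
    const Vec a{{1, 2, 3, 4}}, b{{5, 6, 7, 8}}, c{{1, 1, 1, 1}}, d{{2, 2, 2, 2}};
    g_multiplies = 0;
    const Vec E = evaluate(a * b);
    const Vec x = add(E, c);
    const Vec y = add(E, d);
    (void)x; (void)y;
    return g_multiplies;  // N lanes = 4
}
```

In this toy model the count doubles from 4 to 8 when E is left as auto. Whether a real compiler manages to merge the two evaluations depends on how much of the expression it can see through.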
So what about storing E in an Array/Packet instead?

    const Packet<uint32_t> E = a * b; // Some expensive calculation
    const Packet<uint32_t> x = E + c;
    const Packet<uint32_t> y = E + d;
This results in the value of E being stored to the stack and then
reloaded twice to calculate x and y.
But of course, I just want E to stay in a register.
Does anyone have any comments about how this might be possible?
I don't mind getting my hands dirty patching Eigen - I've already got a
large number of patches to fix all kinds of NEON packet-math issues and
add features needed for my algorithm. But I'm not sure what the correct
approach would be to get this case working optimally.
I was wondering whether there needs to be a new intermediate-friendly
variant of Eigen::Array where the data is stored in arrays of SIMD types
instead of plain-old-C arrays.
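As a rough sketch of what such an intermediate-friendly type could look like, the following uses the GCC/Clang vector extension rather than NEON intrinsics, so that it compiles on any target. RegPacket, u32x4 and compute are hypothetical names, not Eigen API; on NEON the analogous storage type would be uint32x4_t from arm_neon.h. The assumption is that a small by-value struct wrapping a vector type gives the compiler the freedom to keep E in a SIMD register, though that remains the optimizer's decision.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension: four uint32_t lanes in one 128-bit value.
typedef std::uint32_t u32x4 __attribute__((vector_size(16)));

// Hypothetical packet whose storage is a SIMD type, not a plain C array.
struct RegPacket {
    u32x4 v;

    // Load four lanes from memory (memcpy avoids alignment assumptions).
    static RegPacket load(const std::uint32_t* p) {
        RegPacket r;
        std::memcpy(&r.v, p, sizeof r.v);
        return r;
    }

    // Write the four lanes back to memory.
    void store(std::uint32_t* p) const { std::memcpy(p, &v, sizeof v); }
};

// Lane-wise arithmetic provided by the vector extension.
inline RegPacket operator*(RegPacket a, RegPacket b) { return RegPacket{a.v * b.v}; }
inline RegPacket operator+(RegPacket a, RegPacket b) { return RegPacket{a.v + b.v}; }

// Computes x = a*b + c and y = a*b + d with the product evaluated once
// and held in a by-value intermediate.
inline void compute(const std::uint32_t* a, const std::uint32_t* b,
                    const std::uint32_t* c, const std::uint32_t* d,
                    std::uint32_t* x, std::uint32_t* y) {
    const RegPacket E = RegPacket::load(a) * RegPacket::load(b);
    (E + RegPacket::load(c)).store(x);
    (E + RegPacket::load(d)).store(y);
}
```

Inspecting the generated assembly (for example with -O2 -S) is the only reliable way to confirm that E is not spilled on a given compiler and target.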
Or is there a way to de-duplicate the evaluation in the case where a
lazily evaluated expression is going to be used more than once?
Best Regards
Joel Holdsworth