[eigen] Intermediate Packet Storage


Hi Folks,

I'm currently working on a computer vision algorithm for ARM NEON using
Eigen with GCC 5, and I've got a large number of patches queued up to
enable complete packet-math - some of which are already submitted.

In this algorithm, I'm using an Array-of-Structs-of-Arrays approach, i.e.
the image processing is done in terms of a Packet defined as follows:
static constexpr size_t PacketSize =
    Eigen::internal::packet_traits<int32_t>::size;
template<typename T>
using Packet = Eigen::Array<T, 1, PacketSize>;

In other words, my intention is to allow my algorithm to process
PacketSize pixels per loop iteration using NEON SIMD registers.

But I have a problem in a case like this:
Packet<uint32_t> a, b, c, d;
a.setRandom();
b.setRandom();
c.setRandom();
d.setRandom();
const auto E = a * b; // Some expensive calculation
const Packet<uint32_t> x = E + c;
const Packet<uint32_t> y = E + d;

The problem is that if I use auto as the type for E (i.e. delay the
evaluation), this results in E being calculated twice, once for x and
once for y. If the expression inside E is trivial, there is a chance the
optimizer will de-duplicate the calculation, but for my use-case E is
too complex.
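
To illustrate the double evaluation, here is a minimal sketch that does
not use Eigen at all. LazyProduct, lazy_mul, add and the counter are
hypothetical stand-ins for an expression template, not Eigen's real
types; the sketch only assumes that a lazy expression holds references
to its operands and recomputes every time it is consumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a fixed-size packet; not an Eigen type.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<std::uint32_t, kLanes>;

// Counts how many times the "expensive" product is computed.
static int g_product_evaluations = 0;

// Lazy product: stores references to its operands and computes on demand.
struct LazyProduct {
    const Lanes& lhs;
    const Lanes& rhs;

    Lanes evaluate() const {
        ++g_product_evaluations;
        Lanes out{};
        for (std::size_t i = 0; i < kLanes; ++i) out[i] = lhs[i] * rhs[i];
        return out;
    }
};

inline LazyProduct lazy_mul(const Lanes& a, const Lanes& b) { return {a, b}; }

// Adding to an already materialised value: no product is recomputed.
inline Lanes add(const Lanes& e, const Lanes& c) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = e[i] + c[i];
    return out;
}

// Adding to a lazy product forces the product to be evaluated here.
inline Lanes add(const LazyProduct& e, const Lanes& c) {
    return add(e.evaluate(), c);
}

// E is kept lazy with auto: the product runs once per consumer.
int evaluations_with_auto() {
    g_product_evaluations = 0;
    const Lanes a{1, 2, 3, 4}, b{5, 6, 7, 8}, c{1, 1, 1, 1}, d{2, 2, 2, 2};
    const auto E = lazy_mul(a, b);  // nothing computed yet
    const Lanes x = add(E, c);
    const Lanes y = add(E, d);
    assert(x[0] == 6 && y[0] == 7);
    return g_product_evaluations;
}

// E is materialised once: the product runs a single time.
int evaluations_with_stored() {
    g_product_evaluations = 0;
    const Lanes a{1, 2, 3, 4}, b{5, 6, 7, 8}, c{1, 1, 1, 1}, d{2, 2, 2, 2};
    const Lanes E = lazy_mul(a, b).evaluate();
    const Lanes x = add(E, c);
    const Lanes y = add(E, d);
    assert(x[0] == 6 && y[0] == 7);
    return g_product_evaluations;
}
```

In this toy model evaluations_with_auto() counts two product
evaluations and evaluations_with_stored() counts one. Whether a real
compiler merges the two evaluations depends on how much of the
expression it can see through.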
So what about storing E into an Array/Packet?
const Packet<uint32_t> E = a * b; // Some expensive calculation
const Packet<uint32_t> x = E + c;
const Packet<uint32_t> y = E + d;

This results in the value of E being stored on the stack, and then
reloaded twice to calculate x and y.
But of course, I just want E to stay in register.
Does anyone have any comments about how this might be possible?

I don't mind getting my hands dirty patching Eigen - I've already got a
large number of patches to fix all kinds of NEON packet-math issues and
add features needed for my algorithm. But I'm not sure what the correct
approach would be to get this case working optimally.

I was wondering if there needs to be a new intermediate-friendly variant
of Eigen::Array where the data is stored in arrays of SIMD types instead
of plain-old-C arrays.
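
As a rough sketch of what "stored in SIMD types" could look like, the
following uses the GCC/Clang vector_size extension rather than NEON
intrinsics or any Eigen API. RegPacket, load, store and
shared_intermediate are hypothetical names for illustration only, and
the fixed width of four 32-bit lanes is an assumption. Nothing here
guarantees that E stays in a register; it only removes the plain-C
array from the type of the intermediate.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension: four 32-bit lanes in one 128-bit value.
typedef std::uint32_t u32x4 __attribute__((vector_size(16)));

// Hypothetical intermediate-friendly packet: the storage is the SIMD
// type itself rather than a plain C array.
struct RegPacket {
    u32x4 v;
};

// Lane-wise arithmetic provided by the vector extension.
inline RegPacket operator*(RegPacket a, RegPacket b) { return {a.v * b.v}; }
inline RegPacket operator+(RegPacket a, RegPacket b) { return {a.v + b.v}; }

// Unaligned load/store via memcpy, which compilers typically lower to
// a single vector move.
inline RegPacket load(const std::uint32_t* p) {
    RegPacket r;
    std::memcpy(&r.v, p, sizeof r.v);
    return r;
}

inline void store(std::uint32_t* p, RegPacket r) {
    std::memcpy(p, &r.v, sizeof r.v);
}

// E is computed once and reused for both results.
void shared_intermediate(const std::uint32_t* a, const std::uint32_t* b,
                         const std::uint32_t* c, const std::uint32_t* d,
                         std::uint32_t* x, std::uint32_t* y) {
    const RegPacket E = load(a) * load(b);  // the expensive calculation
    store(x, E + load(c));
    store(y, E + load(d));
}
```

Each pointer is assumed to reference at least four uint32_t values.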

Or is there a way to de-duplicate the evaluation in the case where a
lazy evaluation is going to be repeated?
Best Regards
Joel Holdsworth