I also forgot to mention...

14. New shift_left<N> and shift_right<N> coeff-wise unary operators
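Item 14 adds coefficient-wise unary functors whose shift amount is a compile-time parameter. The sketch below models only the per-coefficient semantics in portable C++17. The names scalar_shift_left_op, scalar_shift_right_op and unary_expr are hypothetical and are not the code on the branch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical coefficient-wise functors, loosely in the style of Eigen's
// scalar_*_op functors. Illustration only, not Eigen's implementation.
// Assumes unsigned or non-negative values: left-shifting negative values
// is undefined before C++20.
template <int N>
struct scalar_shift_left_op {
  template <typename Scalar>
  constexpr Scalar operator()(Scalar a) const {
    // The shift happens after integral promotion. Casting back wraps the
    // result to the lane width, e.g. uint8_t(255) << 1 gives 254.
    return static_cast<Scalar>(a << N);
  }
};

template <int N>
struct scalar_shift_right_op {
  template <typename Scalar>
  constexpr Scalar operator()(Scalar a) const {
    return static_cast<Scalar>(a >> N);
  }
};

// Apply a unary functor to every coefficient of a fixed-size array.
// This mimics what a coefficient-wise unary expression evaluates to.
template <typename Op, typename T, std::size_t Size>
constexpr std::array<T, Size> unary_expr(const std::array<T, Size>& in, Op op) {
  std::array<T, Size> out{};
  for (std::size_t i = 0; i < Size; ++i) out[i] = op(in[i]);
  return out;
}
```

The NEON immediate-shift intrinsics (vshlq_n_*, vshrq_n_*) require a constant shift count. That is presumably why N is a template parameter and not a runtime argument.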

On 12/19/19 1:42 PM, Joel Holdsworth wrote:
Hi Folks,

I just wanted to give you a bit of info about the status of my work on Eigen to improve support for ARM NEON.

As I mentioned previously, I am implementing a computer vision algorithm using an Array-of-Structs-of-Arrays design, where the inner Arrays are designed to map onto SIMD packets.
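The sketch below shows one way an Array-of-Structs-of-Arrays layout can look. It assumes 8-bit RGB pixels and 16 lanes per packet, which is one 128-bit NEON register of uint8_t. The names PacketU8, PixelBlock, Image and channel_mean are hypothetical. std::array stands in for Eigen's fixed-size Arrays so that the example compiles without Eigen.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed lane count: 16 x uint8_t fills one 128-bit NEON register.
constexpr std::size_t kLanes = 16;

// Innermost "Array": a packet-sized run of a single field.
using PacketU8 = std::array<std::uint8_t, kLanes>;

// "Struct": kLanes pixels stored field by field (structure of arrays).
struct PixelBlock {
  PacketU8 r, g, b;
};

// Outer "Array": blocks stored contiguously (array of structs).
template <std::size_t Blocks>
using Image = std::array<PixelBlock, Blocks>;

// Per-block mean of the three channels.
template <std::size_t Blocks>
constexpr std::array<PacketU8, Blocks> channel_mean(const Image<Blocks>& img) {
  std::array<PacketU8, Blocks> out{};
  for (std::size_t blk = 0; blk < Blocks; ++blk) {
    // The inner loop reads three contiguous packet-sized runs.
    for (std::size_t i = 0; i < kLanes; ++i) {
      // uint8_t operands are promoted to int before the additions,
      // so 3 * 255 does not wrap.
      const int sum = img[blk].r[i] + img[blk].g[i] + img[blk].b[i];
      out[blk][i] = static_cast<std::uint8_t>(sum / 3);
    }
  }
  return out;
}
```

Within a block, each field is a contiguous packet-sized run. This makes the inner loop a natural candidate for one packet load per channel followed by lane-wise arithmetic. Meanwhile, the fields of one block stay close together in memory.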

In order to do this, I have spent the last few weeks adding features to Eigen:

1. Fixed GCC warnings when building tests (merged).
2. Fixes to IO printing of char and unsigned char Arrays (merged).
3. Improvements to GeneralBlockPanelKernel to support packets that have more than 4 lanes (MR !25).
4. Various tidy-ups and improvements to the existing NEON packetmath (MR !4).
5. Added int8_t, uint8_t, int16_t, uint16_t, and uint32_t packetmath for NEON.
6. Added support for pnot, pselect, pinsertfirst, and pinsertlast to NEON packetmath, as well as pcast and preinterpret for all combinations of packets.
7. Added a new coeff-wise binary function, array_difference, and implemented it for ARM NEON with a new pabsdiff packetmath function.
8. Added a new pcombine packetmath function which can join an array of half/quarter packets into a single packet.
9. Added support in CoreEvaluators for packet math when type conversion is required, e.g. cast. Previously a single packet type was used all the way from the assignment evaluator through to the call to ploadt.
10. Added a new pmask_cast packetmath function which is used to cast mask packets to different types.
11. Added support for packet math in the evaluation of Select, making use of pcast_mask where necessary to implicitly cast the condition mask packet type to match the value types.
12. Added support for packet access in scalar_boolean_and_op and scalar_boolean_or_op with new packetmath functions: plogical_and, plogical_or.
13. (Nearly complete) Added support for bool packetmath to NEON.
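Several of the new packetmath functions in the list have simple lane-wise semantics. The sketch below models these on std::array in portable C++17:

- pabsdiff (item 7)
- pselect (item 6)
- a two-input form of pcombine (item 8)
- a pcmp_lt-style helper for building masks

The signatures are assumptions for illustration, not the ones on the branch. On NEON these operations correspond to the vabd, vbsl and vcombine intrinsic families.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Portable stand-in for a SIMD packet: N lanes of type T.
template <typename T, std::size_t N>
using Packet = std::array<T, N>;

// |a - b| per lane. Intended for unsigned lanes. Comparing first means
// the subtraction never goes negative.
template <typename T, std::size_t N>
constexpr Packet<T, N> pabsdiff(const Packet<T, N>& a, const Packet<T, N>& b) {
  Packet<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<T>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
  return r;
}

// Lane-wise a < b, producing an all-ones or all-zeros mask per lane.
template <typename T, std::size_t N>
constexpr Packet<T, N> pcmp_lt(const Packet<T, N>& a, const Packet<T, N>& b) {
  Packet<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = a[i] < b[i] ? static_cast<T>(~T(0)) : T(0);
  return r;
}

// Bitwise select: take bits from a where the mask is set, else from b.
// With all-ones/all-zeros mask lanes this is a per-lane choice.
template <typename T, std::size_t N>
constexpr Packet<T, N> pselect(const Packet<T, N>& mask, const Packet<T, N>& a,
                               const Packet<T, N>& b) {
  Packet<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<T>((mask[i] & a[i]) | (~mask[i] & b[i]));
  return r;
}

// Join two half-width packets into one full-width packet.
template <typename T, std::size_t H>
constexpr Packet<T, 2 * H> pcombine(const Packet<T, H>& lo, const Packet<T, H>& hi) {
  Packet<T, 2 * H> r{};
  for (std::size_t i = 0; i < H; ++i) {
    r[i] = lo[i];
    r[H + i] = hi[i];
  }
  return r;
}
```

For example, pselect(pcmp_lt(a, b), a, b) yields the lane-wise minimum of a and b. This is the kind of mask-driven Select that item 11 lets the evaluator vectorize.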

The complete branch is available to view here:

In addition to these patches, I have a hacky patch to force certain functions to be inlined. Hopefully I can find a way to make GCC do this automatically soon.
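The inlining patch itself isn't shown here, so the following is only a generic illustration of how forced inlining is commonly spelled. GCC and Clang accept __attribute__((always_inline)), and MSVC has __forceinline. The FORCE_INLINE macro and the padd helper are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// GCC and Clang (which also defines __GNUC__) honour always_inline.
// MSVC uses __forceinline. Elsewhere this falls back to a plain inline hint.
#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline
#endif

using Packet16u8 = std::array<std::uint8_t, 16>;

// Hypothetical lane-wise wrapping add, marked always-inline so that its
// body is expanded into each caller and not emitted as a call.
FORCE_INLINE constexpr Packet16u8 padd(const Packet16u8& a, const Packet16u8& b) {
  Packet16u8 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = static_cast<std::uint8_t>(a[i] + b[i]);
  return r;
}
```

Forcing small packet helpers inline gives the optimizer the whole loop body to work with. This is usually what is needed for packets to stay in registers instead of being passed through memory.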

There is also the GCC bug I discussed previously, in which the compiler fails to eliminate NEON loads/stores to the stack:

However, setting these issues aside, with these patches, the generated machine code is now looking reasonably credible.

So I wanted to talk about a path forward for getting all of this reviewed and merged upstream. I didn't want to dump this branch into a single merge request, because I think it would become unmanageable; it would be better to break it up into a series of merge requests.

However, I'm concerned about the merge requests getting backlogged. It seems like it will take a very long time to get the 2 existing and 9 upcoming merge requests merged unless there is a concerted effort to get them through the review process.

Items 6-13 can mostly be submitted as parallel merge requests, but items 1-5 must be merged first. At the moment the two existing MRs seem to be stalled.

I understand that it's a lot of work to review all this, and the maintainers of this project probably have other things to work on, so if there's anything I can do to make reviewing this easier, please let me know.

Best Regards
Joel Holdsworth

