[eigen] Status of my Eigen work
Hi Folks,
I just wanted to give you a bit of info about the status of my work on
Eigen to improve support for ARM NEON.
As I mentioned previously, I am implementing a computer vision algorithm
using an Array-of-Structs-of-Arrays design, where the inner Arrays are
designed to map to SIMD packets.
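To make the layout concrete, here is a minimal sketch of an Array-of-Structs-of-Arrays container. It is a plain Python model, not Eigen code. It assumes 4 lanes per packet (e.g. 4 x float32 filling one 128-bit NEON register) and a hypothetical two-field PointBlock struct. Element i lives in block i // LANES at lane i % LANES, so each field of a block corresponds to exactly one packet.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LANES = 4  # assumption: 4 x float32 fill one 128-bit NEON register


@dataclass
class PointBlock:
    """Hypothetical inner 'struct of arrays': each field holds one packet of LANES values."""
    x: List[float] = field(default_factory=lambda: [0.0] * LANES)
    y: List[float] = field(default_factory=lambda: [0.0] * LANES)


def to_aosoa(points: List[Tuple[float, float]]) -> List[PointBlock]:
    """Pack (x, y) tuples into an array of PointBlocks, zero-padding the last block."""
    blocks: List[PointBlock] = []
    for i, (px, py) in enumerate(points):
        if i % LANES == 0:
            blocks.append(PointBlock())
        blocks[-1].x[i % LANES] = px
        blocks[-1].y[i % LANES] = py
    return blocks


def get_point(blocks: List[PointBlock], i: int) -> Tuple[float, float]:
    """Element i lives in block i // LANES at lane i % LANES."""
    b = blocks[i // LANES]
    return (b.x[i % LANES], b.y[i % LANES])


def squared_norms(block: PointBlock) -> List[float]:
    # Lane-wise multiply-add across a whole block: one packet operation per field.
    return [x * x + y * y for x, y in zip(block.x, block.y)]
```

Zero-padding the tail keeps every block a full packet, at the cost of a few wasted lanes in the last block.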
In order to do this, I have spent the last few weeks adding features to
Eigen:
1. Fixed GCC warnings when building tests (merged)
2. Fixes to IO printing of char and unsigned char Arrays (merged).
3. Improvements to GeneralBlockPanelKernel to support packets that have
more than 4 lanes (MR !25)
4. Various tidy-ups and improvements to existing NEON packetmath. (MR !4)
5. Added support for int8_t, uint8_t, int16_t, uint16_t, and uint32_t
packetmath for NEON.
6. Added support for pnot, pselect, pinsertfirst, and pinsertlast to
NEON packetmath, as well as pcast and preinterpret for all combinations
of packets.
7. Added a new coeff-wise binary function, array_difference, and
implemented support for it on ARM NEON with a new pabsdiff packetmath
function.
8. Added a new pcombine packetmath function which can join an array of
half/quarter packets into a single packet.
9. Added support in CoreEvaluators for packet math when type conversion
is required, e.g. cast(). Previously a single packet type was used all
the way from the assignment evaluator through to the call to ploadt.
10. Added a new pmask_cast packetmath function which is used to cast
mask-packets to different types.
11. Added support for packet math in the evaluation of Select, making
use of pcast_mask where necessary to implicitly cast the condition mask
packet type to match the value types.
12. Added support for packet access in scalar_boolean_and_op and
scalar_boolean_or_op via the new packetmath functions plogical_and and
plogical_or.
13. (Nearly complete) Added support for bool packetmath to NEON
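For readers less familiar with these primitives, the following is a scalar Python reference model of the lane-wise behaviour that functions like pabsdiff, pselect, pmask_cast, plogical_and and pcombine would typically have. It is an illustration only, not Eigen's API: the real functions are C++ templates over packet types, and the helper mask_from_bools and the explicit bits parameter are hypothetical. It assumes the usual SIMD mask convention (all bits set in true lanes, zero in false lanes), which is what lets pselect be a plain bitwise select.

```python
from typing import List

Packet = List[int]  # hypothetical model: one unsigned int per lane


def pabsdiff(a: Packet, b: Packet) -> Packet:
    """Lane-wise |a - b|. Python ints never wrap, so abs() is safe here."""
    return [abs(x - y) for x, y in zip(a, b)]


def mask_from_bools(bools: List[bool], bits: int) -> Packet:
    """Hypothetical helper: all-ones lanes for True, zero lanes for False."""
    ones = (1 << bits) - 1
    return [ones if v else 0 for v in bools]


def pselect(mask: Packet, a: Packet, b: Packet, bits: int) -> Packet:
    """Bitwise select: bits of a where mask is set, bits of b elsewhere."""
    ones = (1 << bits) - 1
    return [(m & x) | (~m & ones & y) for m, x, y in zip(mask, a, b)]


def pmask_cast(mask: Packet, to_bits: int) -> Packet:
    """Re-express a mask at another lane width; all-ones stays all-ones."""
    ones = (1 << to_bits) - 1
    return [ones if m else 0 for m in mask]


def plogical_and(m1: Packet, m2: Packet) -> Packet:
    """Lane-wise AND of two masks."""
    return [x & y for x, y in zip(m1, m2)]


def pcombine(parts: List[Packet]) -> Packet:
    """Concatenate half/quarter packets, in order, into one full packet."""
    out: Packet = []
    for p in parts:
        out.extend(p)
    return out
```

On hardware, subtracting unsigned lanes would wrap around, which is why a dedicated absolute-difference instruction such as NEON's VABD is useful. Similarly, a mask computed on 8-bit lanes must be widened with something like pmask_cast before it can select between 16-bit values, which is the situation item 11 describes for Select.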
The complete branch is available to view here:
https://gitlab.com/jhol/eigen/commits/neon-work/
In addition to these patches, I have a hacky patch to force certain
functions to be inlined. Hopefully I can soon find a way to make GCC do
this automatically.
There is also the GCC bug I discussed previously, in which the compiler
fails to eliminate NEON loads/stores to the stack:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
However, setting these issues aside, with these patches, the generated
machine code is now looking reasonably credible.
So I wanted to discuss a path forward for getting all of this reviewed
and merged upstream. I don't want to dump the whole branch into a single
merge request, which I think would become unmanageable, so it would be
better to break it up into a series of merge requests.
However, there is a risk of these merge requests becoming backlogged. It
looks like it will take a very long time to get the 2 existing and 9
upcoming merge requests merged unless there is a concerted effort to get
them through the review process.
Items 6-13 can mostly be submitted as parallel merge requests, but items
1-5 must be merged first. At the moment the two existing MRs seem to be
stalled.
I understand that it's a lot of work to review all this, and maintainers
of this project probably have other things to work on - so if there's
anything I can do to make it easier to get this stuff reviewed, please
let me know.
Best Regards
Joel Holdsworth