I'm happy to report that I finally have a working arm64 implementation of PacketMath that passes all of the packetmath unit tests. I've tested it with the LLVM compiler in Xcode 5, running on the Apple A7 processor in an iPhone 5s. This is the only arm64 toolchain I have access to.
The biggest new feature of the NEON instruction set on arm64 is double precision, but there are a few other nice additions:
- A vector DIV instruction (instead of the reciprocal estimate / step sequence)
- A vector SQRT instruction (32-bit NEON only has instructions for estimating the reciprocal square root)
- Min and max vector element instructions (instead of pairwise reduction)
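For anyone who hasn't looked at the arm64 intrinsics yet, here is a rough sketch of how the additions listed above show up in arm_neon.h. This is not the code from PacketMath64.h: the helper names (div4, sqrt4, max4, min4, add2d) are made up for illustration, and the scalar fallback is only an assumption added so the snippet also builds on non-arm64 targets.

```cpp
#include <algorithm>
#include <cmath>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helpers, for illustration only (not part of Eigen).

// Element-wise a / b on four floats.
inline void div4(const float* a, const float* b, float* out) {
#if defined(__aarch64__)
  // One vector divide instead of a reciprocal estimate plus refinement steps.
  vst1q_f32(out, vdivq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
  for (int i = 0; i < 4; ++i) out[i] = a[i] / b[i];
#endif
}

// Element-wise square root on four floats.
inline void sqrt4(const float* a, float* out) {
#if defined(__aarch64__)
  vst1q_f32(out, vsqrtq_f32(vld1q_f32(a)));
#else
  for (int i = 0; i < 4; ++i) out[i] = std::sqrt(a[i]);
#endif
}

// Largest of four floats.
inline float max4(const float* a) {
#if defined(__aarch64__)
  // Across-vector max; no pairwise reduction needed.
  return vmaxvq_f32(vld1q_f32(a));
#else
  return std::max(std::max(a[0], a[1]), std::max(a[2], a[3]));
#endif
}

// Smallest of four floats.
inline float min4(const float* a) {
#if defined(__aarch64__)
  return vminvq_f32(vld1q_f32(a));
#else
  return std::min(std::min(a[0], a[1]), std::min(a[2], a[3]));
#endif
}

// Element-wise a + b on two doubles (float64x2_t only exists on arm64).
inline void add2d(const double* a, const double* b, double* out) {
#if defined(__aarch64__)
  vst1q_f64(out, vaddq_f64(vld1q_f64(a), vld1q_f64(b)));
#else
  for (int i = 0; i < 2; ++i) out[i] = a[i] + b[i];
#endif
}
```
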
What's the preferred way to share this code? The changes are contained in a new PacketMath64.h header, plus a two-line addition to the Core header that includes PacketMath64.h when __aarch64__ is defined.
--Chris