This is mainly a question to Gael, but posting to the list nevertheless.

What exactly is ei_palign_impl() supposed to do? (It could use a better
description in GenericPacketMath.h, btw.) It's the only one I have left to
finish for the ARM NEON port! :D

Results so far show a consistent 2-5x speedup (measured simply by running
time <test> on the programs found under test/). I will run a proper benchmark
once the port is finished, on a couple of Cortex-A8 systems with NEON that I
have here, and remotely on a prototype quad-core Cortex-A9 ;-) (I wonder
whether OpenMP runs there as well, lol)
