Yes, and I have no idea what causes this. Maybe an ARM expert can chime in.

I filed a bug in the GCC Bugzilla:

This godbolt link shows the difference very clearly:

When the code is written with intrinsics, GCC is able to promote a small stack array of vectors to SIMD registers when targeting SSE, but not when targeting NEON, where the array stays on the stack.
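For anyone who wants to try this locally, here is a minimal sketch of the kind of pattern I mean. It is not the exact code from the bug report or the godbolt link; the names `vec4`, `vec_load`, `vec_add`, `vec_store` and `sum4_rows` are made up for illustration. Only the intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`, `vld1q_f32`, `vaddq_f32`, `vst1q_f32`) are real.

```c
/* Hypothetical reduction that keeps its intermediates in a local array of
 * vectors. The wrapper names are illustrative, not from the bug report. */
#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef float32x4_t vec4;
static inline vec4 vec_load(const float *p) { return vld1q_f32(p); }
static inline vec4 vec_add(vec4 a, vec4 b) { return vaddq_f32(a, b); }
static inline void vec_store(float *p, vec4 v) { vst1q_f32(p, v); }
#elif defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec4;
static inline vec4 vec_load(const float *p) { return _mm_loadu_ps(p); }
static inline vec4 vec_add(vec4 a, vec4 b) { return _mm_add_ps(a, b); }
static inline void vec_store(float *p, vec4 v) { _mm_storeu_ps(p, v); }
#else
#error "This sketch needs SSE or NEON"
#endif

/* Sums four rows of four floats each: out[j] = in[j] + in[4+j] + in[8+j] + in[12+j]. */
void sum4_rows(const float *in, float *out)
{
    vec4 acc[4]; /* the stack array that ideally lives entirely in registers */

    for (int i = 0; i < 4; i++)
        acc[i] = vec_load(in + 4 * i);

    acc[0] = vec_add(acc[0], acc[1]);
    acc[2] = vec_add(acc[2], acc[3]);
    vec_store(out, vec_add(acc[0], acc[2]));
}
```

Building this with `-O2` for an x86 target and for an ARM target and comparing the assembly is one way to check whether `acc` ever touches the stack. I have not verified that this particular reduced example triggers the difference, so treat it as a starting point.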


