Support for AVX is not completely finalized yet. In particular, we still have to re-enable vectorization for 128-bit wide vectors. So in the end, Vector4f will still be 16-byte aligned and vectorized.
Regarding the alignment/ABI issues, I would keep the current behavior by default: if someone enables AVX, the best option is really to enable 32-byte alignment, otherwise the gains from AVX would be strongly reduced. Many other compiler options break the ABI anyway, so this only has to be clearly documented. For use cases like yours, we could offer a compile-time option to choose the default maximal alignment requirement (0, 16, 32, etc.) instead of the current options, which only allow you to disable alignment. It would then be up to the user to choose whether to enforce ABI compatibility by enabling 32-byte alignment with SSE, or to limit alignment to 16 bytes with AVX.
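To make the idea concrete, here is a minimal sketch of what such a compile-time knob could look like. The names DEMO_MAX_ALIGN_BYTES, pick_alignment and FixedVec are hypothetical and only for illustration; they are not existing Eigen macros or types, and the real selection logic could well differ.

```cpp
#include <cstddef>

// Hypothetical knob (not an Eigen macro): maximal alignment in bytes.
// 0 means "never over-align", 16 matches SSE, 32 matches AVX.
#ifndef DEMO_MAX_ALIGN_BYTES
#define DEMO_MAX_ALIGN_BYTES 32
#endif

// Pick the largest alignment (32 or 16) that is allowed by max_align and
// evenly divides the object size; otherwise use T's natural alignment.
template <typename T>
constexpr std::size_t pick_alignment(std::size_t n, std::size_t max_align) {
  return (max_align >= 32 && (sizeof(T) * n) % 32 == 0) ? 32
       : (max_align >= 16 && (sizeof(T) * n) % 16 == 0) ? 16
       : alignof(T);
}

// Hypothetical fixed-size vector whose alignment follows the knob.
template <typename T, std::size_t N,
          std::size_t MaxAlign = DEMO_MAX_ALIGN_BYTES>
struct alignas(pick_alignment<T>(N, MaxAlign)) FixedVec {
  T data[N];
};

// A 4-float vector is only 16 bytes, so it stays 16-byte aligned
// even when 32-byte alignment is allowed.
static_assert(alignof(FixedVec<float, 4, 32>) == 16, "4 floats: 16B");
// A 4-double vector is 32 bytes: its alignment depends on the knob.
static_assert(alignof(FixedVec<double, 4, 32>) == 32, "AVX-style");
static_assert(alignof(FixedVec<double, 4, 16>) == 16, "capped at 16B");
```

In this sketch the knob is independent of the instruction set actually enabled, so the same value can be passed (e.g. -DDEMO_MAX_ALIGN_BYTES=16) to every translation unit that must share an ABI, whether it is built with SSE or AVX.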
gael