To make things more complicated, we could also think about how to
support packets of different sizes for the same scalar type. Indeed,
both NEON (on ARM) and the future AVX instruction set support packets
of different sizes: 2 or 4 floats for NEON (64- or 128-bit registers),
and 4 or 8 floats for AVX (128- or 256-bit registers).
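For dynamically sized objects, it is pretty clear that supporting the
largest packets is enough, since at most a few trailing coefficients
fall outside full packets. However, for small fixed-size objects it
would be very welcome to be able to instantiate packet types according
to the context. For instance, if AVX is enabled, we still want to be
able to use packets of 4 floats to vectorize Vector4 and Matrix4: a
Vector4, like each column of a Matrix4, holds only 4 floats, which is
too few to fill an 8-float packet. This is very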
important for what you know...
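To make the idea of context-dependent packet selection concrete, here
is a minimal C++ sketch. Everything in it is hypothetical and is not an
existing library API: Packet, largest_packet, best_packet, kDynamic and
add_fixed are illustrative names, the packets are plain arrays standing
in for SIMD registers, and the "widest packet that evenly divides the
size" rule is just one possible policy. It assumes AVX is enabled, so
the largest packet holds 8 floats or 4 doubles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical generic packet: N scalars meant to live in one SIMD register
// (e.g. Packet<float, 4> ~ 128-bit SSE/NEON, Packet<float, 8> ~ 256-bit AVX).
template <typename Scalar, std::size_t N>
struct Packet {
  std::array<Scalar, N> v;
  static constexpr std::size_t size = N;
};

// Hypothetical marker for sizes only known at run time.
constexpr std::size_t kDynamic = 0;

// Assumption: AVX is enabled, so the widest packets are 256 bits.
template <typename Scalar> struct largest_packet;
template <> struct largest_packet<float>  { static constexpr std::size_t value = 8; };
template <> struct largest_packet<double> { static constexpr std::size_t value = 4; };

// Widest power-of-two packet (<= largest) that evenly divides a fixed size.
// Dynamic sizes always get the largest packet; a size that no packet divides
// falls back to width 1, i.e. plain scalar code.
constexpr std::size_t best_packet_width(std::size_t size, std::size_t largest) {
  if (size == kDynamic) return largest;
  for (std::size_t w = largest; w > 1; w /= 2)
    if (size % w == 0) return w;
  return 1;
}

// Packet type chosen from the context: scalar type plus compile-time size.
template <typename Scalar, std::size_t Size>
struct best_packet {
  static constexpr std::size_t width =
      best_packet_width(Size, largest_packet<Scalar>::value);
  using type = Packet<Scalar, width>;
};

// Coefficient-wise sum of two fixed-size vectors, one packet at a time.
// The inner loops stand in for a single SIMD load/add/store of that width.
template <typename Scalar, std::size_t Size>
void add_fixed(const Scalar* a, const Scalar* b, Scalar* out) {
  using P = typename best_packet<Scalar, Size>::type;
  static_assert(Size % P::size == 0, "packet width must divide the size");
  assert(a && b && out);
  for (std::size_t i = 0; i < Size; i += P::size) {
    P pa, pb, pr;
    for (std::size_t j = 0; j < P::size; ++j) {
      pa.v[j] = a[i + j];
      pb.v[j] = b[i + j];
    }
    for (std::size_t j = 0; j < P::size; ++j) pr.v[j] = pa.v[j] + pb.v[j];
    for (std::size_t j = 0; j < P::size; ++j) out[i + j] = pr.v[j];
  }
}
```

Under these assumptions, best_packet<float, 4> selects 4-float packets
even though 8-float packets are available, so a Vector4-sized sum stays
vectorized, while best_packet<float, kDynamic> keeps the 8-float packets
for run-time sized objects, matching the split described above.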