2015-01-28 10:45 GMT-05:00 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:

Matrix4f raises an important question: are there AVX instructions that make it worth aligning it to 32 bytes? If so, does that also hold for operations such as Matrix4f * Vector4f?
An analogous question is whether we can profit from vectorization for Matrix2f with SSE. E.g., a Matrix2f*Matrix2f product could be done with some shuffling, two pmuls and one padd (last time I checked, this product was not vectorized). Also, Matrix2f*Vector2f should be possible with some shuffling, one pmul and one hadd (and then storing only 8 bytes of the result vector).
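To make the Matrix2f*Matrix2f idea concrete, here is a minimal sketch using raw SSE intrinsics on plain float arrays. It is not Eigen code: the function names mat2_mul_sse, mat2_mul_ref and mat2_mul_check are hypothetical, and the layout is assumed to be column-major (Eigen's default), i.e. a 2x2 matrix is stored as {m00, m10, m01, m11}. Unaligned loads are used so that no alignment assumption is needed.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: C = A * B for 2x2 float matrices.
// Assumed layout: column-major, {m00, m10, m01, m11}.
inline void mat2_mul_sse(const float* A, const float* B, float* C) {
    __m128 a = _mm_loadu_ps(A);  // [a00, a10, a01, a11]
    __m128 b = _mm_loadu_ps(B);  // [b00, b10, b01, b11]

    // Shuffling: duplicate each column of A, broadcast entries of B.
    __m128 a_col0 = _mm_movelh_ps(a, a);                            // [a00, a10, a00, a10]
    __m128 a_col1 = _mm_movehl_ps(a, a);                            // [a01, a11, a01, a11]
    __m128 b_row0 = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 2, 0, 0));  // [b00, b00, b01, b01]
    __m128 b_row1 = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 3, 1, 1));  // [b10, b10, b11, b11]

    // Two multiplies and one add, as described above.
    __m128 c = _mm_add_ps(_mm_mul_ps(a_col0, b_row0),
                          _mm_mul_ps(a_col1, b_row1));
    _mm_storeu_ps(C, c);  // [c00, c10, c01, c11]
}

// Scalar reference with the same column-major layout.
inline void mat2_mul_ref(const float* A, const float* B, float* C) {
    C[0] = A[0] * B[0] + A[2] * B[1];  // c00
    C[1] = A[1] * B[0] + A[3] * B[1];  // c10
    C[2] = A[0] * B[2] + A[2] * B[3];  // c01
    C[3] = A[1] * B[2] + A[3] * B[3];  // c11
}

// Compares both versions on small integer-valued inputs (exact in float).
inline void mat2_mul_check() {
    const float A[4] = {1.0f, 3.0f, 2.0f, 4.0f};  // [[1, 2], [3, 4]]
    const float B[4] = {5.0f, 7.0f, 6.0f, 8.0f};  // [[5, 6], [7, 8]]
    float c_sse[4], c_ref[4];
    mat2_mul_sse(A, B, c_sse);
    mat2_mul_ref(A, B, c_ref);
    for (int i = 0; i < 4; ++i) assert(c_sse[i] == c_ref[i]);
}
```

The whole 2x2 result fits in a single 128-bit register, which is why one store suffices. Whether this actually beats the scalar version depends on the shuffle cost on the target CPU, so it would need to be measured.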

You're right, I had overlooked how AVX could be useful for the smaller float types, but that could be very interesting indeed.


