Hi.
I was benchmarking Matrix4{d,f} multiplication across different Eigen versions and found that, since 3.3.0, Matrix4f multiplication has become significantly slower when compiled with the `-march=native` flag in gcc.
The performance deteriorated on Core i5 and Core i7 but not on a Xeon.
Is this expected behavior (for example, because Eigen optimizes for matrices larger than 4x4), or am I doing something wrong, such as not providing the right compilation flags?
The benchmarks are in the following repo and can be reproduced using Docker images:
The assembly code is also provided in the repository.
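For readers who do not want to open the repository, the kind of measurement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the repository: `Mat4`, `identity4`, and `ns_per_product` are hypothetical names, and `Mat4` is a naive stand-in so that the sketch compiles without Eigen. The timing function is a template, so the assumption is that a type such as `Eigen::Matrix4f` could be passed in place of `Mat4`.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for a 4x4 float matrix (row-major), used only so
// this sketch compiles without Eigen.
struct Mat4 {
    std::array<float, 16> v{};
    float& operator()(std::size_t r, std::size_t c) { return v[r * 4 + c]; }
    float operator()(std::size_t r, std::size_t c) const { return v[r * 4 + c]; }
};

// Naive triple-loop product; no explicit vectorization.
inline Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 out;
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) sum += a(r, k) * b(k, c);
            out(r, c) = sum;
        }
    }
    return out;
}

inline Mat4 identity4() {
    Mat4 m;
    for (std::size_t i = 0; i < 4; ++i) m(i, i) = 1.0f;
    return m;
}

// Runs `iters` chained products acc = acc * step and returns the average
// wall-clock time per product in nanoseconds. Each product depends on the
// previous one, and the result is left in `acc` for the caller to use, so
// the loop cannot simply be optimized away.
template <typename Matrix>
double ns_per_product(Matrix& acc, const Matrix& step, std::size_t iters) {
    assert(iters > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) {
        acc = acc * step;
    }
    const auto t1 = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    return elapsed.count() / static_cast<double>(iters);
}
```

With Eigen available, the assumed usage would be to declare `Eigen::Matrix4f acc` and `step` (for example both set to the identity so the values stay bounded), call `ns_per_product(acc, step, n)`, and then print `acc` so the result counts as used. The same source would be built once with and once without `-march=native` for each Eigen version being compared.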
Thanks in advance for your help.
Ryo Miyajima