Christoph, regarding bug 721: it looks like the compiler version has much more impact on the actual performance of the code than the instruction set does (at least for x86 CPUs). Maybe, in addition to focusing on recent hardware, we should also focus our efforts on recent compiler versions, to reduce the amount of validation work needed to vet improvements?
Jason: thanks for offering to test the performance of AVX. There is a Bitbucket branch that is almost ready, with support for both AVX and FMA. It "works for me", but additional data points would be useful to confirm that it does speed up Eigen across a large set of applications. Please note that at the moment the code requires gcc 4.8 to compile.
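
Since the compiler version seems to matter so much, it would help if each data point came with the compiler and flags it was built with. Below is a minimal sketch of a helper for that. The name describe_build is hypothetical and not part of Eigen or the branch; it relies only on the predefined macros __GNUC__, __GNUC_MINOR__, __clang_major__, __clang_minor__, __AVX__ and __FMA__, which gcc and clang set according to the -mavx / -mfma (or -march) options in effect.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of Eigen): returns a one-line description
// of the compiler and of the AVX/FMA flags this translation unit was built with.
std::string describe_build() {
    std::ostringstream out;

    // clang also defines __GNUC__, so it has to be checked first.
#if defined(__clang__)
    out << "clang " << __clang_major__ << "." << __clang_minor__;
#elif defined(__GNUC__)
    out << "gcc " << __GNUC__ << "." << __GNUC_MINOR__;
#else
    out << "unknown compiler";
#endif

    // __AVX__ is defined when AVX code generation is enabled (e.g. -mavx).
#ifdef __AVX__
    out << " AVX=1";
#else
    out << " AVX=0";
#endif

    // __FMA__ is defined when FMA code generation is enabled (e.g. -mfma).
#ifdef __FMA__
    out << " FMA=1";
#else
    out << " FMA=0";
#endif

    return out.str();
}
```

Printing the result of describe_build() at the top of a benchmark run and pasting it next to the timings should make it easier to separate compiler effects from instruction-set effects.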