I use Eigen heavily in my day job - camera-related math, filtering/state estimation problems. Some of our guys do the point cloud stuff too. We put these things together into embedded systems (the kind with recent i7s backing them), and the matrix multiplications from things like Kalman filters are one of the things we have to manage carefully due to the number of them and their dimensionality. One single line ends up taking something like 30% of the overall program's run time, and it consists of 2 matrix multiplications.
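To illustrate what a line with two matrix multiplications can look like in a Kalman filter, here is a minimal sketch of the covariance prediction step, P' = F P F^T + Q. This is an assumed example, not necessarily the line referred to above. In Eigen it would be roughly `P = F * P * F.transpose() + Q;`. The sketch below uses plain C++ so it has no dependencies; `Mat`, `mul`, `transpose` and `predict_covariance` are hypothetical names, not Eigen API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size, row-major square matrix (illustration only, not an Eigen type).
template <std::size_t N>
using Mat = std::array<std::array<double, N>, N>;

// Returns the matrix product a * b.
template <std::size_t N>
Mat<N> mul(const Mat<N>& a, const Mat<N>& b) {
    Mat<N> c{};  // value-initialized, so every entry starts at zero
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < N; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Returns the transpose of a.
template <std::size_t N>
Mat<N> transpose(const Mat<N>& a) {
    Mat<N> t{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            t[j][i] = a[i][j];
    return t;
}

// Kalman covariance prediction: P' = F * P * F^T + Q.
// The two calls to mul() are the two matrix multiplications.
template <std::size_t N>
Mat<N> predict_covariance(const Mat<N>& F, const Mat<N>& P, const Mat<N>& Q) {
    Mat<N> out = mul(mul(F, P), transpose(F));
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            out[i][j] += Q[i][j];
    return out;
}
```

Each product costs on the order of N^3 multiply-adds, so with a large state dimension and a high update rate it is plausible for one such line to dominate a profile.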
So with respect to AVX/AVX2: I'd be more than happy to run tests and benchmarks on this if you need additional data points or don't have a CPU with those extensions.
It'd be nice if there were support for AMD's OpenCL static C++ extensions, much like the CUDA support Gael added a while ago. I'm not sure how much trouble it would be to bring it there, but I suspect not too bad, because OpenCL and CUDA share most of the same limitations. I may tear into it someday in the next year...