Thank you for looking into this.
Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen comparable to CUSTOM_GEMM.
However, I went ahead and replaced all uses of small block operations in the eliminator with simple gemm and gemv implementations, and the time has dropped even further. That would not be the case if inlining were the only thing at work here.
With the increased inlining: 1.02s
With custom blas: 0.634s
I get roughly similar numbers with g++ 4.2 on macOS. I also tested this on Linux with g++ 4.6.3, where the linear solver time goes from 0.8 to 0.5 seconds.
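
For anyone wondering what "simple gemm and gemv implementations" could look like, here is a minimal sketch. It is an illustration only, not the code that was actually benchmarked: the names SimpleGemv and SimpleGemm, the row-major layout, and the accumulate-into-output convention are all assumptions made for the example. The sizes are template parameters so the compiler sees fixed loop bounds for the small blocks.

```cpp
// Hypothetical helpers for illustration; names and layout are assumptions.
// All matrices are dense, row-major arrays of doubles.

// c += A * b, where A is kRows x kCols, b has kCols entries, c has kRows.
template <int kRows, int kCols>
inline void SimpleGemv(const double* A, const double* b, double* c) {
  for (int i = 0; i < kRows; ++i) {
    double sum = 0.0;
    for (int j = 0; j < kCols; ++j) {
      sum += A[i * kCols + j] * b[j];
    }
    c[i] += sum;
  }
}

// C += A * B, where A is kM x kK, B is kK x kN, C is kM x kN.
template <int kM, int kK, int kN>
inline void SimpleGemm(const double* A, const double* B, double* C) {
  for (int i = 0; i < kM; ++i) {
    for (int j = 0; j < kN; ++j) {
      double sum = 0.0;
      for (int p = 0; p < kK; ++p) {
        sum += A[i * kK + p] * B[p * kN + j];
      }
      C[i * kN + j] += sum;
    }
  }
}
```

A call such as SimpleGemm<2, 3, 2>(A, B, C) multiplies a 2x3 block by a 3x2 block and adds the result into C. The point of the sketch is only that plain loops with compile-time sizes leave very little abstraction for the compiler to see through.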
Sameer