Re: [eigen] Matrix multiplication much slower on MSVC than on g++/clang


Hi Edward,

I see, that's good to know, thank you! So /fp:fast has the potential to let the compiler generate even more intrinsics. In practice, though, I've observed the same as you wrote earlier: adding /fp:fast to a few of my applications didn't yield any performance benefit.
The FMA speed-up is huge though :-)))

Thank you again and best wishes,


On 8 February 2018 at 21:19, Edward Lam <edward@xxxxxxxxxx> wrote:
Hi Patrik,

On 2/8/2018 3:08 PM, Patrik Huber wrote:
I think this is incorrect though. I think /fp:fast is not needed for MSVC to generate FMA code. Also, gcc and clang can generate FMA code without -ffast-math (which I guess is sort-of equivalent to /fp:fast).

Using /fp:fast is not necessary for the intrinsics, but without it, I can't get this to generate a vfmadd instruction:
// Test with: cl /Fa /O2 /arch:AVX2 /fp:fast foo.cpp
// Generates foo.exe and foo.asm

float mul_add(float a, float b, float c) {
    return a*b + c;
}

int main() {
    return 0;
}
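
For comparison, here is a minimal sketch of asking for the fused operation explicitly through the standard library instead of relying on the compiler to contract a*b + c. The name mul_add_fused is hypothetical and not part of the original test. std::fma from <cmath> is specified to compute a*b + c with a single rounding, but whether a given compiler turns the call into a vfmadd instruction or into a library call is an assumption to verify in the generated listing, not something this sketch guarantees.

```cpp
#include <cmath>

// Same expression as in the test above: the compiler decides whether
// a*b + c may be contracted into a single fused instruction.
float mul_add(float a, float b, float c) {
    return a * b + c;
}

// Hypothetical helper (not from the original mail): requests a fused
// multiply-add explicitly. std::fma rounds only once, after the add.
float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

Compiling this with something like cl /c /Fa /O2 /arch:AVX2 (without /fp:fast) and comparing the two functions in the .asm file should show whether the explicit call makes a difference on a particular MSVC version.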

Best regards,

Dr. Patrik Huber
Centre for Vision, Speech and Signal Processing
University of Surrey
Guildford, Surrey GU2 7XH
United Kingdom

Mobile: +44 (0)7482 633 934
