Re: [eigen] Matrix multiplication much slower on MSVC than on g++/clang



Hi Edward,

I see, that's good to know, thank you! So /fp:fast can let the compiler emit even more FMA instructions. In practice, though, I've observed the same thing you wrote earlier: adding /fp:fast to a few of my applications didn't yield any performance benefit.
The FMA speed-up is huge though :-)))
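A small sketch of why fusing a*b + c is treated as a floating-point option rather than a free optimisation: the fused form rounds once, the unfused form rounds the product first, so the two can give different results. Compilers therefore gate contraction behind flags (for example /fp:fast on MSVC, per Edward's test below, or -ffp-contract on gcc/clang). The function names here are hypothetical; std::fma from <cmath> is the standard fused multiply-add.

```cpp
#include <cmath>

// Unfused: the product is rounded to float, then c is added (two roundings).
// Storing through a volatile float stops the compiler from contracting it.
float mul_add_unfused(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused: a*b + c is computed exactly and rounded only once.
float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = b = 1.0f + 1.0f/4096 and c = -(1.0f + 1.0f/2048), the exact product 1 + 2^-11 + 2^-24 is a tie that rounds to 1 + 2^-11 in float, so mul_add_unfused returns 0 while mul_add_fused returns 2^-24 (about 5.96e-8).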

Thank you again and best wishes,

Patrik

On 8 February 2018 at 21:19, Edward Lam <edward@xxxxxxxxxx> wrote:
Hi Patrik,

On 2/8/2018 3:08 PM, Patrik Huber wrote:
> I think this is incorrect though. I think /fp:fast is not needed for MSVC to generate FMA code. Also, gcc and clang can generate FMA code without -ffast-math (which I guess is sort-of equivalent to /fp:fast).


Using /fp:fast is not necessary for the intrinsics, but without it I can't get this to generate a vfmadd instruction:
=========
//foo.cpp
//
// Test with: cl /Fa /O2 /arch:AVX2 /fp:fast foo.cpp
// Generates foo.exe and foo.asm

// Candidate for contraction into a single vfmadd instruction.
float mul_add(float a, float b, float c) {
    return a*b + c;
}

int main()
{
    return 0;
}
=========
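To illustrate the point that /fp:fast is not needed when FMA is requested explicitly, here is a minimal sketch assuming an AVX2/FMA target (/arch:AVX2 on MSVC, or -mfma / -march=haswell on gcc/clang). The names fma_axpy and HAVE_FMA_INTRINSICS are hypothetical. _mm256_fmadd_ps from <immintrin.h> maps directly to a vfmadd instruction, independent of the /fp: mode. The preprocessor guard assumes MSVC does not define __FMA__ and that its /arch:AVX2 implies FMA support.

```cpp
#include <cmath>
#include <cstddef>

#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
#include <immintrin.h>
#define HAVE_FMA_INTRINSICS 1  // hypothetical feature macro
#endif

// Hypothetical helper: y[i] = a * x[i] + y[i] for i in [0, n).
void fma_axpy(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
#ifdef HAVE_FMA_INTRINSICS
    const __m256 va = _mm256_set1_ps(a);
    // Process 8 floats per iteration with an explicit fused multiply-add.
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, vx, vy));
    }
#endif
    // Scalar tail, or the whole array when FMA intrinsics are unavailable.
    for (; i < n; ++i) {
        y[i] = std::fma(a, x[i], y[i]);
    }
}
```

Without FMA support the function falls back to std::fma per element, which keeps the same fused results but gives up the vector speed-up.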

Best regards,
-Edward





--
Dr. Patrik Huber
Centre for Vision, Speech and Signal Processing
University of Surrey
Guildford, Surrey GU2 7XH
United Kingdom

Web: www.patrikhuber.ch
Mobile: +44 (0)7482 633 934

