Re: [eigen] Matrix multiplication much slower on MSVC than on g++/clang
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Matrix multiplication much slower on MSVC than on g++/clang
- From: Edward Lam <edward@xxxxxxxxxx>
- Date: Thu, 8 Feb 2018 16:19:10 -0500
Hi Patrik,
On 2/8/2018 3:08 PM, Patrik Huber wrote:
> I think this is incorrect though. I think /fp:fast is not needed for MSVC to
> generate FMA code. Also, gcc and clang can generate FMA code without -ffast-math
> (which I guess is sort-of equivalent to /fp:fast).
Using /fp:fast is not necessary for the intrinsics, but without it, I can't get
MSVC to generate a vfmadd instruction for this:
=========
// foo.cpp
//
// Test with: cl /Fa /O2 /arch:AVX2 /fp:fast foo.cpp
// Generates foo.exe and foo.asm
float mul_add(float a, float b, float c) {
    return a*b + c;
}

int main()
{
    return 0;
}
=========
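For comparison, here is a minimal sketch of why a compiler may want permission before fusing a*b + c: the fused form rounds once, so it can give a different answer than a separate multiply and add. The function names below are made up for illustration. std::fma is standard C++11, but whether MSVC turns it into a vfmadd instruction under /arch:AVX2 without /fp:fast is an assumption that would need checking in the generated .asm.

```cpp
// fma_sketch.cpp (hypothetical file name, not part of the original test)
#include <cmath>

// Explicitly fused: std::fma computes a*b + c with a single rounding,
// regardless of floating-point compiler flags.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Left to the compiler: may be emitted as a separate multiply and add
// (two roundings) or contracted into one FMA, depending on flags.
float plain_mul_add(float a, float b, float c) {
    return a * b + c;
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact value of a*b + c is 2^-24, which is what fused_mul_add returns. An unfused multiply first rounds a*b to 1 + 2^-11, so plain_mul_add yields 0 unless the compiler contracted it.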
Best regards,
-Edward