Another option to consider is to use a compiler that supports function multi-versioning based on ISA. For example, the Intel compiler has the -ax flag.
GCC seems to support this through function attributes.
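As a rough illustration of the GCC route, here is a minimal sketch using the target_clones function attribute, which asks GCC to emit several versions of one function and to pick one at load time based on the running CPU. The MULTIVERSION macro, the dot() function and the target list are illustrative assumptions, not anything from Eigen. The guard is deliberately conservative (GCC on x86-64 Linux with glibc); elsewhere the macro expands to nothing and a single plain version is built.

```cpp
#include <cstddef>

// target_clones relies on ifunc support in the toolchain. This guard is a
// conservative assumption; on other platforms only one version is compiled.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION
#endif

// Hypothetical kernel: the compiler builds one clone per listed target and
// installs a resolver that selects the best clone for the current CPU.
MULTIVERSION
double dot(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Callers just call dot() as usual; the dispatch is invisible to them. Note that this mainly helps loops the compiler vectorizes itself. As far as I know, Eigen picks its hand-written SIMD paths through preprocessor macros at compile time, so those would not be cloned this way.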
From: Rob McDonald <rob.a.mcdonald@xxxxxxxxx>
Sent: Friday, September 18, 2020 10:19 AM
Subject: Re: [eigen] Vectorization for general use
Thanks for everyone's responses and links. Very helpful.
This seems like it is quite a thorny issue... It really makes using these advanced features fairly challenging.
I'm not sure that it is practical for me to separate out a shared library to be selectively loaded (vs. just separate executables). Although the algorithms may be somewhat contained, the data structures can have quite wide reach. It isn't
obvious how to separate what needs to be compiled with these flags and what does not (particularly since we didn't design for this from the start). This is also a case where Eigen being a header-only library is a bit of a drawback. If it was a traditional
compiled library, it would likely be easier to draw the line at eigen_sse.so, eigen_avx1.so or whatever.
My project builds with CMake, which isn't very friendly to using different toolchains for different parts of the project -- or to compiling the same part multiple times. It is possible, but not particularly pretty.
Do all the math/algorithms outside of main, in dynamic libraries (.so, .dll, ...).
Build one dynamic library per ISA you care about (sse.so, avx1.so, avx2.so, avx512.so, ...).
From main, dynamically load the right library according to the features of the CPU it is currently running on (https://github.com/google/cpu_features).
Call your API in the library from main so that the backend runs the algorithm with the best optimizations.
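The selection step in that recipe could look roughly like the sketch below. It uses the GCC/Clang builtin __builtin_cpu_supports as a stand-in for google/cpu_features, and the CpuFeatures struct, the function names and the library file names are all made up for illustration.

```cpp
#include <string>

// Hypothetical summary of the CPU features the loader cares about.
struct CpuFeatures {
    bool avx512f = false;
    bool avx2 = false;
    bool avx = false;
};

// Query the running CPU. On compilers or architectures without the builtins,
// everything stays false, which selects the baseline library.
inline CpuFeatures detect_cpu() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.avx = __builtin_cpu_supports("avx") != 0;
#endif
    return f;
}

// Pick the most capable backend the CPU can run (illustrative file names).
inline std::string pick_backend(const CpuFeatures& f) {
    if (f.avx512f) return "libalgo_avx512.so";
    if (f.avx2) return "libalgo_avx2.so";
    if (f.avx) return "libalgo_avx1.so";
    return "libalgo_sse.so";
}
```

The name returned by pick_backend(detect_cpu()) would then be handed to dlopen() on Linux/macOS or LoadLibrary() on Windows, with the API entry points looked up through dlsym() or GetProcAddress(). The main executable itself has to be built without the extra vector flags so that it can run everywhere.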
Offhand, I wonder if you could put main() in its own source file and compile it without any vectorization compiler options, and have that call your real main() renamed in a different source file that does have vectorization compiler options
enabled. Then your new main() could do CPUID checks (e.g. https://stackoverflow.com/a/4823889 ) and bail out gracefully. You will of course need to ensure that the CPUID checks are accurate for
your compiler options, which may present its own challenges.
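A minimal sketch of that wrapper idea follows, assuming the rest of the program is built with -mavx2 and using the GCC/Clang builtin __builtin_cpu_supports instead of raw CPUID. The names cpu_meets_build_requirements, guarded_main and real_main are hypothetical.

```cpp
#include <cstdio>

// Assumption: the vectorized sources are compiled with -mavx2, so AVX2 is
// the feature to verify. Adjust this to match the real compiler options.
inline bool cpu_meets_build_requirements() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return true;  // No check available here; assume the build matches.
#endif
}

// Signature of the original main(), renamed and compiled in another source
// file that does have the vectorization options enabled.
using EntryPoint = int (*)(int, char**);

// Body of the new, thin main(). This file is compiled WITHOUT vector flags,
// so it can run on any CPU and report the problem before bailing out.
inline int guarded_main(bool cpu_ok, EntryPoint real_main,
                        int argc, char** argv) {
    if (!cpu_ok) {
        std::fprintf(stderr, "This build requires a CPU with AVX2 support.\n");
        return 1;
    }
    return real_main(argc, argv);
}

// The new main() would then be roughly:
//   int main(int argc, char** argv) {
//       return guarded_main(cpu_meets_build_requirements(), real_main,
//                           argc, argv);
//   }
```

One caveat with this layout: static initializers in the vectorized source files can still run before main(), so any that execute vector instructions could crash before the check is reached.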
I maintain an open source program that uses Eigen. The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website. About 80% are on Windows, 10% on Mac and 10% on Linux. I only provide
x86 builds: 32-bit and 64-bit on Windows, 64-bit only on Mac and Linux. We may eliminate the 32-bit Windows build soon.
Historically, I have compiled with no special flags enabling vectorization options for the CPU. I would like to pursue this as I expect it will unlock some nice performance gains. However, I'd like to keep things simple and compatible.
What happens when someone runs a program compiled with vectorization when their CPU does not support it? If it fails, how graceful is the failure?
Is there a standard approach to identify the capabilities of a given machine? I could add that to my program and survey users before making a change... Would such code still run on a machine that was in the process of failing due to not
having support for the built in vectorization? I.e. if it is crashing, can we send a message as to why we're going down?
Is there a graceful way to support multiple options?
Any tips from other broad-use applications are greatly appreciated.