RE: [eigen] Vectorization for general use

Another option to consider is to use a compiler that supports function multi-versioning based on ISA. For example, the Intel compiler has the -ax flag: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/code-generation-options/ax-qax.html#ax-qax

GCC seems to support this through attribute directives - https://lwn.net/Articles/691932/
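
As a rough illustration of the GCC route, here is a minimal sketch using the target_clones function attribute. It assumes GCC on x86-64 Linux with glibc, since the attribute relies on ifunc support; on any other platform the macro expands to nothing and a single version is built. The function name dot and the macro name MULTIVERSION are made up for the example.

```cpp
#include <cstddef>

// Assumption: multi-versioning is only enabled for GCC on x86-64 Linux/glibc,
// where ifunc-based dispatch is available. Elsewhere one plain version is built.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION
#endif

// The compiler emits one clone of this function per listed target and
// resolves to the best one for the running CPU when the program is loaded.
MULTIVERSION
double dot(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Callers just call dot() as usual; no explicit CPU check is needed at the call site. Whether this helps with templated header-only code such as Eigen expressions would depend on what gets inlined into the attributed function.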

-Vamsi

From: Rob McDonald <rob.a.mcdonald@xxxxxxxxx>
Sent: Friday, September 18, 2020 10:19 AM
To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Vectorization for general use

Thanks for everyone's responses and links. Very helpful.

This seems like it is quite a thorny issue... It really makes using these advanced features fairly challenging.

I'm not sure that it is practical for me to separate out a shared library to be selectively loaded (vs. just separate executables). Although the algorithms may be somewhat contained, the data structures can have quite wide reach. It isn't obvious how to separate what needs to be compiled with these flags and what does not (particularly since we didn't design for this from the start). This is also a case where Eigen being a header-only library is a bit of a drawback. If it was a traditional compiled library, it would likely be easier to draw the line at eigen_sse.so, eigen_avx1.so or whatever.

My project builds with CMake, which isn't very friendly about using different toolchains for different parts of the project -- or compiling the same part multiple times. It is possible, but not particularly pretty.

Rob

On Thu, Sep 17, 2020 at 11:09 PM William Tambellini <wtambellini@xxxxxxx> wrote:

A solution:

1. Do all the math/algorithms outside the main executable, in dynamic libraries (.so, .dll, ...).
2. Build multiple dynamic libraries, one for each ISA you care about (sse.so, avx1.so, avx2.so, avx512.so, ...).
3. From the main executable, dynamically load the right library according to the features of the CPU it is currently running on (https://github.com/google/cpu_features).
4. Call your API in the library from the main executable to let the backend run the algorithm with the best optimization.
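
The selection step in the scheme above could look roughly like the sketch below. The CpuCaps struct and pick_backend function are hypothetical names invented for this example; in a real program the flags would be filled in from a detection library such as cpu_features or from CPUID. The library file names are the illustrative ones from the list.

```cpp
#include <string>

// Hypothetical summary of the CPU features relevant to backend selection.
struct CpuCaps {
    bool avx512f = false;
    bool avx2 = false;
    bool avx = false;
};

// Return the name of the most capable backend library the CPU can run,
// checking from the widest instruction set down to the baseline.
std::string pick_backend(const CpuCaps& caps) {
    if (caps.avx512f) return "avx512.so";
    if (caps.avx2) return "avx2.so";
    if (caps.avx) return "avx1.so";
    return "sse.so";  // assumed baseline build
}
```

The returned name would then be handed to dlopen() on POSIX systems or LoadLibrary() on Windows, and the API entry points looked up with dlsym() or GetProcAddress().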

Now, I have the feeling that the long term solution would be for eigen to do a minimum of JIT. Example: oneDNN with asmjit : https://github.com/asmjit/asmjit

Kind

W.


From: Edward Lam <edward@xxxxxxxxxx>
Sent: Thursday, September 17, 2020 9:24 PM
To: eigen@xxxxxxxxxxxxxxxxxxx <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Vectorization for general use

Offhand, I wonder if you could put main() in its own source file and compile it without any vectorization compiler options, and have that call your real main(), renamed and placed in a different source file that does have vectorization compiler options enabled. Then your new main() could do CPUID checks (e.g. https://stackoverflow.com/a/4823889 ) and bail out gracefully. You will of course need to ensure that the CPUID checks are accurate for your compiler options, which may present its own challenges.
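
A minimal sketch of that launcher idea follows, assuming the vectorized sources are built for AVX2 and the compiler is GCC or Clang on x86, where the __builtin_cpu_init and __builtin_cpu_supports builtins are available. With MSVC the check would instead go through __cpuid/__cpuidex from <intrin.h>, which is not shown here. The names cpu_is_supported, guarded_main and real_main are invented for the example.

```cpp
#include <cstdio>

// Returns true when the running CPU supports AVX2 (the instruction set the
// vectorized translation units are assumed to be compiled for).
bool cpu_is_supported() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return true;  // no check implemented in this sketch; assume supported
#endif
}

// Launcher logic for the file compiled WITHOUT vectorization flags.
// real_main stands for the program's original main(), renamed and
// compiled separately with vectorization enabled.
int guarded_main(int argc, char** argv, bool cpu_ok,
                 int (*real_main)(int, char**)) {
    if (!cpu_ok) {
        std::fprintf(stderr, "This build requires a CPU with AVX2 support.\n");
        return 1;  // exit cleanly instead of hitting an illegal instruction
    }
    return real_main(argc, argv);
}
```

The new main() would then reduce to a single line such as return guarded_main(argc, argv, cpu_is_supported(), real_main);. Passing the check result in as a parameter keeps the bail-out path testable on any machine.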

Cheers,

-Edward

On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald <rob.a.mcdonald@xxxxxxxxx> wrote:

I maintain an open source program that uses Eigen. The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website. About 80% are on Windows, 10% on Mac and 10% on Linux. I only provide x86 builds: 32- and 64-bit on Windows, 64-bit only on Mac and Linux. We may eliminate the 32-bit Windows build soon.

Historically, I have compiled with no special flags enabling vectorization options for the CPU. I would like to pursue this as I expect it will unlock some nice performance gains. However, I'd like to keep things simple and compatible for users.

What happens when someone runs a program compiled with vectorization when their CPU does not support it? If it fails, how graceful is the failure?

Is there a standard approach to identify the capabilities of a given machine? I could add that to my program and survey users before making a change... Would such code still run on a machine that was in the process of failing due to not having support for the built in vectorization? I.e. if it is crashing, can we send a message as to why we're going down?

Is there a graceful way to support multiple options?

Any tips from other broad-use applications are greatly appreciated.

Rob
