Re: [eigen] Vectorization for general use
One solution:
- Do all the math/algorithms outside the main executable, in dynamic libraries (.so, .dll, ...).
- Build one dynamic library per ISA you care about (sse.so, avx1.so, avx2.so, avx512.so, ...).
- From the main executable, dynamically load the right library according to the features of the CPU it is actually running on (see https://github.com/google/cpu_features).
- Call your API in the loaded library from the main executable, so that the backend runs the algorithm with the best available optimizations.
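The selection step in the list above could be sketched roughly as follows. This is only a minimal illustration, not a tested deployment recipe: it assumes GCC or Clang on x86 and uses their `__builtin_cpu_supports` builtins instead of the cpu_features library linked above. The `CpuFeatures` struct, the `detect_cpu` and `select_backend` names, and the `generic.so` fallback are hypothetical and do not come from the thread.

```cpp
#include <string>

// Hypothetical summary of the CPU features the selection depends on.
struct CpuFeatures {
    bool sse2 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Query the running CPU. Assumption: GCC/Clang x86 builtins are available;
// on any other compiler or architecture this sketch reports no features.
CpuFeatures detect_cpu() {
    CpuFeatures f;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Pick the most capable backend library the CPU can run.
// Library names follow the list above; "generic.so" is a hypothetical
// non-vectorized fallback added for this sketch.
std::string select_backend(const CpuFeatures& f) {
    if (f.avx512f) return "avx512.so";
    if (f.avx2) return "avx2.so";
    if (f.avx) return "avx1.so";
    if (f.sse2) return "sse.so";
    return "generic.so";
}
```

The main executable would then pass the result of `select_backend(detect_cpu())` to the platform loader (`dlopen`/`dlsym` on Linux and Mac, `LoadLibrary`/`GetProcAddress` on Windows, with `.dll` names there) and call the API through the returned function pointers.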
That said, I have the feeling that the long-term solution would be for Eigen to do a minimum of JIT. Example: oneDNN with asmjit: https://github.com/asmjit/asmjit
KindW.
From: Edward Lam <edward@xxxxxxxxxx>
Sent: Thursday, September 17, 2020 9:24 PM
To: eigen@xxxxxxxxxxxxxxxxxxx <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Vectorization for general use
Offhand, I wonder if you could put main() in its own source file and compile it without any vectorization compiler options, and have that call your real main(), renamed, in a different source file that does have vectorization compiler options enabled. Then your new main() could do CPUID checks (e.g. https://stackoverflow.com/a/4823889 ) and bail out gracefully. You will of course need to ensure that the CPUID checks are accurate for your compiler options, which may present its own challenges.
Cheers,
-Edward
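As a rough illustration of the launcher idea in the message above (not Edward's own code), the guard might look like the sketch below. It assumes the vectorized build targets AVX2 and that the launcher file is built with GCC or Clang on x86 without vectorization flags. The names `guarded_main`, `cpu_supports_build_flags`, and `EntryPoint` are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>

// Signature of the renamed "real" main(), which would live in a separate
// source file compiled with the vectorization flags.
using EntryPoint = int (*)(int, char**);

// Assumption: the vectorized code was built for AVX2, so that is the one
// feature checked. Uses GCC/Clang x86 builtins; on other toolchains this
// sketch conservatively reports "not supported".
bool cpu_supports_build_flags() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Bail out with a readable message before any vectorized code runs;
// otherwise hand control to the real entry point and return its result.
int guarded_main(bool cpu_ok, EntryPoint real_main, int argc, char** argv) {
    if (!cpu_ok) {
        std::fprintf(stderr,
                     "This build requires a CPU with AVX2 support.\n");
        return EXIT_FAILURE;
    }
    return real_main(argc, argv);
}
```

The new main() would then reduce to a single call such as `return guarded_main(cpu_supports_build_flags(), real_main, argc, argv);`. As the message notes, the feature list checked here has to match the compiler options used for the vectorized files.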
On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald <rob.a.mcdonald@xxxxxxxxxx> wrote:
I maintain an open source program that uses Eigen. The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website. About 80% are on Windows, 10% on Mac and 10% on Linux. I only provide X86 builds, 32 and 64-bit on Windows, 64-bit only on Mac and Linux. We may eliminate the 32-bit Windows build soon.
Historically, I have compiled with no special flags enabling vectorization options for the CPU. I would like to pursue this as I expect it will unlock some nice performance gains. However, I'd like to keep things simple and compatible for users.
What happens when someone runs a program compiled with vectorization when their CPU does not support it? If it fails, how graceful is the failure?
Is there a standard approach to identify the capabilities of a given machine? I could add that to my program and survey users before making a change... Would such code still run on a machine that was in the process of failing due to not having support for the built-in vectorization? I.e., if it is crashing, can we send a message as to why we're going down?
Is there a graceful way to support multiple options?
Any tips from other broad-use applications are greatly appreciated.
Rob