Re: [eigen] Vectorization for general use



> Now, I have the feeling that the long term solution would be for eigen to do a minimum of JIT.  

I remember reading that one of the authors of an exact-arithmetic library (GMP or MPFR) was running automatic optimizer/benchmark software that reshuffled instructions to produce the best results on a given microarchitecture. I suppose a hot-spot JIT could get somewhat close, but hand-tuned or automatically tuned code will likely be difficult to replace fully with a JIT, at least in the near future.


On Fri, Sep 18, 2020 at 8:08 AM William Tambellini <wtambellini@xxxxxxx> wrote:
A solution:
  • do all the math/algorithms outside the main executable, in dynamic libraries (.so, .dll, ...)
  • build one dynamic library per ISA you care about (sse.so, avx1.so, avx2.so, avx512.so, ...)
  • from the main executable, dynamically load the right library according to the features of the CPU it is actually running on (https://github.com/google/cpu_features)
  • call your API in that library from the main executable, so the backend runs the algorithm with the best optimizations available
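The four steps above can be sketched roughly as follows. This is a minimal sketch, assuming GCC or Clang on x86 with POSIX `dlopen`/`dlsym` (Windows builds would use `LoadLibrary`/`GetProcAddress` instead). The library file names, the exported `run_algo` entry point, and the helpers `pick_backend`/`load_backend` are hypothetical; `__builtin_cpu_supports` stands in for the google/cpu_features library linked above.

```cpp
// Sketch of ISA-based backend selection. Library names and the run_algo
// entry point are hypothetical, not part of Eigen or any real project.
#include <dlfcn.h>   // POSIX dlopen/dlsym/dlclose
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical C entry point that every backend library is assumed to export:
//   extern "C" double run_algo(const double* data, int n);
using RunAlgoFn = double (*)(const double*, int);

// Choose the most capable backend the running CPU supports.
// __builtin_cpu_supports is a GCC/Clang builtin; google/cpu_features
// could be used instead for a more portable check.
std::string pick_backend() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    // Assumes avx512.so was built requiring only AVX-512F.
    if (__builtin_cpu_supports("avx512f")) return "./avx512.so";
    // Assumes avx2.so was built with -mavx2 -mfma, so both are required.
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return "./avx2.so";
    if (__builtin_cpu_supports("avx")) return "./avx1.so";
#endif
    return "./sse.so";  // baseline build, no extra ISA flags
}

// Load a backend and resolve its entry point. On failure, print the reason
// and return nullptr instead of crashing.
RunAlgoFn load_backend(const std::string& path, void** handle_out) {
    assert(handle_out != nullptr);
    *handle_out = nullptr;
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "cannot load %s: %s\n", path.c_str(), dlerror());
        return nullptr;
    }
    void* sym = dlsym(handle, "run_algo");
    if (sym == nullptr) {
        std::fprintf(stderr, "%s does not export run_algo\n", path.c_str());
        dlclose(handle);
        return nullptr;
    }
    *handle_out = handle;  // caller calls dlclose() when finished
    return reinterpret_cast<RunAlgoFn>(sym);
}
```

The main executable would call `pick_backend()`, then `load_backend()`, and fall back to the baseline library (or exit with a message) if loading fails. In a real deployment the paths should probably be resolved relative to the executable's directory rather than the current working directory.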
Now, I have the feeling that the long term solution would be for eigen to do a minimum of JIT. Example: oneDNN with asmjit : https://github.com/asmjit/asmjit
Kind
W.


 
From: Edward Lam <edward@xxxxxxxxxx>
Sent: Thursday, September 17, 2020 9:24 PM
To: eigen@xxxxxxxxxxxxxxxxxxx <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Vectorization for general use
 
Offhand, I wonder if you could put main() in its own source file, compiled without any vectorization compiler options, and have it call your real main(), renamed, in a different source file that does have vectorization compiler options enabled. Your new main() could then do CPUID checks (e.g. https://stackoverflow.com/a/4823889 ) and bail out gracefully. You will, of course, need to ensure that the CPUID checks are accurate for your compiler options, which may present its own challenges.
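A minimal sketch of the split-main() idea, assuming GCC or Clang on x86 and using `__builtin_cpu_supports` in place of raw CPUID (on MSVC, `__cpuid`/`__cpuidex` from `<intrin.h>` would be needed instead). The names `cpu_supports_build_isa`, `launch`, and `real_main`, and the AVX2+FMA requirement, are hypothetical; the features checked must match whatever flags the vectorized files are actually compiled with.

```cpp
// launcher.cpp: compile WITHOUT vectorization flags (no -mavx2, -march=native,
// or /arch:AVX2) so nothing in this file can execute an unsupported instruction.
#include <cassert>
#include <cstdio>

// Signature of the renamed real main(), defined in a translation unit that
// IS compiled with vectorization flags (assumed here: -mavx2 -mfma).
using MainFn = int (*)(int, char**);

// True if the running CPU has every extension the vectorized code assumes.
// Keep this list in sync with the compiler flags of the vectorized files.
bool cpu_supports_build_isa() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    // Not handled in this sketch (e.g. MSVC would use __cpuid/__cpuidex),
    // so it optimistically assumes support.
    return true;
#endif
}

// Forward to the vectorized entry point, or exit with a readable message.
int launch(int argc, char** argv, MainFn vectorized_main) {
    assert(vectorized_main != nullptr);
    if (!cpu_supports_build_isa()) {
        std::fprintf(stderr,
                     "This build requires a CPU with AVX2 and FMA support.\n"
                     "Please use the generic (non-vectorized) build instead.\n");
        return 1;
    }
    return vectorized_main(argc, argv);
}
```

The unvectorized `main(argc, argv)` would then just `return launch(argc, argv, real_main);`. Two caveats: constructors of global or static objects in the vectorized files run before `main()`, so a crash can happen before the check if those files have non-trivial globals; and Eigen headers are best kept out of the launcher file, since inline functions compiled with different flags in different files may be merged by the linker.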

Cheers,
-Edward

On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald <rob.a.mcdonald@xxxxxxxxxx> wrote:
I maintain an open source program that uses Eigen.  The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website.  About 80% are on Windows, 10% on Mac and 10% on Linux.  I only provide x86 builds: 32- and 64-bit on Windows, 64-bit only on Mac and Linux.  We may eliminate the 32-bit Windows build soon.

Historically, I have compiled without any flags enabling CPU vectorization options.  I would like to enable them, as I expect this will unlock some nice performance gains.  However, I'd like to keep things simple and compatible for users.

What happens when someone runs a program compiled with vectorization when their CPU does not support it?  If it fails, how graceful is the failure?

Is there a standard approach to identifying the capabilities of a given machine?  I could add that to my program and survey users before making a change...  Would such code still run on a machine that is in the process of failing because it lacks support for the compiled-in vectorization?  I.e., if the program is going to crash, can we print a message saying why?

Is there a graceful way to support multiple options?

Any tips from other broad use applications is greatly appreciated.

Rob






