> Now, I have the feeling that the long term solution would be for eigen to do a minimum of JIT.  

I remember reading that the authors of one of the exact arithmetic libraries (GMP or MPFR) were running an automatic optimizer / benchmark tool that reshuffles instructions to produce the best results on a given micro-architecture. I suppose a hot-spot JIT could get somewhat close, but hand-tuned or automatically tuned code will likely be difficult to fully replace with a JIT, at least in the near future.

On Fri, Sep 18, 2020 at 8:08 AM William Tambellini wrote:
A solution:
  • do all the math/algorithms outside the main program, in dynamic libraries (.so, .dll, ...)
  • build one dynamic library per ISA you care about
  • from the main program, dynamically load the right library according to the features of the CPU it is actually running on
  • call your API in that library from the main program, so that the backend runs the algorithm with the best optimizations
Now, I have the feeling that the long term solution would be for eigen to do a minimum of JIT. Example: oneDNN with asmjit :
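
The dynamic-library steps listed above could be sketched roughly as follows. This is only an illustration, not code from any of the projects mentioned: the `CpuFeatures` struct, the function names and the `lib<stem>_<isa>.so` naming scheme are all hypothetical, and the detection relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is assumed to be available (x86 only).

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of the CPU features this sketch cares about.
struct CpuFeatures {
    bool sse4_2 = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Query the CPU the program is running on.
// Assumption: GCC or Clang on x86. Elsewhere this reports no features,
// so the generic backend is chosen.
inline CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse4_2  = __builtin_cpu_supports("sse4.2") != 0;
    f.avx2    = __builtin_cpu_supports("avx2") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Pick the most capable backend library for the detected features.
// The file naming scheme is an assumption made for this example.
inline std::string pick_backend(const CpuFeatures& f, const std::string& stem) {
    assert(!stem.empty());
    if (f.avx512f) return "lib" + stem + "_avx512.so";
    if (f.avx2)    return "lib" + stem + "_avx2.so";
    if (f.sse4_2)  return "lib" + stem + "_sse42.so";
    return "lib" + stem + "_generic.so";
}
```

The main program, built without any vectorization flags, would then pass the chosen name to `dlopen` (or `LoadLibrary` on Windows) and look up the entry point with `dlsym` (or `GetProcAddress`).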

From: Edward Lam <edward@xxxxxxxxxx>
Sent: Thursday, September 17, 2020 9:24 PM
To: eigen@xxxxxxxxxxxxxxxxxxx <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Vectorization for general use
Offhand, I wonder if you could put main() in its own source file and compile it without any vectorization compiler options, and have that call your real main() renamed in a different source file that does have vectorization compiler options enabled. Then your new main() could do CPUID checks and bail out gracefully. You will of course need to ensure that the CPUID checks are accurate for your compiler options, which may present its own challenges.
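
A minimal sketch of that launcher idea, assuming the vectorized sources were built with AVX2 enabled and that the compiler is GCC or Clang on x86 (so `__builtin_cpu_supports` is available). The names `guarded_main`, `cpu_matches_build` and `real_main` are hypothetical, and the exit code 2 is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdio>

// Signature of the original main(), renamed (hypothetically to real_main)
// and compiled in a separate source file with the vectorization flags.
using EntryPoint = int (*)(int, char**);

// True if the running CPU supports the ISA the vectorized file was built for.
// Assumption: that file was built for AVX2; adjust the check to your flags.
inline bool cpu_matches_build() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // unknown platform: be conservative
#endif
}

// Body of the new main(). This file must itself be compiled WITHOUT the
// vectorization flags so it can run on any CPU.
inline int guarded_main(bool cpu_ok, EntryPoint real_main, int argc, char** argv) {
    assert(real_main != nullptr);
    if (!cpu_ok) {
        std::fputs("This build requires a CPU with AVX2 support.\n", stderr);
        return 2;
    }
    return real_main(argc, argv);
}

// Intended use, in the launcher source file:
//   int main(int argc, char** argv) {
//       return guarded_main(cpu_matches_build(), real_main, argc, argv);
//   }
```

One caveat with this approach: global objects defined in the vectorized source files are constructed before main() runs, so any vector instructions in their initializers would execute before the check.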


On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald wrote:
I maintain an open source program that uses Eigen.  The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website.  About 80% are on Windows, 10% on Mac and 10% on Linux.  I only provide X86 builds, 32 and 64-bit on Windows, 64-bit only on Mac and Linux.  We may eliminate the 32-bit Windows build soon.

Historically, I have compiled with no special flags enabling vectorization options for the CPU.  I would like to pursue this as I expect it will unlock some nice performance gains.  However, I'd like to keep things simple and compatible for users.

What happens when someone runs a program compiled with vectorization when their CPU does not support it?  If it fails, how graceful is the failure?

Is there a standard approach to identify the capabilities of a given machine?  I could add that to my program and survey users before making a change...  Would such code still run on a machine that was in the process of failing due to not having support for the built in vectorization?  I.e. if it is crashing, can we send a message as to why we're going down?

Is there a graceful way to support multiple options?

Any tips from other broad-use applications are greatly appreciated.


