Re: [eigen] Vectorization for general use


It might be worth adding that AVX has been around for a long time: it has been supported by Intel and AMD CPUs since 2011, and AVX2 arrived in CPU generations starting in 2013. You might want to check, or estimate, how many of your users really run an older CPU. If it is a fairly compute-heavy application, chances are that those users won't have much fun with it on older CPUs anyway.

That being said, the Sandy Bridge (e.g. i5-25X0K, i7-2700K) and Ivy Bridge (e.g. i5-3550) generations in particular were extremely popular and are probably still widely used. Both support AVX but not AVX2. I have an i5-3550 myself and it runs everything perfectly, so I don't have a real reason to upgrade even that 7-year-old CPU. So I would not go as far as assuming that the majority of your users' CPUs support AVX2, but it might be true for AVX.

One useful data point: check the Steam Hardware Survey and scroll down to "Other Settings". According to it, as of Aug 2020, 93% of Steam users have CPUs supporting AVX, and 77% AVX2. This is likely biased towards gaming computers, but it should still be fairly representative, and I doubt you'll find better data.

Best wishes,

On Sat, 19 Sep 2020 at 22:30, Sripathi, Vamsi <vamsi.sripathi@xxxxxxxxx> wrote:

Another option to consider is to use a compiler that supports function multi-versioning based on the ISA. For example, the Intel Compiler has the -ax flag.


GCC seems to support this through function attributes.
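As a rough illustration of the attribute-based approach, here is a minimal sketch using the GCC/Clang `target("avx2")` function attribute together with `__builtin_cpu_supports` for manual run-time dispatch (GCC also offers `target_clones`, which generates the dispatcher automatically). The names `scale`, `scale_generic` and `scale_avx2` are hypothetical, and the sketch assumes an x86 target; elsewhere it simply falls back to the generic loop. One caveat worth checking for your case: Eigen selects its SIMD code paths with preprocessor macros derived from the whole translation unit's compiler flags, so an attribute like this mainly helps auto-vectorization of your own loops and will not, by itself, switch Eigen's internals to AVX2 packets.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Baseline version: compiled with the project's default (non-AVX) flags.
static void scale_generic(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

#if SKETCH_X86_DISPATCH
// Same loop, but the compiler may use AVX2 instructions in this function only.
__attribute__((target("avx2")))
static void scale_avx2(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
#endif

// Hypothetical public entry point: chooses an implementation at run time.
void scale(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
#if SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        scale_avx2(data, n, factor);  // only reached on CPUs that report AVX2
        return;
    }
#endif
    scale_generic(data, n, factor);
}
```

Callers only ever see `scale()`, so the rest of the program can stay compiled for the baseline ISA.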




From: Rob McDonald <rob.a.mcdonald@xxxxxxxxx>
Sent: Friday, September 18, 2020 10:19 AM
To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Vectorization for general use


Thanks for everyone's responses and links.  Very helpful.


This seems like it is quite a thorny issue...  It really makes using these advanced features fairly challenging.


I'm not sure that it is practical for me to separate out a shared library to be selectively loaded (vs. just separate executables). Although the algorithms may be somewhat contained, the data structures can have quite wide reach. It isn't obvious how to separate what needs to be compiled with these flags and what does not (particularly since we didn't design for this from the start). This is also a case where Eigen being a header-only library is a bit of a drawback. If it were a traditional compiled library, it would likely be easier to decide where to draw the line.


My project builds with CMake, which isn't very friendly toward using different toolchains for different parts of the project, or toward compiling the same part multiple times. It is possible, but not particularly pretty.





On Thu, Sep 17, 2020 at 11:09 PM William Tambellini <wtambellini@xxxxxxx> wrote:

A solution:

  • do all the math/algorithms outside main(), in dynamic libraries (.so, .dll, ...)
  • build one dynamic library per ISA you care about
  • from main(), dynamically load the right library according to the features of the CPU the program is actually running on
  • call your API in that library from main(), so the backend runs the algorithm with the best optimizations
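The steps above might look roughly like this on Linux or macOS, using POSIX `dlopen`/`dlsym` (Windows would use `LoadLibrary`/`GetProcAddress` instead). This is a sketch under several assumptions: the library file names, the exported `run_algo` entry point and its signature are all hypothetical; each backend library is assumed to be built from the same sources with different flags (e.g. `-mavx2`) and to export `run_algo` with `extern "C"` linkage; and CPU detection uses the GCC/Clang builtin `__builtin_cpu_supports`.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>  // POSIX dlopen/dlsym; older glibc needs -ldl at link time

// Hypothetical signature that every backend library exports as extern "C".
using run_algo_fn = int (*)(const double* in, double* out, int n);

// Choose the most capable backend the running CPU can execute.
// File names are placeholders for whatever your build produces.
const char* pick_backend_library() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) return "./libalgo_avx2.so";
    if (__builtin_cpu_supports("avx")) return "./libalgo_avx.so";
#endif
    return "./libalgo_generic.so";
}

// Load the library and look up its entry point; returns nullptr on failure.
run_algo_fn load_backend(const char* path) {
    assert(path != nullptr);
    void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "could not load %s: %s\n", path, dlerror());
        return nullptr;
    }
    void* sym = dlsym(handle, "run_algo");
    if (sym == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "no run_algo in %s: %s\n", path,
                     err != nullptr ? err : "symbol resolved to null");
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately never closed: the code must stay mapped
    // for as long as the returned function pointer is used.
    return reinterpret_cast<run_algo_fn>(sym);
}
```

In `main()` this would be used along the lines of `if (run_algo_fn run = load_backend(pick_backend_library())) run(in, out, n);`, with a fallback (for example loading the generic library) if that fails.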

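Now, I have the feeling that the long-term solution would be for Eigen to do a minimal amount of JIT compilation, as oneDNN does (its x86 kernels are generated at run time with Xbyak; asmjit is a similar JIT assembler library).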





From: Edward Lam <edward@xxxxxxxxxx>
Sent: Thursday, September 17, 2020 9:24 PM
To: eigen@xxxxxxxxxxxxxxxxxxx <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Vectorization for general use


Offhand, I wonder if you could put main() in its own source file, compiled without any vectorization options, and have it call your real main() (renamed) in a different source file that is compiled with vectorization options enabled. Then your new main() could do CPUID checks and bail out gracefully. You will of course need to ensure that the CPUID checks match your compiler options, which may present its own challenges.
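A minimal sketch of that launcher idea, assuming the vectorized part of the program was built with `-mavx2 -mfma` (adjust the checks to whatever flags you actually use) and a GCC or Clang compiler on x86. The names `guarded_main`, `cpu_supports_build_flags` and `real_main` are hypothetical. Note that the launcher file itself must be compiled without the vectorization flags, and that global objects defined in the vectorized files are normally constructed before `main()` runs, so any AVX code in their constructors could still crash before the check.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: the vectorized source files were built with -mavx2 -mfma.
// Keep these checks in sync with the flags you actually use.
bool cpu_supports_build_flags() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return true;  // not an x86 GCC/Clang build: nothing to check in this sketch
#endif
}

// Hypothetical helper called by main() in launcher.cpp, which is compiled
// WITHOUT vectorization flags. real_main is the renamed original main(),
// living in a file compiled WITH them.
int guarded_main(int argc, char** argv, int (*real_main)(int, char**)) {
    assert(real_main != nullptr);
    if (!cpu_supports_build_flags()) {
        std::fprintf(stderr,
                     "This build needs a CPU with AVX2 and FMA support.\n"
                     "Please download the generic build instead.\n");
        return 1;
    }
    return real_main(argc, argv);
}

// launcher.cpp would then contain only:
//   int real_main(int argc, char** argv);  // defined in app.cpp
//   int main(int argc, char** argv) { return guarded_main(argc, argv, real_main); }
```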





On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald <rob.a.mcdonald@xxxxxxxxx> wrote:

I maintain an open source program that uses Eigen. The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website. About 80% are on Windows, 10% on Mac and 10% on Linux. I only provide x86 builds: 32- and 64-bit on Windows, 64-bit only on Mac and Linux. We may eliminate the 32-bit Windows build soon.


Historically, I have compiled without any special flags enabling the CPU's vectorization features. I would like to enable them, as I expect it will unlock some nice performance gains. However, I'd like to keep things simple and compatible for users.


What happens when someone runs a program compiled with vectorization when their CPU does not support it?  If it fails, how graceful is the failure?
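For reference, on Linux and macOS executing an instruction the CPU does not implement normally raises `SIGILL`, so the default failure is an abrupt "Illegal instruction" crash with no explanation (on Windows it surfaces as an `EXCEPTION_ILLEGAL_INSTRUCTION` structured exception). A best-effort way to make that failure more informative is to install a `SIGILL` handler early in `main()`, as in this POSIX-only sketch. The handler, helper name and message are hypothetical; this is a last-resort hint rather than a substitute for checking the CPU before running vectorized code, and it cannot help if the offending instruction executes before the handler is installed.

```cpp
#include <cstdlib>
#include <cstring>
#include <signal.h>  // sigaction (POSIX)
#include <unistd.h>  // write, _exit (POSIX)

// Runs when the CPU hits an instruction it does not implement (SIGILL).
// Only async-signal-safe calls (write, _exit) are used here.
extern "C" void on_illegal_instruction(int) {
    static const char msg[] =
        "Fatal: illegal instruction, most likely because this CPU lacks an\n"
        "instruction set this build requires (e.g. AVX/AVX2).\n"
        "Please use the generic build instead.\n";
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    _exit(EXIT_FAILURE);
}

// Hypothetical helper: call first thing in main(), before vectorized code runs.
bool install_sigill_hint() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_illegal_instruction;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, nullptr) == 0;
}
```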


Is there a standard approach to identify the capabilities of a given machine? I could add that to my program and survey users before making a change... Would such code still run on a machine that was in the process of failing due to not having support for the built-in vectorization? I.e. if it is crashing, can we send a message as to why we're going down?
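On x86 the usual approach is the `CPUID` instruction. The sketch below uses the GCC/Clang `<cpuid.h>` helpers (`__get_cpuid`, `__get_cpuid_max`, `__cpuid_count`) plus `XGETBV` to report the best of SSE2, AVX and AVX2 that the machine can actually use; MSVC offers the `__cpuid`/`__cpuidex` and `_xgetbv` intrinsics for the same job. The function name `best_simd_level` is hypothetical. Checking the OS bits matters because AVX is only usable if the operating system saves the wider YMM registers, which the CPU feature bit alone does not tell you. As for the second question, detection code like this can only be relied on to run on an older CPU if it is itself compiled without AVX flags, since the compiler may emit AVX instructions anywhere in a translation unit built with `-mavx`.

```cpp
#include <string>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

// XCR0 tells us which register state the OS saves on context switches.
// Only call this after CPUID reports OSXSAVE, otherwise XGETBV faults.
static unsigned long long read_xcr0() {
    unsigned int eax = 0, edx = 0;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return (static_cast<unsigned long long>(edx) << 32) | eax;
}

// Best of SSE2 / AVX / AVX2 that both the CPU and the OS support.
std::string best_simd_level() {
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return "unknown";
    std::string level = (d & (1u << 26)) ? "SSE2" : "pre-SSE2";
    const bool avx_cpu = (c & (1u << 28)) != 0;  // CPUID.1:ECX.AVX
    const bool osxsave = (c & (1u << 27)) != 0;  // CPUID.1:ECX.OSXSAVE
    // AVX needs the OS to save XMM (bit 1) and YMM (bit 2) state in XCR0.
    if (!(avx_cpu && osxsave && (read_xcr0() & 0x6) == 0x6)) return level;
    level = "AVX";
    if (__get_cpuid_max(0, nullptr) >= 7) {
        __cpuid_count(7, 0, a, b, c, d);
        if (b & (1u << 5)) level = "AVX2";  // CPUID.(EAX=7,ECX=0):EBX.AVX2
    }
    return level;
}
#else
std::string best_simd_level() { return "not x86"; }
#endif
```

Logging or displaying `best_simd_level()` at startup in the current, non-vectorized build would be one way to survey users before changing the compiler flags.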


Is there a graceful way to support multiple options?


Any tips from other broad use applications is greatly appreciated.






