Re: [eigen] How do you link multiple versions (e.g. AVX vs SSE) of the same Eigen code?



Hi,

On 28.01.2015 15:51, Benoit Jacob wrote:
> There may be other compile flags that break the ABI (I don't see any out of
> hand, which one do you have in mind?), but this (the flags to control SIMD)

Examples are EIGEN_DEFAULT_TO_ROW_MAJOR (which also breaks the API and is therefore discouraged anyway), and of course EIGEN_DONT_ALIGN (which is related to our current discussion).

Regarding 32byte alignment, I like the idea of the MAX_ALIGN_*-flags.
For the default behavior, I basically see three alternatives:
1) Always make 16 byte the default, and warn users who compile for AVX but don't have 32 byte alignment (the warning can be silenced by explicitly compiling with 16 or 32 byte alignment).
2) Always make 32 byte the default -- this wastes memory (and cache space) for non-AVX users. Also, it breaks ABI compatibility with the current code.
3) Automatically use 32 byte alignment when AVX is enabled. Warn prominently in the documentation about the ABI incompatibility.
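
Alternative 3) could look roughly like the following sketch. The MAX_ALIGN_* flags are only a proposal at this point, so SKETCH_MAX_ALIGN_BYTES is a hypothetical name and not an existing Eigen macro; __AVX__ is the macro that GCC and Clang predefine when compiling with -mavx. Defining the macro explicitly on the command line would pin the alignment regardless of the SIMD flags.

```cpp
#include <cstddef>

// Hypothetical flag (not an Eigen macro): the user may define it explicitly,
// e.g. -DSKETCH_MAX_ALIGN_BYTES=16, to keep the old ABI even with AVX enabled.
#ifndef SKETCH_MAX_ALIGN_BYTES
#  ifdef __AVX__
#    define SKETCH_MAX_ALIGN_BYTES 32  // AVX enabled: align for 256-bit loads
#  else
#    define SKETCH_MAX_ALIGN_BYTES 16  // SSE-sized default
#  endif
#endif

// Stand-in for a fixed-size type large enough to profit from 32 byte alignment.
struct alignas(SKETCH_MAX_ALIGN_BYTES) PackedFloats8 {
  float data[8];
};

static_assert(alignof(PackedFloats8) == SKETCH_MAX_ALIGN_BYTES,
              "the type follows the selected alignment");

// Reports the alignment this translation unit was compiled with.
constexpr std::size_t sketch_alignment() { return alignof(PackedFloats8); }
```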

I guess at least 95% of users don't mind ABI incompatibilities but prefer the best performance for their architecture (and therefore compile all TUs with the same flags anyway). So 3) would likely satisfy most users. Compiling different TUs with and without AVX appears to be non-trivial anyway, and mixing SSE code with AVX code also results in bad performance due to SSE/AVX transition penalties, so we can expect users who want that to at least read the documentation (or ask for help) if they fail. Overall, I think 1) would be the safest, at the cost of annoying many users (once) when they switch to AVX, and 3) would be the "convenient for most" solution.
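
To make the ABI concern concrete, here is a minimal sketch in plain C++ without Eigen. Vec4 and Holder are hypothetical stand-ins for a fixed-size vectorizable type and a user struct embedding it. The same struct gets a different member offset and size depending on the alignment, so two TUs compiled with different settings would disagree on its layout.

```cpp
#include <cstddef>

// Hypothetical stand-in for a fixed-size vectorizable type such as Vector4f;
// Align plays the role of the library-wide static alignment.
template <std::size_t Align>
struct alignas(Align) Vec4 {
  float data[4];
};

// A user struct that embeds the vector after a small header field.
template <std::size_t Align>
struct Holder {
  char tag;
  Vec4<Align> v;
};

using Holder16 = Holder<16>;  // layout seen by a TU built for 16 byte alignment
using Holder32 = Holder<32>;  // layout seen by a TU built for 32 byte alignment

static_assert(sizeof(Vec4<16>) == 16, "no padding at 16 byte alignment");
static_assert(sizeof(Vec4<32>) == 32, "16 bytes of padding at 32 byte alignment");
static_assert(offsetof(Holder16, v) != offsetof(Holder32, v),
              "the member offset depends on the alignment");
static_assert(sizeof(Holder16) != sizeof(Holder32),
              "the struct size depends on the alignment");
```

The padding in the 32 byte variant is also the memory and cache cost mentioned under alternative 2).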

> is a special case because it is very common for people to want to compile
> the same code with different values for these flags and choose at runtime
> between these code paths, which becomes very tricky and sub-optimal if the
> ABI is not the same.

Yes, that's an important use case for any "real-world" application (distributed as a binary). I think we can give these users the burden of reading the documentation on how to achieve this (option 3). I guess most Eigen users are in R&D (I could be wrong) and are used to (re-)compiling their entire source tree depending on the architecture anyway.
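
A rough sketch of such a runtime choice, under several assumptions: the two namespaces stand in for the same kernel compiled in separate TUs with different SIMD flags (here both bodies are plain scalar code so that the sketch fits in one file), all names are hypothetical, and the CPU check uses the GCC/Clang builtin __builtin_cpu_supports, which is only available on x86 with those compilers. The dispatched interface takes raw pointers on purpose, so that it does not depend on the static alignment of any fixed-size type.

```cpp
#include <cstddef>

// In a real build each namespace would live in its own TU,
// compiled with e.g. -msse2 and -mavx respectively.
namespace kernels_sse {
inline float sum(const float* x, std::size_t n) {
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i) s += x[i];
  return s;
}
}  // namespace kernels_sse

namespace kernels_avx {
inline float sum(const float* x, std::size_t n) {
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i) s += x[i];
  return s;
}
}  // namespace kernels_avx

// ABI-neutral signature: no fixed-size aligned types cross this boundary.
using SumFn = float (*)(const float*, std::size_t);

// Returns true if the running CPU reports AVX support (GCC/Clang on x86 only);
// on other compilers or architectures it conservatively returns false.
inline bool cpu_has_avx() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx") != 0;
#else
  return false;
#endif
}

// Picks the code path once; callers keep and reuse the returned pointer.
inline SumFn select_sum() {
  return cpu_has_avx() ? &kernels_avx::sum : &kernels_sse::sum;
}
```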

> Regardless, to the extent that it's true that *static* 32byte alignment is
> important for performance, I'm OK to treat this as a documentation issue
> and default to breaking the ABI, with sufficient warnings/documentation.

> I was just wondering to what extent that was the case: static 32byte alignment
> is irrelevant to 1) dynamic-size matrices, and 2) the most important cases
> of fixed-size vectorizability (Vector4f, Matrix4f). But, sure, the

Matrix4f is an important question. Are there AVX instructions that make it worth aligning it to 32 bytes? If so, also for operations such as Matrix4f * Vector4f? An analogous question is whether we can profit from vectorization for Matrix2f with SSE. E.g., a Matrix2f*Matrix2f product could be done with some shuffling, two pmuls and one padd (last time I checked, this product was not vectorized). Also, Matrix2f*Vector2f should be possible with some shuffling, one pmul and one hadd (and then storing only 8 bytes of the result vector).
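
A minimal sketch of the Matrix2f*Matrix2f idea with raw SSE intrinsics, assuming column-major storage (a00, a10, a01, a11) as in Eigen's default layout. The function name is hypothetical and this is not how Eigen implements the product. It uses four shuffles, two multiplications and one addition; unaligned loads and stores keep the sketch independent of the alignment question, and a scalar fallback is used when SSE is not available.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#  include <xmmintrin.h>
#  define SKETCH_HAVE_SSE 1
#endif

// C = A * B for 2x2 float matrices stored column-major: (m00, m10, m01, m11).
inline void mat2_mul(const float* A, const float* B, float* C) {
#ifdef SKETCH_HAVE_SSE
  const __m128 a = _mm_loadu_ps(A);  // (a00, a10, a01, a11)
  const __m128 b = _mm_loadu_ps(B);  // (b00, b10, b01, b11)

  const __m128 a_col0 = _mm_movelh_ps(a, a);  // (a00, a10, a00, a10)
  const __m128 a_col1 = _mm_movehl_ps(a, a);  // (a01, a11, a01, a11)
  const __m128 b_row0 = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 2, 0, 0));  // (b00, b00, b01, b01)
  const __m128 b_row1 = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 3, 1, 1));  // (b10, b10, b11, b11)

  // Each lane holds one entry of C: a_i0 * b_0j + a_i1 * b_1j.
  const __m128 c = _mm_add_ps(_mm_mul_ps(a_col0, b_row0),
                              _mm_mul_ps(a_col1, b_row1));
  _mm_storeu_ps(C, c);
#else
  // Scalar fallback with the same column-major indexing.
  const float c00 = A[0] * B[0] + A[2] * B[1];
  const float c10 = A[1] * B[0] + A[3] * B[1];
  const float c01 = A[0] * B[2] + A[2] * B[3];
  const float c11 = A[1] * B[2] + A[3] * B[3];
  C[0] = c00; C[1] = c10; C[2] = c01; C[3] = c11;
#endif
}
```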


Christoph


--
----------------------------------------------
Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
----------------------------------------------


