Re: [eigen] How do you link multiple versions (e.g. AVX vs SSE) of the same Eigen code?
Hi,
On 28.01.2015 15:51, Benoit Jacob wrote:
> There may be other compile flags that break the ABI (I don't see any
> offhand; which one do you have in mind?), but this (the flags to control SIMD)
Examples are EIGEN_DEFAULT_TO_ROW_MAJOR (which also breaks the API and
whose use is therefore discouraged anyway) and, of course,
EIGEN_DONT_ALIGN (which is related to our current discussion).
Regarding 32-byte alignment, I like the idea of the MAX_ALIGN_* flags.
For the default behavior, I basically see three alternatives:
1) Always make 16 bytes the default, and warn users who compile for AVX
but don't have 32-byte alignment (the warning can be silenced by
explicitly compiling with 16- or 32-byte alignment).
2) Always make 32 bytes the default. This wastes memory (and cache
space) for non-AVX users. Also, it breaks ABI compatibility with the
current code.
3) Automatically use 32-byte alignment when AVX is enabled, and warn
prominently in the documentation about the ABI incompatibility.
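To make the ABI concern behind these options concrete, here is a minimal sketch of option 3. It assumes a hypothetical `HYPOTHETICAL_MAX_STATIC_ALIGN` macro in place of the proposed MAX_ALIGN_* flags, and plain `alignas` structs in place of Eigen's fixed-size types. The default becomes 32 bytes when the compiler predefines `__AVX__` (as GCC and Clang do with `-mavx`), and the `static_assert`s show that the same user struct gets a different size under each setting.

```cpp
#include <cstddef>

// HYPOTHETICAL_MAX_STATIC_ALIGN is an illustrative stand-in for a
// MAX_ALIGN_*-style setting; it is not an Eigen macro.
// Option 3: use 32 bytes automatically when the compiler targets AVX,
// unless the user has set the value explicitly.
#ifndef HYPOTHETICAL_MAX_STATIC_ALIGN
#  if defined(__AVX__)
#    define HYPOTHETICAL_MAX_STATIC_ALIGN 32
#  else
#    define HYPOTHETICAL_MAX_STATIC_ALIGN 16
#  endif
#endif

// Stand-in for a fixed-size, vectorizable 4-float type.
template <std::size_t Align>
struct alignas(Align) Vec4f {
  float data[4];
};

// A user type embedding it: its size and member offsets follow the alignment.
template <std::size_t Align>
struct Particle {
  char tag;               // padded up to the alignment of `position`
  Vec4f<Align> position;
};

// Same source, two layouts: TUs built with different settings disagree
// about sizeof(Particle) and the offset of `position`, i.e. the ABI breaks.
static_assert(sizeof(Particle<16>) == 32, "16-byte alignment: 32-byte struct");
static_assert(sizeof(Particle<32>) == 64, "32-byte alignment: 64-byte struct");

// The layout a library compiled with the current flags would actually use.
using DefaultParticle = Particle<HYPOTHETICAL_MAX_STATIC_ALIGN>;
```

Option 1 would differ only in the `__AVX__` branch: keep 16 as the default and emit a compiler warning unless the value was set explicitly. Option 2 corresponds to always using `Particle<32>`, which doubles the size of this particular struct.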
I guess at least 95% of users don't mind ABI incompatibilities but prefer
the best performance for their architecture (and therefore compile all
TUs with the same flags anyway). So 3) would likely satisfy most users.
Compiling different TUs with and without AVX appears to be non-trivial
anyway, and mixing SSE code with AVX code also results in bad
performance due to the penalties for transitioning between SSE and AVX
states, so we can expect users who want that to at least read the
documentation (or ask for help) if they run into problems.
Overall, I think 1) would be the safest, at the cost of annoying many
users (once) when they switch to AVX, and 3) would be the
"convenient for most" solution.
> is a special case because it is very common for people to want to compile
> the same code with different values for these flags and choose at runtime
> between these code paths, which becomes very tricky and sub-optimal if the
> ABI is not the same.
Yes, that's an important use case for any "real-world" application
(distributed as a binary). I think we can give these users the burden of
reading the documentation on how to achieve this (with option 3).
I guess most Eigen users are in R&D (I could be wrong) and are used to
(re)compiling their entire source tree for each target architecture
anyway.
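For the runtime-dispatch use case, a minimal sketch with hypothetical names throughout. The thread assumes separate TUs compiled with different flags; to keep the example in one file, it swaps in the GCC/Clang `target` function attribute and `__builtin_cpu_supports`, both x86-only compiler extensions. The design choice it illustrates is keeping only plain pointers at the dispatch boundary, so that a 16- vs 32-byte alignment mismatch between the code paths never becomes visible to callers.

```cpp
#include <cstddef>

// Signature shared by every code path. Only plain pointers cross this
// boundary, so paths built with different SIMD settings cannot disagree
// about the layout of aligned fixed-size types.
using AxpyFn = void (*)(float alpha, const float* x, float* y, std::size_t n);

// Baseline path, compiled with the default target flags.
static void axpy_baseline(float alpha, const float* x, float* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX_PATH 1
// Same source, but the compiler may use AVX instructions in this function.
// In a real build this would typically be a separate TU compiled with -mavx.
__attribute__((target("avx")))
static void axpy_avx(float alpha, const float* x, float* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}
#endif

// Pick a code path at runtime, based on what the CPU reports.
AxpyFn select_axpy() {
#ifdef HAVE_AVX_PATH
  if (__builtin_cpu_supports("avx")) return axpy_avx;
#endif
  return axpy_baseline;
}
```

A caller would resolve the function pointer once (e.g. `AxpyFn axpy = select_axpy();`) and reuse it for all subsequent calls.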
> Regardless, to the extent that it's true that *static* 32-byte alignment is
> important for performance, I'm OK with treating this as a documentation issue
> and defaulting to breaking the ABI, with sufficient warnings/documentation.
> I was just wondering to what extent that was the case: static 32-byte alignment
> is irrelevant to 1) dynamic-size matrices, and 2) the most important cases
> of fixed-size vectorizability (Vector4f, Matrix4f). But, sure,
> Matrix4f is an important question. Are there AVX instructions that make it
> worth aligning it to 32 bytes? If so, also for operations such as
> Matrix4f * Vector4f?
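As a reference point for the Matrix4f * Vector4f question, here is a common SSE formulation (not necessarily what Eigen generates): broadcast each entry of the vector and accumulate the scaled columns. The sketch assumes a column-major `float[16]` in place of Matrix4f and a hypothetical helper name. Each column is exactly one 16-byte register, so this formulation never needs more than 16-byte alignment; whether an AVX variant working on two columns per 32-byte register pays off is exactly the open question here.

```cpp
// Use SSE when the compiler targets it (always the case for x86-64 builds).
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define MAT4_HAVE_SSE 1
#endif

// Hypothetical helper: y = A * x, where A is a column-major 4x4 float matrix
// (Eigen's default storage order) stored as float[16]. Column j occupies
// A[4*j] .. A[4*j+3], i.e. exactly one 16-byte SSE register.
void mat4_mul_vec4(const float* A, const float* x, float* y) {
#ifdef MAT4_HAVE_SSE
  // Unaligned loads keep the sketch safe for any input; with 16-byte-aligned
  // data, _mm_load_ps would do, and nothing here needs 32-byte alignment.
  __m128 acc = _mm_mul_ps(_mm_loadu_ps(A), _mm_set1_ps(x[0]));
  for (int j = 1; j < 4; ++j) {
    // Broadcast x[j] to all four lanes and add column j scaled by it.
    __m128 col = _mm_loadu_ps(A + 4 * j);
    acc = _mm_add_ps(acc, _mm_mul_ps(col, _mm_set1_ps(x[j])));
  }
  _mm_storeu_ps(y, acc);
#else
  // Scalar fallback with the same column-by-column structure.
  float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int j = 0; j < 4; ++j)
    for (int i = 0; i < 4; ++i) acc[i] += A[4 * j + i] * x[j];
  for (int i = 0; i < 4; ++i) y[i] = acc[i];
#endif
}
```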
An analogous question is whether we can profit from vectorization for
Matrix2f with SSE. E.g., a Matrix2f*Matrix2f product could be done with
some shuffling, two pmuls and one padd (last time I checked, this product
was not vectorized). Also, Matrix2f*Vector2f should be possible with some
shuffling, one pmul and one hadd (and then storing only 8 bytes of the
result vector).
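The two Matrix2f products described above can be sketched with SSE intrinsics as follows. This assumes column-major `float[4]` storage in place of Matrix2f and hypothetical helper names; it shows that the instruction counts from the proposal work out, not how Eigen implements these products. `_mm_hadd_ps` needs SSE3, so the vector product falls back to scalar code unless the compiler targets SSE3 (e.g. with `-msse3`).

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define MAT2_HAVE_SSE 1
#endif
#if defined(__SSE3__)
#include <pmmintrin.h>  // _mm_hadd_ps
#define MAT2_HAVE_SSE3 1
#endif

// Column-major 2x2 layout: {m00, m10, m01, m11}, one 16-byte register.
// Hypothetical helper: C = A * B. C must not alias A or B (scalar path).
void mat2_mul_mat2(const float* A, const float* B, float* C) {
#ifdef MAT2_HAVE_SSE
  __m128 a = _mm_loadu_ps(A);                                     // (a00, a10, a01, a11)
  __m128 b = _mm_loadu_ps(B);                                     // (b00, b10, b01, b11)
  __m128 a_col0 = _mm_movelh_ps(a, a);                            // (a00, a10, a00, a10)
  __m128 a_col1 = _mm_movehl_ps(a, a);                            // (a01, a11, a01, a11)
  __m128 b_row0 = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 2, 0, 0));  // (b00, b00, b01, b01)
  __m128 b_row1 = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 3, 1, 1));  // (b10, b10, b11, b11)
  // Two multiplies and one add yield (c00, c10, c01, c11).
  _mm_storeu_ps(C, _mm_add_ps(_mm_mul_ps(a_col0, b_row0), _mm_mul_ps(a_col1, b_row1)));
#else
  C[0] = A[0] * B[0] + A[2] * B[1];
  C[1] = A[1] * B[0] + A[3] * B[1];
  C[2] = A[0] * B[2] + A[2] * B[3];
  C[3] = A[1] * B[2] + A[3] * B[3];
#endif
}

// Hypothetical helper: y = A * x for a 2-vector x; writes only y[0], y[1].
void mat2_mul_vec2(const float* A, const float* x, float* y) {
#ifdef MAT2_HAVE_SSE3
  __m128 a = _mm_loadu_ps(A);                                     // (a00, a10, a01, a11)
  __m128 a_rows = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 2, 0));  // (a00, a01, a10, a11)
  __m128 xv = _mm_loadl_pi(_mm_setzero_ps(), reinterpret_cast<const __m64*>(x));
  xv = _mm_movelh_ps(xv, xv);                                     // (x0, x1, x0, x1)
  __m128 p = _mm_mul_ps(a_rows, xv);  // (a00*x0, a01*x1, a10*x0, a11*x1)
  __m128 s = _mm_hadd_ps(p, p);       // (y0, y1, y0, y1)
  _mm_storel_pi(reinterpret_cast<__m64*>(y), s);  // store only the low 8 bytes
#else
  const float y0 = A[0] * x[0] + A[2] * x[1];
  const float y1 = A[1] * x[0] + A[3] * x[1];
  y[0] = y0;
  y[1] = y1;
#endif
}
```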
Christoph
--
----------------------------------------------
Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen
Tel: +49 (421) 218-64252
----------------------------------------------