Re: [eigen] Performance difference icc <-> gcc, EIGEN_STRONG_INLINE
$ icpc -help inline
Inlining
--------
-inline-level=<n>
    control inline expansion:
      n=0  disable inlining
      n=1  inline functions declared with __inline, and perform C++
           inlining
      n=2  inline any function, at the compiler's discretion
-f[no-]inline
    inline functions declared with __inline, and perform C++ inlining
-f[no-]inline-functions
    inline any function at the compiler's discretion
-finline-limit=<n>
    set maximum number of statements a function can have and still be
    considered for inlining
-fgnu89-inline
    use C89 semantics for "inline" functions when in C99 mode
-inline-min-size=<n>
    set size limit for inlining small routines
-no-inline-min-size
    no size limit for inlining small routines
-inline-max-size=<n>
    set size limit for inlining large routines
-no-inline-max-size
    no size limit for inlining large routines
-inline-max-total-size=<n>
    maximum increase in size for inline function expansion
-no-inline-max-total-size
    no size limit for inline function expansion
-inline-max-per-routine=<n>
    maximum number of inline instances in any function
-no-inline-max-per-routine
    no maximum number of inline instances in any function
-inline-max-per-compile=<n>
    maximum number of inline instances in the current compilation
-no-inline-max-per-compile
    no maximum number of inline instances in the current compilation
-inline-factor=<n>
    set inlining upper limits by n percentage
-no-inline-factor
    do not set set inlining upper limits
-inline-forceinline
    treat inline routines as forceinline
-inline-calloc
    directs the compiler to inline calloc() calls as malloc()/memset()
-inline-min_caller-growth=<n>
    set lower limit on caller growth due to inlining a single routine
-no-inline-min-caller-growth
    no lower limit on caller growth due to inlining a single routine
Hello all,
Thank you very much for your work on Eigen. We found it very useful for
our simulation software mbsolve [1] (BTW, maybe you would like to add it
to the list of projects that use the Eigen library).
The code I am working on at the moment consists mostly of dense
matrix-matrix and matrix-vector multiplications. I compiled the code
with both Intel compiler 19 and gcc 6.3.0 and found that there is a
strange performance difference. Unless I define
#define EIGEN_STRONG_INLINE inline
the binary compiled by icc is ~13x slower. The performance of the gcc
binary remains the same, as inline seems to be the default definition of
this macro for gcc.
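
To make the setup concrete, here is a minimal sketch of the macro pattern I
am referring to. It does not use Eigen itself: SKETCH_STRONG_INLINE, mul_add
and dot are hypothetical names that I made up for illustration, and the
compiler checks only mirror my understanding of how such a macro is selected
per compiler (it may differ between Eigen versions). Defining the macro
before the header is seen, as I do with EIGEN_STRONG_INLINE, replaces the
forced-inline variant with a plain inline hint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library's "strong inline" macro.
// Assumption: forced inlining is selected for MSVC and icc, plain
// inline for everything else (e.g. gcc). A definition supplied by the
// user beforehand (here or via -DSKETCH_STRONG_INLINE=inline) wins.
#ifndef SKETCH_STRONG_INLINE
#  if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#    define SKETCH_STRONG_INLINE __forceinline
#  else
#    define SKETCH_STRONG_INLINE inline
#  endif
#endif

// Small helper of the kind that ends up nested many levels deep in
// expression-template code.
SKETCH_STRONG_INLINE double mul_add(double acc, double x, double y)
{
    return acc + x * y;
}

// Dot product built on top of the helper; both functions carry the macro,
// so its definition decides how aggressively they are inlined.
SKETCH_STRONG_INLINE double dot(const std::vector<double>& a,
                                const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc = mul_add(acc, a[i], b[i]);
    }
    return acc;
}
```

The result of dot() is the same for either definition of the macro; only the
inlining decisions of the compiler (and hence possibly the run time) change.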
What could cause this behavior? Or, alternatively, which anti-pattern in
my code could be responsible for this performance difference?
Any hints are welcome. If you need more information, please let me know.
Thanks in advance and best regards,
Michael
[1] https://github.com/mriesch-tum/mbsolve