Re: [eigen] Performance difference icc <-> gcc, EIGEN_STRONG_INLINE

Hi Michael,

Can you provide some details on the compiler flags you used?  If you can give me the commands required to reproduce your results, I will submit a bug report to ICC (I work for Intel).

I suspect that ICC is using a different (more conservative) inlining heuristic than GCC.  Failure to inline probably explains the 13x slowdown.  However, that doesn't mean that GCC is right and ICC is wrong, as there is no perfect inlining heuristic (see e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 for a contrary position on GCC heuristics).  Nonetheless, it may be useful to make the ICC inlining heuristic aware of Eigen use cases, since Eigen is rather popular.

Fortunately, there already seems to be a good solution: defining EIGEN_STRONG_INLINE, which evidently causes ICC to inline much more aggressively.

Best,

Jeff

PS In the unlikely event that EIGEN_STRONG_INLINE isn't sufficient, you may find the following ICC options useful.

$ icpc -help inline

Inlining
--------

-inline-level=<n>
          control inline expansion:
            n=0  disable inlining
            n=1  inline functions declared with __inline, and perform C++
                 inlining
            n=2  inline any function, at the compiler's discretion
-f[no-]inline
          inline functions declared with __inline, and perform C++ inlining
-f[no-]inline-functions
          inline any function at the compiler's discretion
-finline-limit=<n>
          set maximum number of statements a function can have and still be
          considered for inlining
-fgnu89-inline
          use C89 semantics for "inline" functions when in C99 mode
-inline-min-size=<n>
          set size limit for inlining small routines
-no-inline-min-size
          no size limit for inlining small routines
-inline-max-size=<n>
          set size limit for inlining large routines
-no-inline-max-size
          no size limit for inlining large routines
-inline-max-total-size=<n>
          maximum increase in size for inline function expansion
-no-inline-max-total-size
          no size limit for inline function expansion
-inline-max-per-routine=<n>
          maximum number of inline instances in any function
-no-inline-max-per-routine
          no maximum number of inline instances in any function
-inline-max-per-compile=<n>
          maximum number of inline instances in the current compilation
-no-inline-max-per-compile
          no maximum number of inline instances in the current compilation
-inline-factor=<n>
          set inlining upper limits by n percentage
-no-inline-factor
          do not set set inlining upper limits
-inline-forceinline
          treat inline routines as forceinline
-inline-calloc
          directs the compiler to inline calloc() calls as malloc()/memset()
-inline-min_caller-growth=<n>
          set lower limit on caller growth due to inlining a single routine
-no-inline-min-caller-growth
          no lower limit on caller growth due to inlining a single routine


On Wed, Mar 13, 2019 at 11:34 AM Michael Riesch <michael.riesch@xxxxxx> wrote:
Hello all,

Thank you very much for your work on Eigen. We found it very useful for
our simulation software mbsolve [1] (BTW, maybe you would like to add it
to the list of projects that use the Eigen library).

The code I am working on at the moment consists mostly of dense
matrix-matrix and matrix-vector multiplications. I compiled the code
with both Intel compiler 19 and gcc 6.3.0 and found that there is a
strange performance difference. Unless I define

#define EIGEN_STRONG_INLINE inline

the binary compiled by icc is ~13x slower. The gcc binary performance
remains the same, as inline seems to be the standard setting of this
macro for gcc.
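
For readers unfamiliar with this kind of override, the mechanism can be sketched roughly as follows. This is a minimal illustration, not Eigen's actual source: DEMO_STRONG_INLINE and dot3 are hypothetical names, and the sketch assumes the library guards its own definition with #ifndef so that a definition made earlier (in the source file or via -D on the command line) takes precedence.

```cpp
#include <cassert>

// User-side override: must come before the header that would otherwise
// define the macro (hypothetical name, stands in for a library macro).
#define DEMO_STRONG_INLINE inline

// Hypothetical library-side fallback, assumed to be guarded by #ifndef.
// Because the macro is already defined above, this branch is skipped.
#ifndef DEMO_STRONG_INLINE
#  if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#    define DEMO_STRONG_INLINE __forceinline
#  else
#    define DEMO_STRONG_INLINE inline
#  endif
#endif

// A small kernel declared with the macro; with the override in place it is
// an ordinary inline function and the compiler's own heuristic decides.
DEMO_STRONG_INLINE double dot3(const double* a, const double* b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Sanity check: the override changes inlining policy, not the result.
inline void check_dot3() {
  const double a[3] = {1.0, 2.0, 3.0};
  const double b[3] = {4.0, 5.0, 6.0};
  assert(dot3(a, b) == 32.0);
}
```

With a real Eigen build, the equivalent step would presumably be to place the define before the first Eigen include, or to pass -DEIGEN_STRONG_INLINE=inline to the compiler. Whether a given Eigen version honors an external definition is an assumption worth checking in its Macros.h.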

What could cause this behavior? Alternatively, which anti-pattern in my
code could be responsible for this performance difference?

Any hints are welcome. If you need more information, please let me know.

Thanks in advance and best regards,
Michael

[1] https://github.com/mriesch-tum/mbsolve





