Re: [eigen] help speeding up an expression?


It turned out the select was the real slow part.  
Doing z = (z + 0.175).max(0)  before the rational function was much faster.
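For anyone checking why the clamp can replace the select: once z is clamped at zero, the rational function evaluates to 0 / 0.1 = 0 there, which is the same value the select produced for negative z. Below is a minimal scalar sketch of that equivalence in plain C++ rather than Eigen; the function names (rational, with_select, with_clamp, max_abs_difference) are made up for illustration and are not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Rational function from the thread: z^3 / (z^3 + z^2 + 0.1).
inline double rational(double z) {
    const double z2 = z * z;
    const double z3 = z2 * z;
    return z3 / (z3 + z2 + 0.1);
}

// Original formulation: shift, then pick 0 for negative values
// (scalar analogue of the select-based expression).
inline double with_select(double z) {
    z += 0.175;
    return (z < 0.0) ? 0.0 : rational(z);
}

// Clamp-first formulation: shift, clamp at zero, evaluate unconditionally.
// For clamped inputs this computes rational(0) = 0 / 0.1 = 0.
inline double with_clamp(double z) {
    z = std::max(z + 0.175, 0.0);
    return rational(z);
}

// Largest absolute difference between the two formulations on a uniform
// sweep of [lo, hi] with steps + 1 sample points.
inline double max_abs_difference(double lo, double hi, int steps) {
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double z = lo + (hi - lo) * i / steps;
        worst = std::max(worst, std::fabs(with_select(z) - with_clamp(z)));
    }
    return worst;
}
```

A side effect of clamping first is that the division is never evaluated for negative shifted inputs, where the denominator can get close to zero.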

I'll look into unary inverse and using temps for x*x etc.

Writing custom SIMD code for various platforms would completely negate the point of using Eigen.


On Sat, Aug 3, 2013 at 7:54 AM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On 02.08.2013 23:25, Dick Lyon wrote:
   z += 0.175;
   z = (z < 0).select(0.0, (z*z*z) / (z*z*z + z*z + 0.1));

I assume the bottleneck is the division here. I guess we could improve Eigen to use SSE's fast inverse with refinement for float divisions if FAST_MATH is enabled.
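"Fast inverse with refinement" usually means taking a cheap, low-precision reciprocal estimate (on SSE this would come from an instruction such as rcpps) and improving it with Newton-Raphson steps. The following is only a scalar sketch of the refinement step in plain C++, not Eigen's implementation; rough_reciprocal is a hypothetical stand-in that fakes a hardware estimate by perturbing the exact reciprocal.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate: returns 1/d
// perturbed by a chosen relative error (assumption, for illustration only).
inline float rough_reciprocal(float d, float rel_err) {
    assert(d != 0.0f);
    return (1.0f / d) * (1.0f + rel_err);
}

// One Newton-Raphson step for the reciprocal of d: x_new = x * (2 - d * x).
// Uses only multiplications and a subtraction, no division.
inline float refine_reciprocal(float d, float x) {
    return x * (2.0f - d * x);
}

// Relative error of x as an approximation of 1/d, measured as |d * x - 1|.
inline float relative_error(float d, float x) {
    return std::fabs(d * x - 1.0f);
}
```

Since 1 - d * x_new = (1 - d * x)^2, each step roughly squares the relative error until float rounding takes over, so one or two steps on top of a rough estimate are typically enough for single precision.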

There might be further small improvements, like re-using the z*z and z*z*z part (but this should be done on the fly, without a big temporary array). And a very small improvement: (z<0).select(0,x) could be replaced e.g. by something like ((z>=0.0) & x).
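A scalar sketch of both ideas in plain C++ (not Eigen): the powers of z are computed once per element and reused without any temporary array, and the threshold is applied by AND-ing the result's bit pattern with an all-ones or all-zeros mask, which is what the ((z>=0.0) & x) idiom does per SIMD lane. The names keep_if and shifted_rational_inplace are hypothetical, and the code assumes a 32-bit float.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Bitwise masking: returns x if keep is true, +0.0f otherwise.
inline float keep_if(bool keep, float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= keep ? 0xFFFFFFFFu : 0u;   // all-ones keeps x, all-zeros gives +0.0f
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Evaluates the thread's expression in place, one element at a time.
inline void shifted_rational_inplace(std::vector<float>& z) {
    for (std::size_t i = 0; i < z.size(); ++i) {
        const float s  = z[i] + 0.175f;
        const float s2 = s * s;    // computed once, reused in s3 and the denominator
        const float s3 = s2 * s;   // computed once, reused in numerator and denominator
        // The denominator is at least 0.1 wherever the result is kept.
        assert(s < 0.0f || s3 + s2 + 0.1f > 0.0f);
        const float r = s3 / (s3 + s2 + 0.1f);
        z[i] = keep_if(s >= 0.0f, r);
    }
}
```

Unlike multiplying by a 0/1 factor, the bit mask yields exactly zero even if the discarded value happens to be infinite or NaN.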

If you really want to fix that bottleneck, I recommend implementing a custom unary expression with hand-optimized SIMD code.

Can this be sped up by clever use of pre-allocated temp arrays?

I doubt that will help a lot.

Is select fast?  Or are there better ways to threshold things?

Currently it's not vectorized, so you lose some time there as well (but most likely it will be optimized in future Eigen versions).

I'm on an Intel platform, but ultimately also targeting Arm NEON, in case
that matters.

I don't know anything about NEON's SIMD capabilities.


Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
