Re: [eigen] help speeding up an expression?
On 02.08.2013 23:25, Dick Lyon wrote:
> z += 0.175;
> z = (z < 0).select(0.0, (z*z*z) / (z*z*z + z*z + 0.1));
>
> Can this be sped up by clever use of pre-allocated temp arrays?

I doubt that will help a lot. I assume the bottleneck here is the division. I guess we could improve Eigen to use SSE's fast inverse with refinement for float divisions if FAST_MATH is enabled.

There might be further small improvements, such as re-using the z*z and z*z*z parts (but this should be done on the fly, without a big temporary array). And a very small improvement: (z<0).select(0,x) could be replaced by something like ((z>=0.0) & x).

If you really want to fix that bottleneck, I recommend implementing a custom unary expression with hand-optimized SIMD code.

> Is select fast? Or are there better ways to threshold things?

Currently it's not vectorized, so you lose some time there as well (but most likely it will be optimized in future Eigen versions).

> I'm on an Intel platform, but ultimately also targeting Arm NEON, in case
> that matters.

I don't know anything about NEON's SIMD capabilities.
Regards,
Christoph
--
----------------------------------------------
Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen
Tel: +49 (421) 218-64252
----------------------------------------------