Re: [eigen] help speeding up an expression?


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] help speeding up an expression?
*From*: Dick Lyon <dicklyon@xxxxxxxxxx>
*Date*: Sat, 3 Aug 2013 08:05:50 -0700

It turned out the select was the real slow part.

Doing z = (z + 0.175).max(0) before the rational function was much faster.

I'll look into unary inverse and using temps for x*x etc.

Writing custom SIMD code for various platforms would completely negate the point of using Eigen.

Dick

On Sat, Aug 3, 2013 at 7:54 AM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

On 02.08.2013 23:25, Dick Lyon wrote:

> z += 0.175;
>
> z = (z < 0).select(0.0, (z*z*z) / (z*z*z + z*z + 0.1));

I assume the bottleneck is the division here. I guess we could improve Eigen to use SSE's fast inverse with refinement for float divisions if FAST_MATH is enabled.

There might be further small improvements, like reusing the z*z and z*z*z parts (but this should be done on the fly, without a big temporary array). And a very small improvement: (z<0).select(0,x) could be replaced by something like ((z>=0.0) & x).

If you really want to fix that bottleneck, I recommend implementing a custom unary expression with hand-optimized SIMD code.

> Can this be sped up by clever use of pre-allocated temp arrays?

I doubt that will help a lot.

> Is select fast? Or are there better ways to threshold things?

Currently it's not vectorized, so you lose some time there as well (but most likely it will be optimized in future Eigen versions).

> I'm on an Intel platform, but ultimately also targeting Arm NEON, in case that matters.

I have no idea about NEON's SIMD capabilities.

Regards,

Christoph

--

----------------------------------------------

Dipl.-Inf., Dipl.-Math. Christoph Hertzberg

Cartesium 0.049

Universität Bremen

Enrique-Schmidt-Straße 5

28359 Bremen

Tel: +49 (421) 218-64252

----------------------------------------------
