Re: [eigen] Non-optimal sse assembly code with gcc



DAZ (denormals-are-zero) and FTZ (flush-to-zero) give a big speedup on Intel machines, since handling denormals is really slow (and denormal values often only propagate rounding errors).

Enabling these modes is something Eigen users can do by themselves, but it might be worth a mention in some documentation somewhere.
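    #include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
    _mm_setcsr( _mm_getcsr() | (1<<15));  // FTZ: flush denormal results to zero (MXCSR bit 15)
    _mm_setcsr( _mm_getcsr() | (1<<6));   // DAZ: treat denormal inputs as zero (MXCSR bit 6)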
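
For anyone who wants this limited to a region of code, here is a rough sketch of how the same bits could be set with the named macros from the intrinsics headers and restored afterwards. ScopedFlushDenormals and half_of_smallest_normal are names I made up for illustration, they are not part of Eigen. It assumes an x86 target where float math goes through SSE and a CPU that supports DAZ. Note that MXCSR is per thread, so each worker thread would need its own guard.

    #include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE
    #include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
    #include <limits>

    // Hypothetical helper (not an Eigen class): turns on FTZ and DAZ for the
    // current thread and puts the previous MXCSR value back on scope exit.
    class ScopedFlushDenormals {
    public:
        ScopedFlushDenormals() : saved_(_mm_getcsr()) {
            _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // MXCSR bit 15
            _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // MXCSR bit 6
        }
        ~ScopedFlushDenormals() { _mm_setcsr(saved_); }

        ScopedFlushDenormals(const ScopedFlushDenormals&) = delete;
        ScopedFlushDenormals& operator=(const ScopedFlushDenormals&) = delete;

    private:
        unsigned int saved_;  // MXCSR as it was before the guard was created
    };

    // Illustration only: halves the smallest normal float. Without FTZ the
    // result is a denormal; with FTZ enabled it is flushed to zero.
    // The volatiles keep the compiler from folding the product at compile time.
    inline float half_of_smallest_normal() {
        volatile float x = std::numeric_limits<float>::min();
        volatile float half = 0.5f;
        volatile float r = x * half;
        return r;
    }

Usage would be something like creating a ScopedFlushDenormals object at the top of the block that runs the heavy Eigen computation; once it goes out of scope the old floating-point mode is back in effect.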



On 01/23/2012 07:19 AM, Christoph Hertzberg wrote:
On 23.01.2012 12:54, Benjamin Schindler wrote:
I would love to know whether your rationale on denormalized numbers is
correct though. Does anybody else know exactly?

I found this article (it is 3 years old, though):
http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz/

Christoph






