Re: [eigen] Non-optimal sse assembly code with gcc


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Non-optimal sse assembly code with gcc
From: Mark Borgerding <mark@xxxxxxxxxxxxxx>
Date: Mon, 23 Jan 2012 15:15:25 -0500

DAZ and FTZ give a big speedup on Intel machines, since handling denormals is really slow (and often only propagates rounding errors).

Enabling these modes is something Eigen users can do by themselves, but it might be worth a mention in some documentation somewhere.

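    #include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
    _mm_setcsr( _mm_getcsr() | (1<<15));  // FTZ: flush denormal results to zero
    _mm_setcsr( _mm_getcsr() | (1<<6));   // DAZ: treat denormal inputs as zero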
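
For anyone who wants to check that the modes actually took effect, here is a minimal sketch along the same lines. The helper names (`enable_ftz_daz`, `ftz_daz_enabled`, `halve_smallest_normal`) are made up for illustration and are not part of Eigen. It assumes an x86-64 build, where float arithmetic goes through SSE, and a CPU that supports the DAZ bit. Note that MXCSR is per-thread state, so each worker thread would need to do this itself.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
#include <limits>

// Bit positions in the MXCSR register, same values as in the snippet above.
const unsigned int kFtzBit = 1u << 15;  // flush-to-zero: denormal results become 0
const unsigned int kDazBit = 1u << 6;   // denormals-are-zero: denormal inputs read as 0

// Hypothetical helper: turn on both modes for the calling thread.
inline void enable_ftz_daz() {
    _mm_setcsr(_mm_getcsr() | kFtzBit | kDazBit);
}

// Hypothetical helper: report whether both bits are currently set.
inline bool ftz_daz_enabled() {
    const unsigned int mask = kFtzBit | kDazBit;
    return (_mm_getcsr() & mask) == mask;
}

// Halve the smallest normal float. Without FTZ the result is a denormal;
// with FTZ it should be flushed to zero. The volatiles keep the compiler
// from folding the multiplication at compile time.
inline float halve_smallest_normal() {
    volatile float x = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    return x * half;
}
```

Calling `enable_ftz_daz()` once at the start of a thread and then checking `halve_smallest_normal() == 0.0f` is a cheap sanity test that the setting is in place before running any Eigen code.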



On 01/23/2012 07:19 AM, Christoph Hertzberg wrote:
> On 23.01.2012 12:54, Benjamin Schindler wrote:
>> I would love to know whether your rationale on denormalized numbers is
>> correct though. Does anybody else know exactly?
>
> I found this article (it is 3 years old, though):
> http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz/
>
> Christoph

