Re: [eigen] (General question) Floating point: why are 'inf' and 'nan' slow?


On Wed, Sep 23, 2009 at 9:12 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/9/23 Rohit Garg <rpg.314@xxxxxxxxx>:
>> On Wed, Sep 23, 2009 at 8:17 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>> Ah, passing:
>>> -mfpmath=sse -msse2 -DEIGEN_DONT_VECTORIZE
>>> does fix the problem (I pass -DEIGEN_DONT_VECTORIZE because I already
>>> knew that SIMD instructions like mulps avoid the problem; now I can
>>> see that indeed scalar instructions like mulss also avoid the
>>> problem).
>>> So, the problem is a non-issue on SSE2-capable systems (SSE2, because
>>> SSE doesn't support double).
>>> But what about non-SSE2-capable systems, or simply linux distros who
>>> need to build a generic i686 binary package.... are they out of luck?
>>> The big design decision that I am facing now is this: floating point
>>> numbers claim to be able to represent special values such as "inf" and
>>> "nan"; ideally we would play this game, returning "inf" when that is
>>> the natural result given the user's input; but if that must be 100x
>>> slower than normal even before the user has any chance of checking if
>>> that's happening, then in practice we can't do that and we need to
>>> explicitly avoid generating "inf" and "nan" values even when that
>>> would be the natural result given the user's input.
>> At any rate, eigen needs
>> to focus on the future, and it is x86-64.
> No, that doesn't work, for 2 separate reasons:
> Reason 1: What about all the embedded CPUs and low-power CPUs out
> there, I'm sure that many of them have the same issues.
>  -- if, as Jitse's link suggests, the problems are inherent in the
> design of the x87, then all x87-compatible non-SSE-capable CPUs will
> have the same problem. That's a lot of embedded CPUs.
>  -- what about the Intel Atom... etc, etc.

Atom has SSE, SSE2, SSE3 and SSSE3. On embedded CPUs the situation is
actually much nicer: you know the hardware, so such problems can be
tackled. The only CPUs which don't have any SSE are the Pentium II and
older. Are you sure you want to worry about machines that old?
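
Since x87 vs. SSE math is fixed at build time (by flags such as the
-mfpmath=sse -msse2 quoted above), a build can report which one it got.
A minimal sketch, assuming GCC or Clang; fp_math_description is a name
I made up for illustration, and the mapping from FLT_EVAL_METHOD to
"x87" is only the typical case, not a guarantee:

```cpp
#include <cfloat>
#include <string>

// Describes, at compile time, how this translation unit evaluates
// floating-point expressions. Hypothetical helper, not part of Eigen.
std::string fp_math_description() {
#if defined(__SSE2_MATH__)
    // Defined by GCC/Clang when SSE2 is used for scalar FP math.
    return "SSE2 scalar math";
#elif defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 2
    // Intermediates are kept in long double, which is typical of x87.
    return "extended precision (typical of x87)";
#elif defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 0
    return "native precision per type";
#else
    return "unknown";
#endif
}
```

Printing this string once at startup is enough to tell whether a
generic i686 package ended up on the x87 path.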
> Reason 2: Linux distros aren't going to drop support for i686 (some
> even still support i586) anytime soon, we can't change that, and
> that's all the more going to continue with the current trend of
> "netbooks". Plus, it's legitimate to want to continue using old
> machines.

Netbooks have SSEx. See here.

We cannot change the fact that 32-bit will be supported for a while,
but inf, nan and denormals really are a corner case, and this would
make Eigen perhaps the only math project out there that fights the
default behavior of CPUs. If you absolutely must, provide a
compile-time flag (it should be opt-in, not opt-out), but please,
please don't break numerics in a way that shocks the hell out of
everyone and his brother.
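
To make the opt-in idea concrete, here is a rough sketch. The macro
HYPOTHETICAL_AVOID_INF and the function reciprocal are invented for
this mail; neither is an Eigen name. Without the macro you get the
plain IEEE 754 result; with it, the inf is replaced by the largest
finite double:

```cpp
#include <cmath>
#include <limits>

// HYPOTHETICAL_AVOID_INF is an invented, illustrative macro name.
// Default build: plain IEEE 754 behavior, so reciprocal(0.0) is +inf.
// Opt-in build (-DHYPOTHETICAL_AVOID_INF): return the largest finite
// double with the sign of x instead of producing inf.
inline double reciprocal(double x) {
#ifdef HYPOTHETICAL_AVOID_INF
    if (x == 0.0) {
        return std::copysign(std::numeric_limits<double>::max(), x);
    }
#endif
    return 1.0 / x;
}
```

The point of the design is that nobody gets the non-standard behavior
unless they asked for it on the compiler command line.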

I'll go out on a limb and say that the majority of programmers don't
know about these special fp numbers, so it REALLY is a corner case. It
is not a biggie.
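
For anyone who hasn't met these values, a small sketch of how they
arise and how a caller can check for them afterwards. It assumes IEEE
754 doubles; all_finite and special_values_demo are names made up for
illustration:

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Returns true if every entry is a finite number (neither inf nor nan).
bool all_finite(const std::vector<double>& v) {
    for (double x : v) {
        if (!std::isfinite(x)) return false;
    }
    return true;
}

// Builds the special values from ordinary arithmetic.
std::vector<double> special_values_demo() {
    const double big = std::numeric_limits<double>::max();
    const double inf = big * 2.0;  // overflow gives +inf under IEEE 754
    const double nan = inf - inf;  // inf - inf has no value, so nan
    return {big, inf, nan};
}
```

A single all_finite() call on the result is all the user needs in
order to notice that something went wrong upstream.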

This way, the user will NEVER know if something went wrong with his
>> Those who use prepackaged software for 32 bit can still make 2
>> codepaths, detecting the CPU at runtime. This is how EVERYBODY else does
>> it, and it has worked out pretty well so far.
> ....and the x87 code path would be generated how? If we designed Eigen
> without x87 in mind, making x87 as much as 850x slower than it should
> be (Jitse's link), then that code path would have to be generated
> using another library? That doesn't work !

If you *expect* to run into these special numbers more than 1% of the
time, then you don't need a separate library; you need a new
algorithm, period. And for less than 1%, we don't need to break
numerics like this.
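
For reference, the two-codepath approach I mentioned usually looks
something like the sketch below. It assumes GCC or Clang on x86 (for
__builtin_cpu_supports); dot_generic, dot_sse2, cpu_has_sse2 and
select_dot are hypothetical names. In a real package the two kernels
would sit in separate translation units built with different flags;
here the SSE2 one is only a placeholder:

```cpp
#include <cstddef>

// Hypothetical generic kernel, built for plain i686 (x87 math).
double dot_generic(const double* a, const double* b, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// Hypothetical SSE2 kernel. In practice this would be compiled with
// -msse2 -mfpmath=sse; the body here is a placeholder for the sketch.
double dot_sse2(const double* a, const double* b, std::size_t n) {
    return dot_generic(a, b, n);
}

// Asks the running CPU whether it supports SSE2.
bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2");
#else
    return false;  // unknown compiler or architecture: stay generic
#endif
}

using DotFn = double (*)(const double*, const double*, std::size_t);

// Picks the code path once, at run time.
DotFn select_dot() {
    return cpu_has_sse2() ? dot_sse2 : dot_generic;
}
```

A caller would store the result of select_dot() once at startup and
call through the pointer afterwards.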

>> No, we should return inf and nan wherever needed. Reason being inf and
>> nan usually signal errors in data/algorithm. Not returning them at all
>> will be a BAD idea.
> Yes, if it's just a matter of returning them, why not. But my dilemmas
> start when there are situations when INF may happen in the middle of
> the computation and one would still have to do the rest of the
> computation with INF values. That is not reasonable if INF goes 850x
> slower.
> Benoit

Rohit Garg

Senior Undergraduate
Department of Physics
Indian Institute of Technology
