Re: [eigen] (General question) Floating point: why are 'inf' and 'nan' slow?
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] (General question) Floating point: why are 'inf' and 'nan' slow?
- From: Rohit Garg <rpg.314@xxxxxxxxx>
- Date: Wed, 23 Sep 2009 22:53:06 +0530
On Wed, Sep 23, 2009 at 10:27 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> The list of x86 CPUs that don't have SSE2 (SSE is not enough for
> double) includes the Pentium 3, Athlon XP, VIA C7, AMD Geode, etc. There's
> no way we can neglect all of them performance-wise. Moreover, even
> with SSE2, people may still want to use -mfpmath=387 (and on 32-bit
> x86 that is the compiler default), in which case the non-vectorized
> part of Eigen computations is affected.
>
> It's not a corner case at all. I was wondering whether, when redesigning
> the solvers, I could assume it to be safe to produce INF and NaN during
> the computation, not just as return values at the end, and the answer is
> that I can't do that.
How do you propose to handle this, then? Will you check values before
writing them to memory? And what happens on SSE2 machines? There, too,
denormals are handled in software.
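The usual mitigation on SSE hardware is to set the FTZ/DAZ bits in MXCSR so
denormals never hit the slow assist path; here is a rough, untested sketch
using the standard intrinsics (not something Eigen currently does, and it
trades strict IEEE 754 conformance for speed):

#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE (SSE)
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE (SSE3)

// Treat denormals as zero so they never trigger the slow microcode assist.
void enable_ftz_daz()
{
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);           // denormal results flushed to 0
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);   // denormal inputs read as 0
}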
>
> Incidentally, looking at LAPACK, they don't do that either (in DGESVX
> they even give up the computation when any pivot is exactly zero).
In this case, giving up is the only thing that can be done sensibly, I agree.
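In code terms, that DGESVX-style policy boils down to something like this
sketch (the idea only, not LAPACK's actual source):

#include <cstddef>

// Return the index of the first exactly-zero pivot, or -1 if there is none:
// the caller can then abort instead of dividing by zero and producing inf.
int first_zero_pivot(const double* pivots, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    if (pivots[i] == 0.0)
      return static_cast<int>(i);
  return -1;
}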
>
> Benoit
>
>
> 2009/9/23 Rohit Garg <rpg.314@xxxxxxxxx>:
>> On Wed, Sep 23, 2009 at 9:12 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>> 2009/9/23 Rohit Garg <rpg.314@xxxxxxxxx>:
>>>> On Wed, Sep 23, 2009 at 8:17 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>>> Ah, passing:
>>>>> -mfpmath=sse -msse2 -DEIGEN_DONT_VECTORIZE
>>>>>
>>>>> does fix the problem (I pass -DEIGEN_DONT_VECTORIZE because I already
>>>>> knew that SIMD instructions like mulps avoid the problem; now I can
>>>>> see that indeed scalar instructions like mulss also avoid the
>>>>> problem).
>>>>>
>>>>> So, the problem is a non-issue on SSE2-capable systems (SSE2, because
>>>>> SSE doesn't support double).
>>>>>
>>>>> But what about non-SSE2-capable systems, or simply Linux distros that
>>>>> need to build a generic i686 binary package... are they out of luck?
>>>>>
>>>>> The big design decision that I am facing now is this: floating point
>>>>> numbers claim to be able to represent special values such as "inf" and
>>>>> "nan"; ideally we would play this game, returning "inf" when that is
>>>>> the natural result given the user's input; but if that must be 100x
>>>>> slower than normal even before the user has any chance of checking if
>>>>> that's happening, then in practice we can't do that and we need to
>>>>> explicitly avoid generating "inf" and "nan" values even when that
>>>>> would be the natural result given the user's input.
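For anyone who wants to reproduce the slowdown being discussed, here is a
rough, untested micro-benchmark sketch; build it once with -mfpmath=387 and
once with -mfpmath=sse -msse2 and compare (the loop body and iteration count
are arbitrary choices of mine):

#include <cstdio>
#include <ctime>
#include <limits>

volatile float sink;  // prevents the compiler from discarding the result

// Time a loop that keeps feeding 'value' (normal, denormal or inf) through the FPU.
static double time_loop(float value)
{
  std::clock_t start = std::clock();
  volatile float v = value;             // volatile read defeats constant folding
  float acc = 0.0f;
  for (long i = 0; i < 10000000L; ++i)
    acc += v * 0.9999f;
  sink = acc;
  return double(std::clock() - start) / CLOCKS_PER_SEC;
}

int main()
{
  std::printf("normal  : %f s\n", time_loop(1.0f));
  std::printf("denormal: %f s\n", time_loop(std::numeric_limits<float>::denorm_min()));
  std::printf("inf     : %f s\n", time_loop(std::numeric_limits<float>::infinity()));
  return 0;
}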
>>>>
>>>
>>>> At any rate, Eigen needs
>>>> to focus on the future, and that is x86-64.
>>>
>>> No, that doesn't work, for 2 separate reasons:
>>>
>>> Reason 1: What about all the embedded CPUs and low-power CPUs out
>>> there, I'm sure that many of them have the same issues.
>>> -- if, as Jitse's link suggests, the problems are inherent in the
>>> design of the x87, then all x87-compatible non-SSE-capable CPUs will
>>> have the same problem. That's a lot of embedded CPUs.
>>> -- what about the Intel Atom... etc, etc.
>>
>> Atom has SSE, SSE2, SSE3 and SSSE3. For embedded CPUs the situation is
>> actually much nicer: you know the hardware, so such problems can be
>> tackled. The only CPUs without any SSE at all are the Pentium II and
>> older. Are you sure you want to worry about machines running CPUs that
>> old?
>>>
>>> Reason 2: Linux distros aren't going to drop support for i686 (some
>>> even still support i586) anytime soon, we can't change that, and
>>> that's all the more going to continue with the current trend of
>>> "netbooks". Plus, it's legitimate to want to continue using old
>>> machines.
>>
>> Netbooks have SSEx; see here:
>>
>> http://www.opensubscriber.com/message/discuss-gnuradio@xxxxxxx/11108339.html
>>
>> We cannot change the fact that 32-bit will be supported for a while,
>> but inf, nan and denormals really are a corner case, and this would
>> make Eigen perhaps the only math project out there that fights the
>> default behaviour of CPUs. If you absolutely must, provide a
>> compile-time flag (it should be opt-in, not opt-out), but please,
>> please don't break numerics in a way that would shock everyone and
>> his brother.
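As a rough sketch of what such an opt-in switch could look like
(EIGEN_AVOID_SPECIAL_VALUES is a made-up name for illustration, not an
existing Eigen macro, and the fallback value is arbitrary):

#include <limits>

// With the hypothetical EIGEN_AVOID_SPECIAL_VALUES defined, the guarded path
// returns a large finite value instead of letting 1/0 produce inf.
double invert(double x)
{
#ifdef EIGEN_AVOID_SPECIAL_VALUES
  if (x == 0.0)
    return std::numeric_limits<double>::max();
#endif
  return 1.0 / x;   // default IEEE behaviour: 1/0 yields inf
}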
>>
>> I'll go out on a limb and say that the majority of programmers don't
>> know about these special fp numbers, so it REALLY is a corner case. It
>> is not a biggie.
>>
>> And by silently avoiding inf and nan, the user will NEVER know if
>> something went wrong with his algorithm/coding/data.
>>>
>>>> Those who ship prepackaged software for 32-bit can still build two
>>>> codepaths and detect the CPU at runtime. This is how everybody else
>>>> does it, and it has worked out pretty well so far.
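A minimal sketch of that kind of runtime dispatch, using the
__builtin_cpu_supports builtin of newer GCC/Clang (the solver names here are
hypothetical, purely for illustration):

#include <cstdio>

// Hypothetical codepaths, named only for illustration.
static void solve_x87_safe() { std::puts("x87 path: avoid producing inf/nan mid-computation"); }
static void solve_sse2()     { std::puts("SSE2 path: special values are cheap enough"); }

void solve_dispatch()
{
  __builtin_cpu_init();                  // initialise the compiler's CPU-feature cache
  if (__builtin_cpu_supports("sse2"))    // one CPUID-based check at runtime
    solve_sse2();
  else
    solve_x87_safe();
}

int main() { solve_dispatch(); return 0; }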
>>>
>>> ...and the x87 code path would be generated how? If we designed Eigen
>>> without x87 in mind, making x87 as much as 850x slower than it should
>>> be (Jitse's link), then that code path would have to be generated
>>> using another library? That doesn't work!
>>
>> If you *expect* to run into these special numbers more than 1% of the
>> time, then you don't need a separate library, you need a new
>> algorithm, period. And for less than 1%, we don't need to break
>> numerics like this.
>>
>>>
>>>> No, we should return inf and nan wherever needed, because inf and
>>>> nan usually signal errors in the data or the algorithm. Not returning
>>>> them at all would be a BAD idea.
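The caller-side check this assumes can be as simple as the following sketch
(assuming a C99/C++11 std::isfinite; this is not an existing Eigen helper):

#include <cmath>
#include <cstddef>

// Scan a result buffer for inf/nan so the user can detect that something went wrong.
bool all_finite(const double* data, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    if (!std::isfinite(data[i]))   // false for both inf and nan
      return false;
  return true;
}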
>>>
>>> Yes, if it's just a matter of returning them, why not. But my dilemma
>>> starts in situations where INF may appear in the middle of the
>>> computation and one would still have to do the rest of the
>>> computation with INF values. That is not reasonable if INF is 850x
>>> slower.
>>>
>>> Benoit
--
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay