Re: [eigen] sse asin implementation



It's about 2x faster, which is expected: SSE means four asins per call,
but we compute both branches for all four lanes, so the effective speedup
is roughly half of that.
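
(As a rough illustration of where the factor of two goes, not the actual
patch: a branch-free SSE asin typically evaluates both range-reduction
branches on all four lanes and blends them with a compare mask. The helper
polynomials below are crude placeholders, not the real coefficients.)

    #include <xmmintrin.h>   // SSE intrinsics

    // Placeholder per-branch approximations -- NOT the real coefficients,
    // only here so the sketch is self-contained.
    static inline __m128 asin_small_branch(__m128 x)
    {
      // crude odd polynomial x + x^3/6 for the |x| < 0.5 branch
      __m128 x2 = _mm_mul_ps(x, x);
      return _mm_add_ps(x, _mm_mul_ps(_mm_mul_ps(x, x2), _mm_set1_ps(1.0f/6.0f)));
    }

    static inline __m128 asin_large_branch(__m128 x)
    {
      // placeholder for the |x| >= 0.5 reduction (pi/2 - 2*asin(sqrt((1-|x|)/2)))
      return x;
    }

    __m128 asin_ps_sketch(__m128 x)
    {
      __m128 absx = _mm_andnot_ps(_mm_set1_ps(-0.0f), x);    // clear the sign bit
      __m128 sel  = _mm_cmplt_ps(absx, _mm_set1_ps(0.5f));   // per-lane branch choice
      // Both branches are evaluated for all four lanes and blended with the
      // mask, which is why the speedup is ~2x rather than the ideal 4x.
      __m128 a = asin_small_branch(x);
      __m128 b = asin_large_branch(x);
      return _mm_or_ps(_mm_and_ps(sel, a), _mm_andnot_ps(sel, b));
    }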

On Wed, Apr 1, 2009 at 7:56 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> I forgot: do you have performance comparisons for your vectorized
> version of asin?
>
> On Wed, Apr 1, 2009 at 4:25 PM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx> wrote:
>> hi,
>>
>> let me remind you that currently the packet versions of sin, cos, exp, log
>> and sqrt are enabled by default (regardless of the fast-math option).
>> The vectorized versions of sin, cos and sqrt can be disabled by
>> defining a preprocessor token. If there are good arguments, I'm still OK
>> to change this behavior to "disabled by default" and "enabled if
>> either EIGEN_FAST_MATH or __FAST_MATH__ is defined".
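
(For illustration only, a minimal sketch of the proposed gating;
EIGEN_FAST_MATH and __FAST_MATH__ are the macros mentioned in this thread,
while EIGEN_VECTORIZE_EXTRA_MATH_FUNCS is a made-up internal name, not an
actual Eigen macro.)

    // Enable the vectorized math functions only when fast math is requested.
    #if defined(EIGEN_FAST_MATH) || defined(__FAST_MATH__)
    #  define EIGEN_VECTORIZE_EXTRA_MATH_FUNCS 1
    #endif

    #ifdef EIGEN_VECTORIZE_EXTRA_MATH_FUNCS
    // ... select the packet (SSE) implementations of sin/cos/exp/log/sqrt ...
    #else
    // ... fall back to per-coefficient calls to the scalar std:: functions ...
    #endif
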
>>
>> About the vectorization of asin, acos, etc.: indeed, I don't see many
>> use cases for a "vec.cwise().asin()", but I think they can be useful
>> for vectorizing (by hand) more complex algorithms. Perhaps a good
>> compromise would be to put them in an "ExtraVectorization" module in
>> unsupported/ and move some of them to the official Array module
>> according to demand.
>>
>> gael.
>>
>> On Tue, Mar 31, 2009 at 6:57 PM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>>> On Tue, Mar 31, 2009 at 10:19 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>> 2009/3/31 Rohit Garg <rpg.314@xxxxxxxxx>:
>>>>> For general exponentiation, I think a cheap and easy route would be
>>>>> just exp(b*log(a)).
>>>>>
>>>>> I looked at the cephes library implementation of pow and they do a lot
>>>>> of hacks just to get 3 bits of precision. In the use case that you
>>>>> have cited, I think it's better to just have the above implementation.
>>>>> It is certainly good enough for the -ffast-math case.
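
(A minimal sketch of the exp(b*log(a)) route; the per-lane std::log/std::exp
fallbacks below merely stand in for the vectorized packet log/exp so the
snippet is self-contained, and pow_ps_sketch is an illustrative name, not an
existing Eigen function.)

    #include <cmath>
    #include <xmmintrin.h>

    // Per-lane scalar fallbacks standing in for the vectorized log/exp kernels.
    static inline __m128 log_ps_fallback(__m128 x)
    {
      float v[4]; _mm_storeu_ps(v, x);
      for (int i = 0; i < 4; ++i) v[i] = std::log(v[i]);
      return _mm_loadu_ps(v);
    }
    static inline __m128 exp_ps_fallback(__m128 x)
    {
      float v[4]; _mm_storeu_ps(v, x);
      for (int i = 0; i < 4; ++i) v[i] = std::exp(v[i]);
      return _mm_loadu_ps(v);
    }

    // pow(a,b) = exp(b*log(a)); valid only for a > 0, and the log/exp errors
    // compound, so this really is a fast-math quality result.
    static inline __m128 pow_ps_sketch(__m128 a, __m128 b)
    {
      return exp_ps_fallback(_mm_mul_ps(b, log_ps_fallback(a)));
    }
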
>>>>
>>>> OK, seems sensible. But I thought the main justification for a SIMD
>>>> pow() would be performance, i.e. that it could be faster than the above
>>>> formula. If that's not the case then sure, I agree with you.
>>>
>>> ATM, I don't know of any way to do that without going the log -> exp
>>> route. Cephes does something without taking a full-blown log and exp,
>>> but the general approach is the same. But yes, we'll have to flush the
>>> denormals if we want any real performance benefit. Perhaps this is a
>>> candidate to be vectorized only if fast-math is chosen. And looking at
>>> the log code already in Eigen, ATM the user has no way to select the
>>> default route if he wants the denormals to be treated with respect.
>>> Silently killing them is suitable only if -ffast-math is given.
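
(For reference, "flushing the denormals" on SSE is normally done through the
MXCSR flush-to-zero and denormals-are-zero bits; the DAZ macro comes from the
SSE3-era header. The function name below is just illustrative.)

    #include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE
    #include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

    // FTZ flushes denormal results to zero, DAZ treats denormal inputs as zero.
    // This is exactly the kind of silent accuracy loss that should arguably be
    // opt-in, e.g. only when -ffast-math / EIGEN_FAST_MATH is in effect.
    void enable_denormal_flushing()
    {
      _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
      _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }
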
>>>
>>>>
>>>>>
>>>>> BTW, what do you think of enabling the fast-math paths in Eigen when
>>>>> just -ffast-math is supplied to gcc? It can be detected by the
>>>>> __FAST_MATH__ macro.
>>>>
>>>> That seems like a good idea! Let's see what Gael thinks.
>>>>
>>>> Cheers,
>>>> Benoit
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Rohit Garg
>>>
>>> http://rpg-314.blogspot.com/
>>>
>>> Senior Undergraduate
>>> Department of Physics
>>> Indian Institute of Technology
>>> Bombay
>>>
>>>
>>>
>>
>
>
>



-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay


