Re: [eigen] SSE square root



On Fri, Mar 27, 2009 at 11:28 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Fri, Mar 27, 2009 at 8:32 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>> This file has my sse float implementation for square root. The SSE
>> square root instruction has only 12 bits of precision so extra
>
> Where did you find that sqrtss or sqrtps has only 12 bits of precision?
>
> That is true for the reciprocal instructions (rcpps and rsqrtps), but I
> thought sqrtps was IEEE 754 compliant. For instance, I found this in
> glibc:
>
> float
> __ieee754_sqrtf (float x)
> {
>  float res;
>
>  asm ("sqrtss %1, %0" : "=x" (res) : "x" (x));
>
>  return res;
> }
>
> Anyway, sqrtps is very slow and I'm sure there exists a faster
> alternative. Perhaps an rsqrtps followed by a div and Newton-Raphson
> iterations.

Hmm, not a div but a mul:

x = _mm_mul_ps(_mm_rsqrt_ps(x),x);
// iterations
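
To make the "iterations" part concrete, here is a minimal sketch of the
rsqrtps-then-multiply idea with a single Newton-Raphson step. The function
name fast_sqrt_ps, the choice of exactly one iteration, and the masking of
zero inputs are my assumptions, not something taken from the attached file.
The step refines y ~ 1/sqrt(x) as y = y * (1.5 - 0.5 * x * y * y), which
roughly doubles the number of correct bits of the initial estimate.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ... */

/* Hypothetical helper (illustrative name): approximate sqrt of 4 packed floats. */
static inline __m128 fast_sqrt_ps(__m128 x)
{
    const __m128 half         = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);

    /* Low-precision hardware estimate of 1/sqrt(x). */
    __m128 y = _mm_rsqrt_ps(x);

    /* One Newton-Raphson step: y = y * (1.5 - 0.5 * x * y * y). */
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
    y = _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));

    /* sqrt(x) = x * (1/sqrt(x)): a mul, not a div. */
    __m128 r = _mm_mul_ps(x, y);

    /* For x == 0 the estimate is +inf and 0 * inf is NaN, so force
       those lanes to 0 (assumption: callers want sqrt(0) == 0). */
    return _mm_and_ps(r, _mm_cmpneq_ps(x, _mm_setzero_ps()));
}
```

Typical use would be to load four floats with _mm_loadu_ps, call the
function, and store the result with _mm_storeu_ps. Whether one iteration is
accurate enough, and whether this actually beats sqrtps on a given CPU,
would need to be measured.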

Also, on my architecture, sqrtf(x) is compiled to an SSE sqrtss instruction.

>
>> iterations of Newton-Raphson may be necessary. How many of them are
>> necessary, I don't know. The max error I was getting was O(1e-8) in
>> [0,1]. The cephes implementation has square root only for a limited
>> range. They do some other hacks to take care of the range. I'll look into
>> implementing those later. For now, this should be acceptable for
>> the fast implementations of square root at least.
>
>
>
>> Regards,
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>
>> Senior Undergraduate
>> Department of Physics
>> Indian Institute of Technology
>> Bombay
>>
>


