Re: [eigen] SSE square root


On Fri, Mar 27, 2009 at 8:32 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> This file has my sse float implementation for square root. The SSE
> square root instruction has only 12 bits of precision so extra

Where did you find that sqrtss or sqrtps has only 12 bits of precision?

This is right for the reciprocal functions (rcpps and rsqrtps), but I
thought sqrtps was IEEE 754 compliant. For instance I found that in the

float __ieee754_sqrtf (float x)
{
  float res;

  /* AT&T syntax: source operand (%1 = x) first, destination (%0 = res) second */
  asm ("sqrtss %1, %0" : "=x" (res) : "x" (x));

  return res;
}

Anyway, sqrtps is very slow and I'm sure there exists a faster
alternative. Perhaps a rsqrtps followed by a div and Newton-Raphson
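
To make the idea concrete, here is a minimal sketch, assuming the SSE intrinsics from <xmmintrin.h>. The names fast_sqrt_ps and fast_sqrt4 are hypothetical, not from any existing library. Note that it uses a multiply instead of a div, relying on sqrt(x) = x * rsqrt(x), and applies a single Newton-Raphson step to the roughly 12-bit rsqrtps estimate. It is not IEEE 754 exact, and it returns NaN for x = 0 because rsqrtps(0) is +inf.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ... */

/* Hypothetical helper: approximate sqrt of four packed floats.
 *   y0 = rsqrtps(x)                      (about 12 bits of precision)
 *   y1 = 0.5 * y0 * (3 - x * y0 * y0)    (one Newton-Raphson step)
 *   sqrt(x) ~= x * y1
 * Caveat: x == 0 gives NaN (0 * inf); callers must handle it separately. */
static __m128 fast_sqrt_ps(__m128 x)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);

    __m128 y  = _mm_rsqrt_ps(x);                     /* y0 ~= 1/sqrt(x)   */
    __m128 xy = _mm_mul_ps(x, y);                    /* x*y0 ~= sqrt(x)   */
    __m128 t  = _mm_sub_ps(three, _mm_mul_ps(xy, y)); /* 3 - x*y0*y0      */

    return _mm_mul_ps(_mm_mul_ps(half, xy), t);      /* x * y1            */
}

/* Convenience wrapper for plain (possibly unaligned) float arrays. */
static void fast_sqrt4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, fast_sqrt_ps(_mm_loadu_ps(in)));
}
```

Whether this actually beats sqrtps depends on the CPU, so it would need benchmarking, and a second Newton-Raphson step may be needed if closer to full float precision is required.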

> iterations of Newton Raphson may be necessary. How many of them are
> necessary, I don't know. The max error I was getting was O(1e-8) in
> [0,1]. The cephes implementation has square root only for a limited
> range. They do some other hacks to take care of range. I'll look into
> implementing those later. For now, this should be acceptable for
> the fast implementations of square root at least.

> Regards,
> --
> Rohit Garg
> Senior Undergraduate
> Department of Physics
> Indian Institute of Technology
> Bombay
