On Fri, Mar 27, 2009 at 8:32 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> This file has my sse float implementation for square root. The SSE
> square root instruction has only 12 bits of precision so extra

Where did you find that sqrtss or sqrtps has only 12 bits of precision?
That is true of the reciprocal instructions (rcpps and rsqrtps), but I
thought sqrtps was IEEE 754 compliant. For instance, I found this in
glibc:

    float __ieee754_sqrtf (float x)
    {
      float res;
      /* AT&T operand order: source first, destination second */
      asm ("sqrtss %1, %0" : "=x" (res) : "x" (x));
      return res;
    }

Anyway, sqrtps is very slow and I'm sure there exists a faster
alternative. Perhaps an rsqrtps followed by a div and Newton-Raphson
iterations.

> iterations of Newton Raphson may be necessary. How many of them are
> necessary, I don't know. The max error I was getting was O(1e-8) in
> [0,1]. The cephes implementation has square root only for limited
> range. They do some other hacks to take care of range. I'll look into
> implementing those later. For now, this should be acceptable for
> the fast implementations of square root at least.
> Regards,
>
> --
> Rohit Garg
>
> http://rpg-314.blogspot.com/
>
> Senior Undergraduate
> Department of Physics
> Indian Institute of Technology
> Bombay
>
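
To make the rsqrtps suggestion concrete, here is a rough, untested-for-speed sketch of what I have in mind. The function name fast_sqrt_ps is made up for illustration. It uses a multiply by x rather than a div, since sqrt(x) = x * rsqrt(x), and applies a single Newton-Raphson step to the reciprocal square root estimate. Whether one step is enough for your accuracy target is an assumption you would have to check.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps and friends */

/* Hypothetical helper (not from any library): approximate sqrt of four
 * packed floats.
 *   y0 = rsqrtps(x)                      rough estimate of 1/sqrt(x)
 *   y1 = 0.5 * y0 * (3 - x * y0 * y0)    one Newton-Raphson refinement
 *   sqrt(x) ~= x * y1
 */
static inline __m128 fast_sqrt_ps(__m128 x)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);

    __m128 y   = _mm_rsqrt_ps(x);                 /* low-precision 1/sqrt(x) */
    __m128 xyy = _mm_mul_ps(x, _mm_mul_ps(y, y)); /* x * y0^2 */
    y = _mm_mul_ps(_mm_mul_ps(half, y), _mm_sub_ps(three, xyy));

    __m128 r = _mm_mul_ps(x, y);                  /* x * (1/sqrt(x)) */

    /* For x == 0 the estimate is +inf and the product above is NaN,
     * so mask those lanes back to 0. */
    __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps());
    return _mm_and_ps(r, nonzero);
}
```

Load four floats with _mm_loadu_ps, call fast_sqrt_ps, and store the result with _mm_storeu_ps. Negative inputs come out as NaN, same as sqrtps. Denormals and very large inputs are not handled specially here.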

