Re: [eigen] SSE square root
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] SSE square root
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Fri, 27 Mar 2009 11:28:15 +0100
On Fri, Mar 27, 2009 at 8:32 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> This file has my sse float implementation for square root. The SSE
> square root instruction has only 12 bits of precision so extra
Where did you find that sqrtss or sqrtps has only 12 bits of precision?
That is true for the reciprocal instructions (rcpps and rsqrtps), but I
thought sqrtps was IEEE 754 compliant. For instance, I found this in
glibc:
float
__ieee754_sqrtf (float x)
{
  float res;
  /* AT&T syntax: source operand (%1 = x) first, destination (%0 = res) second */
  asm ("sqrtss %1, %0" : "=x" (res) : "x" (x));
  return res;
}
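To make the distinction concrete, here is a minimal sketch using the SSE intrinsics from <xmmintrin.h> instead of inline asm. The helper names sse_sqrt, sse_rsqrt and rsqrt_residual are made up for this example, and it assumes an x86 target with SSE enabled. Intel documents the relative error of rsqrtss as at most 1.5 * 2^-12, while sqrtss returns the correctly rounded result.

```c
#include <xmmintrin.h>

/* Hypothetical helper: scalar square root via the sqrtss instruction. */
static float sse_sqrt(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Hypothetical helper: scalar 1/sqrt(x) via rsqrtss (approximation only). */
static float sse_rsqrt(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* |x * y^2 - 1| for y = rsqrtss(x), evaluated in double precision.
   For a small relative error e in y this is roughly 2 * |e|. */
static double rsqrt_residual(float x)
{
    double y = (double)sse_rsqrt(x);
    double r = (double)x * y * y - 1.0;
    return r < 0.0 ? -r : r;
}
```

Comparing sse_sqrt(x) against the libm sqrtf(x) should show bit-identical results, while rsqrt_residual(x) should stay on the order of 1e-4 to 1e-3 rather than near float epsilon.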
Anyway, sqrtps is very slow and I'm sure there exists a faster
alternative, perhaps an rsqrtps followed by a div and Newton-Raphson
iterations.
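A rough sketch of that idea, not a tuned implementation. The name fast_sqrt_ps is hypothetical. It refines the rsqrtps estimate with one Newton-Raphson step and then multiplies by x instead of dividing, since sqrt(x) = x * (1/sqrt(x)). One step should roughly double the 12 bits of the initial estimate; whether that is accurate enough, and whether it actually beats sqrtps, would need to be measured.

```c
#include <xmmintrin.h>

/* Hypothetical helper: approximate sqrt of four packed floats.
   Not handled here: infinities and denormal inputs (both yield NaN). */
static __m128 fast_sqrt_ps(__m128 x)
{
    const __m128 half         = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);

    /* Initial estimate y ~ 1/sqrt(x), about 12 bits of precision. */
    __m128 y = _mm_rsqrt_ps(x);

    /* One Newton-Raphson step for f(y) = 1/y^2 - x:
       y <- y * (1.5 - 0.5 * x * y * y) */
    __m128 hxyy = _mm_mul_ps(_mm_mul_ps(half, x), _mm_mul_ps(y, y));
    y = _mm_mul_ps(y, _mm_sub_ps(three_halves, hxyy));

    /* sqrt(x) = x * (1/sqrt(x)) */
    __m128 s = _mm_mul_ps(x, y);

    /* For x == 0 the estimate is inf and the product is NaN,
       so force those lanes to 0 with a mask. */
    __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps());
    return _mm_and_ps(s, nonzero);
}
```

Usage would be along the lines of _mm_storeu_ps(out, fast_sqrt_ps(_mm_loadu_ps(in))) for float arrays in and out of length 4.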
> iterations of Newton-Raphson may be necessary. How many of them are
> necessary, I don't know. The max error I was getting was O(1e-8) in
> [0,1]. The cephes implementation has square root only for a limited
> range. They do some other hacks to take care of range. I'll look into
> implementing those later. For now, this should be acceptable for
> the fast implementations of square root at least.
> Regards,
>
> --
> Rohit Garg
>
> http://rpg-314.blogspot.com/
>
> Senior Undergraduate
> Department of Physics
> Indian Institute of Technology
> Bombay
>