Re: [eigen] SSE square root

*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] SSE square root
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Fri, 27 Mar 2009 11:36:46 +0100

On Fri, Mar 27, 2009 at 11:28 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> On Fri, Mar 27, 2009 at 8:32 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>> This file has my sse float implementation for square root. The SSE
>> square root instruction has only 12 bits of precision so extra
>
> where did you find that sqrtss or sqrtps has only 12 bits of precision?
>
> this is right for the reciprocal functions (rcpps and rsqrtps) but I
> thought sqrtps was ieee754 compliant. For instance I found this in
> glibc:
>
> float
> __ieee754_sqrtf (float x)
> {
>   float res;
>
>   asm ("sqrtss %1, %0" : "=x" (res) : "x" (x));
>
>   return res;
> }
>
> anyway, sqrtps is very slow and I'm sure there exists a faster
> alternative. Perhaps a rsqrtps followed by a div and Newton Raphson
> iterations..

hm.. not a div but a mul:

x = _mm_mul_ps(_mm_rsqrt_ps(x),x);
// iterations

also, on my architecture, sqrtf(x) is compiled as an SSE sqrtss instruction.....

>
>> iterations of Newton Raphson may be necessary. How many of them are
>> necessary, I don't know. The max error I was getting was O(1e-8) in
>> [0,1]. The cephes implementation has square root only for a limited
>> range. They do some other hacks to take care of range. I'll look into
>> implementing those later. For now, this should be acceptable for
>> the fast implementations of square root at least.
>
>> Regards,
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>
>> Senior Undergraduate
>> Department of Physics
>> Indian Institute of Technology
>> Bombay
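
For what it's worth, the "rsqrtps, then a mul, then iterations" idea above could look roughly like the sketch below. This is not code from Rohit's file or from Eigen; the names fast_sqrt_ps and fast_sqrt_max_rel_error are made up for illustration, and the choice of a single Newton-Raphson step is an assumption (more steps may or may not be worth it). The step refines y ~ 1/sqrt(x) with y <- y * (1.5 - 0.5 * x * y * y), and the final multiply by x turns the reciprocal square root into a square root.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE: _mm_rsqrt_ps, _mm_mul_ps, _mm_sub_ps, ...

// Hypothetical helper: approximate sqrt of four floats using rsqrtps
// plus ONE Newton-Raphson step (the step count is an assumption).
// Only meant for strictly positive, finite, normal inputs:
// x == 0 yields NaN here, because rsqrt(0) = inf and 0 * inf = NaN.
static inline __m128 fast_sqrt_ps(__m128 x)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);

    // Hardware estimate of 1/sqrt(x), good to roughly 12 bits.
    __m128 y = _mm_rsqrt_ps(x);

    // Newton-Raphson refinement: y <- y * (1.5 - 0.5 * x * y * y).
    __m128 hxyy = _mm_mul_ps(_mm_mul_ps(half, x), _mm_mul_ps(y, y));
    y = _mm_mul_ps(y, _mm_sub_ps(three_halves, hxyy));

    // sqrt(x) = x * (1/sqrt(x)): a mul, not a div.
    return _mm_mul_ps(x, y);
}

// Hypothetical test helper: largest relative error against std::sqrt
// over n evenly spaced samples in [lo, hi] (n rounded down to a multiple of 4).
static float fast_sqrt_max_rel_error(float lo, float hi, int n)
{
    assert(lo > 0.0f && hi > lo && n >= 4);
    float worst = 0.0f;
    for (int i = 0; i + 4 <= n; i += 4) {
        float in[4], out[4];
        for (int k = 0; k < 4; ++k)
            in[k] = lo + (hi - lo) * float(i + k) / float(n - 1);
        _mm_storeu_ps(out, fast_sqrt_ps(_mm_loadu_ps(in)));
        for (int k = 0; k < 4; ++k) {
            const float ref = std::sqrt(in[k]);
            worst = std::max(worst, std::fabs(out[k] - ref) / ref);
        }
    }
    return worst;
}
```

Calling fast_sqrt_max_rel_error(1e-3f, 1.0f, 4096) gives a quick feel for how much accuracy one step buys on a given CPU. Whether this actually beats sqrtps would have to be benchmarked, and zero, denormal and infinite inputs would need separate handling before it could replace a real sqrt.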

**References**:
- **[eigen] SSE square root**, *From:* Rohit Garg
- **Re: [eigen] SSE square root**, *From:* Gael Guennebaud
