[eigen] SSE square root

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: [eigen] SSE square root
From: Rohit Garg <rpg.314@xxxxxxxxx>
Date: Fri, 27 Mar 2009 13:02:05 +0530
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:date:message-id:subject :from:to:content-type; bh=uO4pVLo2O9uxv6KWcwkufAuGtH8slN1iq//m41jDE+Y=; b=KIB+qb+yLJyOSvCJYv88Ux4y8l0EqshuPQlfqcIsRnOHYM7Se0hwB/IZXTnUhbMbxE z/5tjeGtM6YAFXddc2MXxAhZ62+PfWoPcvuW9vr7UVnb2GNioz8MyrGT4FgD4bvGG5Bz NozUkYN8O9noTV+GHpbIHbNbjD3eVXH4K4G3I=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=kcB2jMqP2gnwZtwP+2gMZ138kov6odsFZGOyArClCMIS4WboEjq9MXeXV9Uk92OCYP JOdMs2zulhYAx3eBfJxHFu1QQ37fnFmhcY9fgIPlkiHRZ1NJuUwXB+ostqpwpfilnFg8 kEm1pFBu3/FBGb7nPul0F9ygHBrKWST50wJLo=

This file has my sse float implementation for square root. The SSE
square root instruction has only 12 bits of precision so extra
iterations of Newton Raphson may be neccessary. How many of the are
neccessary, I don't know. the max error I was getting was O(1e-8) in
[0,1]. The cephes implementation has square root only for limited
range. They do some other hacks to take care of range. I'll look into
implementing those later. For now, this should be an acceptable for
the fast implementations of square root atleast.

Regards,

-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay

#include<iostream>
#include<cmath>
#include<pmmintrin.h>
using namespace std;

const __m128 half=_mm_set1_ps(0.5);

__m128 ei_psqrt(const __m128 x)
    {
    __m128 x1=_mm_sqrt_ps(x);
//now 3 Newton_Raphson iterations
    __m128 x2=_mm_mul_ps(half, _mm_add_ps(x1,_mm_div_ps(x,x1)));
    __m128 x3=_mm_mul_ps(half, _mm_add_ps(x2,_mm_div_ps(x,x2)));
    return    _mm_mul_ps(half, _mm_add_ps(x3,_mm_div_ps(x,x3)));
    }

const int MAX=100000;
float __attribute__((aligned(16))) in[MAX];
float __attribute__((aligned(16))) out[MAX];

int main()
    {
    int i;
    __m128 temp;
    float err, maxerr;
    maxerr=0;
    for(i=0; i<MAX; i++)
	in[i]=float(i)/MAX;
    for(i=0; i< (MAX>>2); i++)
	{
	temp=ei_psqrt(_mm_load_ps(in+4*i));
	_mm_store_ps(out+4*i,temp);
//verify
	err=(out[i]-sqrtf(in[i]));
	maxerr=max(err,maxerr);
	err=(out[i+1]-sqrtf(in[i+1]));
	maxerr=max(err,maxerr);	
	err=(out[i+2]-sqrtf(in[i+2]));
	maxerr=max(err,maxerr);
	err=(out[i+3]-sqrtf(in[i+3]));
	maxerr=max(err,maxerr);
	}
    cout<<maxerr;
    return 0;
    }

Follow-Ups:
- [eigen] Re: SSE square root
  - From: Rohit Garg
- Re: [eigen] SSE square root
  - From: Cyril Flaig
- Re: [eigen] SSE square root
  - From: Gael Guennebaud

Messages sorted by: [ date | thread ]
Prev by Date: Re: [eigen] sse transcendental functions
Next by Date: [eigen] Re: SSE square root
Previous by thread: Re: [eigen] Stability of memory allocation functions
Next by thread: [eigen] Re: SSE square root

Mail converted by MHonArc 2.6.19+

http://listengine.tuxfamily.org/