Re: [eigen] SSE questions


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] SSE questions
From: Radu Bogdan Rusu <rusu@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Feb 2010 11:10:58 -0800
Organization: Willow Garage

Benoit,

Thanks for the fast reply.

Benoit Jacob wrote:

> 2010/2/1 Radu Bogdan Rusu <rusu@xxxxxxxxxxxxxxxx>:
>> Hi all,
>>
>> I have a few questions regarding the use of SSE instructions in the Eigen 2.x branch (2.0.11 to be more exact). I've looked at the generated assembly for some of them, but I just want to double check this with the Eigen developers.
>>
>> 1) Why isn't a Vector4f constructor converted into an _mm_set_ps on an SSE platform? Looking through Core/arch/SSE, I did not find any reference to _mm_set_ps.
>
> Good question. For now, the Vector4f constructor taking 4 coordinates indeed copies them without SSE. Indeed, _mm_set_ps is what we need here. I understand that it could give a real improvement when the Vector4f thus constructed is used right away in an expression. Patches welcome :)

in SSE/PacketMath.h?

I will try to see if that works later today.
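As a point of reference for question 1, here is a minimal sketch of building a 4-float packet with _mm_set_ps. The helper names make_packet4f and store_packet4f are hypothetical and are not part of Eigen's SSE/PacketMath.h; the sketch only illustrates the intrinsic's argument order.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper (not an Eigen function): packs four floats into one
// SSE register so that x ends up in the lowest lane.
inline __m128 make_packet4f(float x, float y, float z, float w)
{
  // _mm_set_ps takes its arguments from the highest lane down to the lowest,
  // so they are passed in reverse. _mm_setr_ps accepts the natural order.
  return _mm_set_ps(w, z, y, x);
}

// Hypothetical helper: writes the four lanes to memory, lowest lane first.
// The unaligned store is used so that dst needs no 16-byte alignment.
inline void store_packet4f(float* dst, __m128 p)
{
  _mm_storeu_ps(dst, p);
}
```

Storing make_packet4f(x, y, z, w) with store_packet4f yields the values in the order x, y, z, w.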

>> 2) Is there any interest in having a specialized 3x3 covariance matrix estimation method for the SSE case?
>
> At this stage I wouldn't do such heavy changes in 2.0, but we can discuss this for the development branch. I'm not sure how you would work around the alignment issues at runtime. By copying the matrix into a temporary 4x4 matrix?

What I meant was something like:

  Eigen::Matrix3f covariance_matrix = Eigen::Matrix3f::Zero ();
  for loop goes here....
  {
    m128Wrapper point16 = ...;
    // Prepare the shufflers
    xxxy = point16.shuffle<0, 0, 0, 1> ();
    yyzz = point16.shuffle<1, 1, 2, 2> ();
    xyzx = point16.shuffle<0, 1, 2, 0> ();
    yzxy = point16.shuffle<1, 2, 0, 1> ();
    // Multiply 4 + 4
    m128Wrapper mat_ptr1 = xxxy * xyzx;
    m128Wrapper mat_ptr2 = yyzz * yzxy;
    *(__m128*)&covariance_matrix (0, 0) += mat_ptr1.value;
    *(__m128*)&covariance_matrix (1, 1) += mat_ptr2.value;
    covariance_matrix (2, 2) += point16[2] * point16[2];
  }
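One possible way around the alignment concern is to keep the running sums in registers and write them out once with unaligned stores. The sketch below uses raw intrinsics instead of m128Wrapper, whose definition is not shown in this thread. It assumes that shuffle<a, b, c, d> selects lanes from lowest to highest, that points are stored as (x, y, z, w) quadruples, and that the 3x3 result is laid out column-major. The function name accumulate_outer_products is hypothetical. Like the loop above, it neither centers nor normalizes the data.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Hypothetical helper: accumulates the sum of p * p^T over `count` points
// stored as consecutive (x, y, z, w) floats. `out` receives the 3x3 result
// as 9 floats in column-major order.
inline void accumulate_outer_products(const float* points_xyzw,
                                      std::size_t count, float* out)
{
  __m128 acc_lo = _mm_setzero_ps();  // elements 0..3: xx, xy, xz, yx
  __m128 acc_hi = _mm_setzero_ps();  // elements 4..7: yy, yz, zx, zy
  float acc_zz = 0.0f;               // element 8: zz

  for (std::size_t i = 0; i < count; ++i) {
    const float* p_ptr = points_xyzw + 4 * i;
    const __m128 p = _mm_loadu_ps(p_ptr);  // unaligned load, no alignment assumed

    // _MM_SHUFFLE lists lanes from highest to lowest, hence the reversed indices.
    const __m128 xxxy = _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 0, 0, 0));
    const __m128 yyzz = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 2, 1, 1));
    const __m128 xyzx = _mm_shuffle_ps(p, p, _MM_SHUFFLE(0, 2, 1, 0));
    const __m128 yzxy = _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 0, 2, 1));

    acc_lo = _mm_add_ps(acc_lo, _mm_mul_ps(xxxy, xyzx));
    acc_hi = _mm_add_ps(acc_hi, _mm_mul_ps(yyzz, yzxy));
    acc_zz += p_ptr[2] * p_ptr[2];
  }

  // Unaligned stores: the 9-float destination is not required to be 16-byte aligned.
  _mm_storeu_ps(out, acc_lo);
  _mm_storeu_ps(out + 4, acc_hi);
  out[8] = acc_zz;
}
```

Because the accumulators live in registers, the destination matrix is touched only once at the end, which avoids casting a pointer into a 9-float matrix to __m128*.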

>> 4) Is this the recommended optimized way to get a dot product between a VectorXf and a Vector4f?
>>
>>   float d = ((Eigen::Vector4f)my_vectorxf).start<4>().dot (my_vector4f);

[...]

>   my_vectorxf.start<4>().dot(my_vector4f)

It seems like it's also working without the <4> if my_vectorxf was set to a Vector4f a priori...

  Eigen::VectorXf my_vectorxf;
  my_vectorxf = Eigen::Vector4f (x, y, z, a);
  float d = my_vectorxf.dot (my_vector4f);

This is guaranteed to get optimized, right? Which brings me to the next point :)

5) Can we add in dot product optimization too for SSE4 (_mm_dp_ps)?
http://www.intel.com/technology/itj/2008/v12i3/3-paper/6-examples.htm
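To make question 5 concrete, here is a minimal sketch of a 4-float dot product that uses _mm_dp_ps when the compiler targets SSE4.1 and falls back to a multiply plus shuffle-based horizontal add otherwise. The function name dot4 is hypothetical and this is not how Eigen implements dot(); it only shows what the intrinsic replaces.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#ifdef __SSE4_1__
#include <smmintrin.h>  // SSE4.1: _mm_dp_ps
#endif

// Hypothetical helper: dot product of two 4-float arrays.
inline float dot4(const float* a, const float* b)
{
  const __m128 va = _mm_loadu_ps(a);  // unaligned loads, no alignment assumed
  const __m128 vb = _mm_loadu_ps(b);
#ifdef __SSE4_1__
  // Mask 0xF1: the high nibble (F) multiplies all four lanes, the low
  // nibble (1) writes the sum to lane 0 only.
  return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
#else
  // Plain SSE fallback: multiply, then add the lanes pairwise.
  const __m128 prod = _mm_mul_ps(va, vb);                          // p0 p1 p2 p3
  __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));  // p1 p0 p3 p2
  __m128 sums = _mm_add_ps(prod, shuf);                            // p0+p1 . p2+p3 .
  shuf = _mm_movehl_ps(shuf, sums);                                // lane 0 = p2+p3
  sums = _mm_add_ss(sums, shuf);                                   // lane 0 = total
  return _mm_cvtss_f32(sums);
#endif
}
```

With GCC or Clang the SSE4.1 branch is selected by compiling with -msse4.1 (or a -march setting that implies it); whether it is actually faster than the fallback depends on the target CPU and should be measured.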

Cheers,
Radu.
--
| Radu Bogdan Rusu | http://rbrusu.com/

