Re: [eigen] Patch for quaternion normalization and cross product for Vector4f



hi,

For your information, normalize was already automatically vectorized,
because it basically compiles to something like:

q.coeffs() /= ei_sqrt(q.coeffs().cwise().abs2().sum());

All of these operators are vectorized by Eigen, regardless of the
scalar type (float, double, int), provided the size is either Dynamic
or a multiple of the vector size. With -msse3 and -ffast-math it
compiles to:

	movaps	(%rdi), %xmm1
	movaps	%xmm1, %xmm0
	mulps	%xmm1, %xmm0
	haddps	%xmm0, %xmm0
	haddps	%xmm0, %xmm0
	sqrtss	%xmm0, %xmm2
	movss	.LC1(%rip), %xmm0
	divss	%xmm2, %xmm0
	shufps	$0, %xmm0, %xmm0
	mulps	%xmm1, %xmm0
	movaps	%xmm0, (%rdi)

and with only -msse2:

	movaps	(%rdi), %xmm1
	movaps	%xmm1, %xmm2
	mulps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	movhlps	%xmm2, %xmm0
	addps	%xmm2, %xmm0
	movaps	%xmm0, %xmm2
	shufps	$1, %xmm0, %xmm2
	addss	%xmm2, %xmm0
	movaps	%xmm0, %xmm2
	movss	.LC1(%rip), %xmm0
	sqrtss	%xmm2, %xmm2
	divss	%xmm2, %xmm0
	shufps	$0, %xmm0, %xmm0
	mulps	%xmm1, %xmm0
	movaps	%xmm0, (%rdi)

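To make the listings easier to follow, here is a scalar C++ sketch of the
same sequence of steps. It is not Eigen code: the names Coeffs4 and
normalize4 are made up for illustration, and I am assuming the constant
loaded from .LC1 is 1.0f (the listings do not show it). Under -ffast-math
the division is turned into one reciprocal followed by a packed multiply,
which is what the sketch mirrors.

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for the four float coefficients of a quaternion.
using Coeffs4 = std::array<float, 4>;

// Scalar walk-through of the steps in the assembly listings (illustrative only).
inline void normalize4(Coeffs4& q) {
    // mulps: square every coefficient.
    Coeffs4 sq{};
    for (int i = 0; i < 4; ++i) sq[i] = q[i] * q[i];

    // Horizontal sum. Two haddps in the SSE3 listing add in this order;
    // the SSE2 listing (movhlps/addps/shufps/addss) pairs the terms differently.
    const float sum = (sq[0] + sq[1]) + (sq[2] + sq[3]);

    // sqrtss, then divss: one scalar reciprocal of the norm.
    // Assumption: the constant at .LC1 is 1.0f.
    const float inv_norm = 1.0f / std::sqrt(sum);

    // shufps $0 broadcasts the scalar to all lanes; mulps scales the coefficients.
    for (int i = 0; i < 4; ++i) q[i] *= inv_norm;
}
```

For example, calling normalize4 on the coefficients (1, 2, 2, 4), whose
norm is 5, should leave a unit-length result up to rounding.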

About the cross product, the speedup is fairly small (about 1.25x), so
I don't know if it's really worth it. It is much better to pack your
data such that 4 cross products can be performed at once.
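
A minimal sketch of what such packing could look like, in plain C++
rather than Eigen. The struct Vec3x4 and the function cross4 are
hypothetical names: four 3D vectors are stored component-wise, so each
line of the loop body works on four independent lanes and maps naturally
onto packed SSE multiplies and subtracts, with no shuffles. Whether the
compiler actually vectorizes the loop depends on flags and optimization
level.

```cpp
// Hypothetical structure-of-arrays packet holding four 3D vectors:
// lane i is the vector (x[i], y[i], z[i]).
struct Vec3x4 {
    alignas(16) float x[4];
    alignas(16) float y[4];
    alignas(16) float z[4];
};

// Computes four cross products a_i x b_i, one per lane.
inline Vec3x4 cross4(const Vec3x4& a, const Vec3x4& b) {
    Vec3x4 r;
    for (int i = 0; i < 4; ++i) {
        r.x[i] = a.y[i] * b.z[i] - a.z[i] * b.y[i];
        r.y[i] = a.z[i] * b.x[i] - a.x[i] * b.z[i];
        r.z[i] = a.x[i] * b.y[i] - a.y[i] * b.x[i];
    }
    return r;
}
```

Compared with one cross product on a padded 4-float vector, no lane is
wasted on an unused w component here.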

Beyond that, yes "svn diff" is the right way to generate a patch.

cheers,
Gael.

On Tue, Mar 10, 2009 at 6:18 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> Hi,
>
> Please find attached a patch which implements quaternion normalization
> and cross products for Vector4f. The cross product obviously isn't
> defined for 4D vectors, so this one treats its operands as 3-component
> vectors. The w component can be anything on input but will be set to
> zero after the product. As the patch mentions, it is mainly meant for
> people who use Vector4f instead of Vector3f to get vectorization. I
> think this can be helpful to them, since previously the cross product
> had to be done using Vector3f.
>
> Quaternion normalization is obviously useful, and it is now vectorized.
>
> The patches are somewhat incomplete, in the sense that I am not very
> familiar with the bridge between the C++ code and the intrinsics. I am
> just beginning to understand the guts of Eigen, and templates are even
> more confusing for me. You may have to write (hopefully small) pieces
> of glue code. On the intrinsics side, however, it should be complete.
> So please indulge me for a while. I promise to send more complete
> patches in the future.
>
> Here's how it was generated.
>
> ~/eigen2@rpg-lab> svn update
> At revision 937616.
> ~/eigen2@rpg-lab> svn diff > rpg_patch
> ~/eigen2@rpg-lab> ls
> bench  cmake  CMakeLists.txt  COPYING  COPYING.LESSER
> CTestConfig.cmake  demos  disabled  doc  Doxyfile  Eigen  Mainpage.dox
>  rpg_patch  test  unsupported
> ~/eigen2@rpg-lab>
>
> I hope that is the correct way to generate patches. If it isn't please
> tell me what is.
>
> Regards,
>
> --
> Rohit Garg
>
> http://rpg-314.blogspot.com/
>
> Senior Undergraduate
> Department of Physics
> Indian Institute of Technology
> Bombay
>


