Re: [eigen] geometry module


On Wed, Sep 8, 2010 at 1:18 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2010/9/8 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> On Tue, Sep 7, 2010 at 12:57 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>> Here, "most efficiently" depends on what you're doing. If you want to
>>> apply this transformation to a vector, it's going to be faster if you
>>> have a matrix representation of your transform, as the Transform class
>>> does. This is one of the most performance-critical use cases...
>>
>> Some numbers for transforming N 3D vectors stored in a 3xN column-major
>> matrix, using a 3x3 matrix, a quaternion via the quaternion x single
>> vector product, and a quaternion converted on the fly to a 3x3 matrix.
>> The times are in seconds for 100000 runs (in the last case the
>> quaternion is converted to a matrix 100000 times).
>>
>> N                   1          2          3          4          5          6          7          8
>> matrix 3x3  0.0007521  0.0008807   0.001357   0.002339   0.002869   0.003583   0.004301    0.02684
>> quaternion   0.001332   0.002183   0.003098   0.004002   0.004913   0.005945   0.007081   0.007997
>> quat-mat     0.001165    0.00152   0.001822   0.002925   0.003396   0.003964   0.004615    0.02727
>>
>> As expected, the matrix product is significantly faster. What is
>> surprising is that even for transforming a single vector (N=1), it is
>> faster to convert the quaternion to a matrix and then perform the
>> matrix product than to use the optimized quaternion-vector product
>> directly, given that the costs are respectively:
>>
>> 3x3 matrix : 9 mul + 6 add = 15 ops
>> quaternion : 15 mul + 15 add = 30 ops
>> quat-mat   : 18 mul + 21 add = 39 ops
>>
>> These numbers come directly from the assembly, where we can see that
>> gcc replaced "2 * v" with "v + v".
>>
>> Also, Daniel, you might be interested to know that this benchmark is
>> in bench/quaternion.cpp (in trunk).
>
> Thanks a lot for these numbers!
>
> Do you think that quaternion*vector3D has room to be improved by
> copying the vector3d into a vector4d and applying the vectorizable
> quaternion*vector4D product? I am worried about the 4th component: if
> it would be required to divide by it, that could kill the benefit.

It is even worse. I simply tried copying the input into a vector4f
and using the vectorized cross3 function. The result has only 5 pmul
and 5 padd, yet it is slower:

N                  1
matrix 3x3 0.0007607
quaternion  0.002226
quat-mat    0.001178
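
For reference, the three scalar code paths compared in these tables can be sketched as below. This is a standalone illustration using only the standard library, not the code in bench/quaternion.cpp or Eigen's implementation; the names Quat, cross, quat_rotate, to_matrix and mat_vec are made up for the example. It is arranged so that the operation counts in the comments agree with the 15, 30 and 39 ops quoted above, assuming a unit quaternion.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // m[row][col]

struct Quat { double w, x, y, z; };  // assumed to be unit length

// Cross product: 6 mul + 3 add.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Direct quaternion * vector: v + w*t + u x t, with u = (x, y, z) and
// t = 2 (u x v). Two cross products (12 mul + 6 add), t + t (3 add),
// w*t (3 mul) and two vector sums (6 add): 15 mul + 15 add.
Vec3 quat_rotate(const Quat& q, const Vec3& v) {
    assert(std::abs(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z - 1.0) < 1e-9);
    const Vec3 u = {q.x, q.y, q.z};
    Vec3 t = cross(u, v);
    for (double& c : t) c += c;  // "2 * t" written as "t + t"
    const Vec3 ut = cross(u, t);
    return {v[0] + q.w * t[0] + ut[0],
            v[1] + q.w * t[1] + ut[1],
            v[2] + q.w * t[2] + ut[2]};
}

// Quaternion to 3x3 rotation matrix: 9 mul + 15 add.
Mat3 to_matrix(const Quat& q) {
    const double tx = q.x + q.x, ty = q.y + q.y, tz = q.z + q.z;        // 3 add
    const double twx = tx * q.w, twy = ty * q.w, twz = tz * q.w;        // 3 mul
    const double txx = tx * q.x, txy = ty * q.x, txz = tz * q.x;        // 3 mul
    const double tyy = ty * q.y, tyz = tz * q.y, tzz = tz * q.z;        // 3 mul
    Mat3 m;
    m[0] = {1.0 - (tyy + tzz), txy - twz, txz + twy};                   // 12 add
    m[1] = {txy + twz, 1.0 - (txx + tzz), tyz - twx};                   // over the
    m[2] = {txz - twy, tyz + twx, 1.0 - (txx + tyy)};                   // three rows
    return m;
}

// 3x3 matrix * vector: 9 mul + 6 add.
Vec3 mat_vec(const Mat3& m, const Vec3& v) {
    return {m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
            m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
            m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2]};
}
```

The "quat-mat" row corresponds to mat_vec(to_matrix(q), v), i.e. 9 + 9 = 18 mul and 15 + 6 = 21 add, while the "matrix 3x3" row pays only for mat_vec.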

The problem is not the copies, which gcc optimizes away well, but
the 5 extra shuffles. Maybe some of the shuffles can be removed by
directly vectorizing the quaternion * vector product (we currently
vectorize quat*quat only).
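
To illustrate where shuffles enter a lane-wise cross product, here is a scalar emulation of a 4-lane version of the product. It is only a sketch under assumptions: Vec4 stands in for one SSE register, the helpers pmul, padd, psub, yzxw, cross3 and rotate4 are written for this example and are not Eigen's packet functions, and the broadcast of w is not counted. In this particular arrangement the counters end at 5 multiplies, 5 adds and 5 shuffles, but that does not show the benchmarked code is arranged the same way.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;  // stands in for one 4-float register

struct OpCount { int mul = 0, add = 0, shuffle = 0; };

// Lane-wise multiply, add and subtract; a subtract is counted as an add.
Vec4 pmul(const Vec4& a, const Vec4& b, OpCount& n) {
    ++n.mul;
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}
Vec4 padd(const Vec4& a, const Vec4& b, OpCount& n) {
    ++n.add;
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}
Vec4 psub(const Vec4& a, const Vec4& b, OpCount& n) {
    ++n.add;
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2], a[3] - b[3]};
}

// Lane permutation (x, y, z, w) -> (y, z, x, w).
Vec4 yzxw(const Vec4& a, OpCount& n) {
    ++n.shuffle;
    return {a[1], a[2], a[0], a[3]};
}

// Cross product on lanes 0..2: a x b = (a * b.yzx - a.yzx * b).yzx.
// Lane 3 comes out as a[3]*b[3] - a[3]*b[3] = 0, so no division is needed.
// a_yzx is passed in so that it can be reused across calls.
Vec4 cross3(const Vec4& a, const Vec4& a_yzx, const Vec4& b, OpCount& n) {
    const Vec4 lhs = pmul(a, yzxw(b, n), n);
    const Vec4 rhs = pmul(a_yzx, b, n);
    return yzxw(psub(lhs, rhs, n), n);
}

// v + w*t + u x t with t = 2 (u x v); u holds (x, y, z, 0) of a unit quaternion.
Vec4 rotate4(float w, const Vec4& u, const Vec4& v, OpCount& n) {
    const Vec4 u_yzx = yzxw(u, n);          // shared by both cross products
    Vec4 t = cross3(u, u_yzx, v, n);
    t = padd(t, t, n);                      // "2 * t" written as "t + t"
    const Vec4 wt = pmul(Vec4{w, w, w, w}, t, n);
    const Vec4 ut = cross3(u, u_yzx, t, n);
    return padd(padd(v, wt, n), ut, n);
}
```

Each cross3 needs one shuffle of its second operand and one of its result, so even with the shuffled quaternion reused the permutations cannot be folded into the multiplies and adds.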

gael

