On Thu, Oct 29, 2009 at 2:17 PM, Mathieu Gautier
<mathieu.gautier@xxxxxx> wrote:
So to conclude, don't worry, I'm pretty sure that "q = q1.conjugate() * q2" is already as fast as a specialized "quat_mult_conj(q, q1, q2)".
Yes and no :) I am working with Visual Studio 2008, and the assembly code generated with Eigen is not optimized. There are three temporary quaternions, plus a call to each constructor, to conjugate() and to operator*(). When I modify the Quaternion constructor from
inline Quaternion(Scalar w, Scalar x, Scalar y, Scalar z)
{ coeffs() << x, y, z, w; }
to
inline Quaternion(Scalar w, Scalar x, Scalar y, Scalar z) :
m_coeffs(x,y,z,w){}
the constructor is correctly inlined in the assembly code, but the calls to conjugate() and operator*() remain.
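To make the two constructor styles concrete, here is a minimal sketch that does not use Eigen. Vec4, QuatAssign and QuatInit are hypothetical stand-ins invented for this illustration. They only show that both forms store the same coefficients in (x, y, z, w) order; whether a given compiler inlines one better than the other still has to be checked in the generated assembly.

```cpp
#include <cassert>

// Hypothetical stand-in for the 4-coefficient storage (NOT Eigen's type).
struct Vec4 {
    double v[4];
    Vec4() { v[0] = v[1] = v[2] = v[3] = 0.0; }
    Vec4(double a, double b, double c, double d) {
        v[0] = a; v[1] = b; v[2] = c; v[3] = d;
    }
};

// Style 1: the member is default-constructed first, then overwritten
// in the constructor body (analogous to "coeffs() << x, y, z, w;").
struct QuatAssign {
    Vec4 m_coeffs;
    QuatAssign(double w, double x, double y, double z) {
        m_coeffs.v[0] = x;
        m_coeffs.v[1] = y;
        m_coeffs.v[2] = z;
        m_coeffs.v[3] = w;
    }
};

// Style 2: the member is constructed directly in the initializer list
// (analogous to ": m_coeffs(x,y,z,w) {}").
struct QuatInit {
    Vec4 m_coeffs;
    QuatInit(double w, double x, double y, double z)
        : m_coeffs(x, y, z, w) {}
};

// Returns true when both styles end up with identical storage.
inline bool same_storage(const QuatAssign& a, const QuatInit& b) {
    for (int i = 0; i < 4; ++i)
        if (a.m_coeffs.v[i] != b.m_coeffs.v[i]) return false;
    return true;
}
```

The second style gives the compiler a single construction to deal with instead of a default construction followed by element-wise writes.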
Second, I tried q = q1.conjugate()*q2 with Eigen and gcc, and the generated assembly code is quite similar to that of the specialized function (which was your point). So the Visual C++ compiler seems to have difficulty optimizing this code.
Third, I used a small Quaternion class which is not templated and performed the same operation ("q = q1.conjugate()*q2") with Visual Studio. The assembly code is then well optimized (no calls, no temporary constructors). So I think that the problem comes from the templates and the inlining. Moreover, using EIGEN_STRONG_INLINE (__forceinline) instead of inline, together with a bunch of optimization flags, does not help. Have you ever encountered this issue with Eigen and Visual Studio?
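For reference, a small non-templated quaternion of the kind described above might look like the following. This is only a sketch under assumptions: the Quat struct, its (w, x, y, z) member layout and the same() helper are hypothetical and are not taken from Eigen or from the class actually used in the test. quat_mult_conj is the name used earlier in this thread, and its body here is simply the product with the sign of q1's vector part flipped.

```cpp
#include <cassert>

// Hypothetical minimal quaternion, stored as (w, x, y, z); not Eigen's class.
struct Quat {
    double w, x, y, z;
    Quat() : w(1.0), x(0.0), y(0.0), z(0.0) {}
    Quat(double w_, double x_, double y_, double z_)
        : w(w_), x(x_), y(y_), z(z_) {}

    // Conjugate: negate the vector part.
    Quat conjugate() const { return Quat(w, -x, -y, -z); }

    // Hamilton product (*this) * b.
    Quat operator*(const Quat& b) const {
        return Quat(w * b.w - x * b.x - y * b.y - z * b.z,
                    w * b.x + x * b.w + y * b.z - z * b.y,
                    w * b.y - x * b.z + y * b.w + z * b.x,
                    w * b.z + x * b.y - y * b.x + z * b.w);
    }
};

// Specialized version: q = conjugate(q1) * q2, written out without
// building the intermediate conjugate.
inline void quat_mult_conj(Quat& q, const Quat& q1, const Quat& q2) {
    q.w = q1.w * q2.w + q1.x * q2.x + q1.y * q2.y + q1.z * q2.z;
    q.x = q1.w * q2.x - q1.x * q2.w - q1.y * q2.z + q1.z * q2.y;
    q.y = q1.w * q2.y + q1.x * q2.z - q1.y * q2.w - q1.z * q2.x;
    q.z = q1.w * q2.z - q1.x * q2.y + q1.y * q2.x - q1.z * q2.w;
}

// Exact component-wise comparison (fine for small integer-valued inputs).
inline bool same(const Quat& a, const Quat& b) {
    return a.w == b.w && a.x == b.x && a.y == b.y && a.z == b.z;
}
```

With q1 = (1, 2, 3, 4) and q2 = (5, 6, 7, 8) in (w, x, y, z) order, both q1.conjugate() * q2 and quat_mult_conj give (70, 0, -16, -8), so the two forms can be compared side by side in the disassembly.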
ahh VS.... I wanted to suggest that you try EIGEN_STRONG_INLINE, because I remember that it did help VS to inline better, but if that does not work here then I'm clueless. Moreover, I barely know VS, since I used it only once to try it with Eigen before VS users came to the rescue. So perhaps some VS gurus have some ideas?
gael
--
Mathieu Gautier