Re: [eigen] Quaternion and expression template



2009/11/30 Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>:
> Great that you found that. Looking at this W4 description
>
> http://msdn.microsoft.com/en-us/library/a98sb923%28VS.80%29.aspx
>
> you will see that exception handling in particular cases can be a cause for
> not inlining. Actually, according to what is written on the linked page, there
> are no heuristics involved in ignoring inlining in the presence of
> __forceinline, i.e. we should be able to identify and probably even fix
> those cases. Unfortunately the compiler is not helping us and the logic (the
> non-heuristic part) is not really explained on msdn.com...
>
> Maybe, we could learn from your tests that we should avoid implementing
> destructors ourselves when the compiler can generate them.

I'm a bit afraid of inferring such rules from experiments, because for
example, I remember adding a default ctor to SVD in the 2.0 branch
because for some user, MSVC failed to generate a default constructor
automatically. Just saying that it's scary because compilers can have
very erratic behavior.
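
For reference, the language-level difference between the two variants of
the test class can be checked at compile time. The sketch below assumes a
C++11 compiler (for std::is_trivially_destructible); WithUserDtor and
WithImplicitDtor are hypothetical names, not Eigen types. It only shows that
an empty user-written destructor makes a type non-trivially destructible.
Whether that property is what MSVC's inliner keys on under /EHsc is an
assumption, not something the MSDN page confirms.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the two variants of Test from the thread.
struct WithUserDtor {
    double data[2];
    ~WithUserDtor() {}   // user-provided, even though the body is empty
};

struct WithImplicitDtor {
    double data[2];      // destructor left to the compiler
};

// A user-provided destructor is never trivial, even when it does nothing.
static_assert(!std::is_trivially_destructible<WithUserDtor>::value,
              "user-provided destructor is not trivial");

// With no user-declared destructor and only trivially destructible members,
// the implicitly declared destructor is trivial.
static_assert(std::is_trivially_destructible<WithImplicitDtor>::value,
              "implicit destructor is trivial here");
```

A check like this documents the property without relying on any particular
compiler's inlining behavior.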

How about instead saying that all that is a good reason to actually do
xpr templates for Quaternion? After Hauke and Gael merge their
branches, the small performance argument should be gone. Also, Gael's
idea for reusable Xpr classes should apply there too, allowing us to do
that without writing any new xpr class.
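
To make the suggestion concrete, here is a minimal sketch of the
expression-template idea applied to the add42() example from the test case.
It is not Eigen's actual implementation: AddScalarExpr and Vec2 are
hypothetical names, and Vec2 merely stands in for Quaternion. The point it
illustrates is that the member function returns a small proxy instead of an
object by value, so the result is computed directly into the destination.

```cpp
#include <cstddef>

// Hypothetical expression node: "operand + scalar", evaluated lazily.
template <typename Lhs>
struct AddScalarExpr {
    const Lhs& lhs;   // refers to the operand, does not copy it
    double scalar;

    // Each coefficient is computed only when it is read.
    double operator[](std::size_t i) const { return lhs[i] + scalar; }
};

// Hypothetical two-component value type; note: no user-declared destructor.
struct Vec2 {
    double data[2];

    Vec2(double x = 0, double y = 0) { data[0] = x; data[1] = y; }

    // Constructing from an expression evaluates it coefficient by coefficient.
    template <typename Lhs>
    Vec2(const AddScalarExpr<Lhs>& e) { data[0] = e[0]; data[1] = e[1]; }

    double operator[](std::size_t i) const { return data[i]; }

    // Returns a lightweight expression object, not a Vec2 by value.
    AddScalarExpr<Vec2> add42() const {
        return AddScalarExpr<Vec2>{*this, 42.0};
    }
};
```

With these definitions, Vec2 t2 = t.add42(); evaluates both coefficients
inside the converting constructor, and no Vec2 temporary is returned from
add42(). The proxy holds a reference to its operand, so it must not outlive
it.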

Benoit

>
> - Hauke
>
> On Mon, Nov 30, 2009 at 4:04 PM, Mathieu Gautier <mathieu.gautier@xxxxxx>
> wrote:
>>
>> Hi,
>>
>> I think I have the beginning of an answer for the bad inlining with VS 2008
>> (and VS 2010 beta2). I have a little class:
>>
>> #include <iostream>
>> using namespace std;
>>
>> class Test{
>> public:
>>  double data[2];
>>
>>  inline Test() {data[0] = 0; data[1] = 0;}
>>  inline Test(double x, double y) {data[0] = x; data[1] = y;}
>>
>>  inline Test add42(){
>>    return Test(data[0]+42, data[1]+42);
>>  }
>>
>>  inline ~Test(){}
>>
>>
>>  void print(){cout << data[0] << " : " << data[1] << endl;}
>> };
>>
>>
>> __declspec(noinline) void unWin2()
>> {
>>  Test t;
>>  Test t2 = t.add42();
>>
>>    __asm{
>>    nop
>>    nop
>>    nop
>>  }
>>
>>
>>  t.print();
>>  t2.print();
>>
>>  return;
>> }
>>
>> The assembly generated for Test t2 = t.add42() is:
>>
>> 004010A3  lea         eax,[esp+10h]
>> 004010A7  lea         ecx,[esp]
>> 004010AA  call        Test::add42 (401080h)
>>
>>        Test::add42
>> 00401080  fld         qword ptr [ecx]
>> 00401082  fld         qword ptr [__real@4045000000000000 (402138h)]
>> 00401088  fadd        st(1),st
>> 0040108A  fxch        st(1)
>> 0040108C  fstp        qword ptr [eax]
>> 0040108E  fadd        qword ptr [ecx+8]
>> 00401091  fstp        qword ptr [eax+8]
>> 00401094  ret
>>
>> Using __forceinline (EIGEN_STRONG_INLINE) does not improve the generated
>> assembly. I have also tried this with the default constructor and copy
>> assignment and with my own copy constructor and copy assignment operator;
>> there are no differences.
>>
>> This code can be inlined correctly in two ways:
>>
>>   * disabling exception handling (removing /EHsc)
>> or * removing the destructor in Test (inline ~Test(){};)
>>
>> which gives, in both cases:
>>
>> 00401083  fld         qword ptr [esp]
>> 00401086  fld         st(0)
>> 00401088  fld         qword ptr [__real@4045000000000000 (402138h)]
>> 0040108E  fadd        st(1),st
>> 00401090  fxch        st(1)
>> 00401092  fstp        qword ptr [esp]
>> 00401095  fld         qword ptr [esp+8]
>> 00401099  fld         st(0)
>> 0040109B  faddp       st(2),st
>> 0040109D  fxch        st(1)
>> 0040109F  fstp        qword ptr [esp+8]
>>
>> I don't understand the logic behind this behavior. The problem is exactly
>> the same for the Quaternion class: if the destructor
>>
>>         inline ~Matrix(){} (line 529)
>>
>> is removed from Matrix.h, all functions returning a Quaternion by value are
>> correctly inlined (such as operator*(), conjugate(), etc.).
>>
>> --
>> Mathieu
>>
>>
>>
>
>


