Re: [eigen] Quaternion and expression template



Great that you found that. Looking at the description of this level-4 (/W4) warning

http://msdn.microsoft.com/en-us/library/a98sb923%28VS.80%29.aspx

you will see that, in particular cases, exception handling can prevent inlining. In fact, according to that page, no heuristics are involved when the compiler ignores __forceinline, i.e. we should be able to identify, and probably even fix, those cases. Unfortunately, the compiler is not helping us, and the logic (the non-heuristic part) is not really explained on MSDN...

Perhaps the lesson from your tests is that we should avoid writing destructors ourselves when the compiler can generate them.

- Hauke

On Mon, Nov 30, 2009 at 4:04 PM, Mathieu Gautier <mathieu.gautier@xxxxxx> wrote:
Hi,

I think I have the beginning of an answer for the poor inlining in VS 2008 (and VS 2010 beta 2). Here is a small class:

#include <iostream>
using namespace std;

class Test{
public:
 double data[2];

 inline Test() {data[0] = 0; data[1] = 0;}
 inline Test(double x, double y) {data[0] = x; data[1] = y;}

 inline Test add42(){
   return Test(data[0]+42, data[1]+42);
 }

 // empty user-provided destructor; with /EHsc it prevents add42() from being inlined
 inline ~Test(){}

 void print(){cout << data[0] << " : " << data[1] << endl;}
};


__declspec(noinline) void unWin2()
{
 Test t;
 Test t2 = t.add42();

 // nop markers to locate this spot in the disassembly (MSVC x86 inline asm)
 __asm{
   nop
   nop
   nop
 }

 t.print();
 t2.print();

 return;
}

The assembly generated for Test t2 = t.add42() is:

004010A3  lea         eax,[esp+10h]
004010A7  lea         ecx,[esp]
004010AA  call        Test::add42 (401080h)

       Test::add42
00401080  fld         qword ptr [ecx]
00401082  fld         qword ptr [__real@4045000000000000 (402138h)]
00401088  fadd        st(1),st
0040108A  fxch        st(1)
0040108C  fstp        qword ptr [eax]
0040108E  fadd        qword ptr [ecx+8]
00401091  fstp        qword ptr [eax+8]
00401094  ret

Using __forceinline (EIGEN_STRONG_INLINE) does not improve the generated assembly. I also tried this with the default copy constructor and copy assignment operator, and with my own copy constructor and copy assignment operator; there is no difference.

This code is inlined correctly in either of two ways:

  * disabling exception handling (removing /EHsc), or
  * removing the destructor from Test (inline ~Test(){})

which gives, in both cases:

00401083  fld         qword ptr [esp]
00401086  fld         st(0)
00401088  fld         qword ptr [__real@4045000000000000 (402138h)]
0040108E  fadd        st(1),st
00401090  fxch        st(1)
00401092  fstp        qword ptr [esp]
00401095  fld         qword ptr [esp+8]
00401099  fld         st(0)
0040109B  faddp       st(2),st
0040109D  fxch        st(1)
0040109F  fstp        qword ptr [esp+8]

I don't understand the logic behind this behavior. The problem is exactly the same for the Quaternion class: if the destructor

        inline ~Matrix(){} (line 529)

is removed from Matrix.h, all functions returning a Quaternion by value (such as operator*(), conjugate(), etc.) are correctly inlined.

--
Mathieu

