Re: [eigen] Matrix product crashes when compiled with MSVC 2010 in release

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


Awesome!  Confirmed on both the test case and the original larger
project.  Just as I got an ugly-ass inline assembly solution to work
:)

Thanks a lot,

   -Ilya

On Fri, Aug 13, 2010 at 1:59 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2010/8/13 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> 2010/8/13 Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>:
>>> Ok, once again. Since Ilya mentioned that CMake does create projects
>>> which have defaults set to "no whole program optimization" I needed to
>>> start digging again because it implies, that our unit tests should
>>> have captured this!
>>
>> So, are you saying that they are not capturing this in this setup?
>> In that case, please add Ilya's test case to our tests.
>>
>> I think I see where the problem is, and it might be on our side.
>>
>> First of all, let's repeat that this code is really meant to use only
>> registers, not memory, so having an access violation there is really
>> strange.
>>
>> Gael's code there is meant to adapt to the number of registers
>> available: these are these if(nr==4) where nr is a compile-time
>> constant set to
>>
>>    nr = NumberOfRegisters/4,
>>
>> in the same file GeneralBlockPanelKernel.h.
>>
>> Now we can see in the assemby that you pasted,
>>
>>       if(nr==4) traits.acc(C3, alphav, R3);
>> 00123465  movapd      xmm6,xmm0
>> 00123469  mulpd       xmm6,xmmword ptr [C3]
>> 0012346E  addpd       xmm6,xmm7
>> 00123472  add         dword ptr [ebp-3Ch],20h
>> 00123476  add         eax,dword ptr [k]
>>
>> that the condition nr==4 evaluated to true.
>>
>> This means that NumberOfRegisters==16.
>>
>> This is wrong on 32bit x86 !!!!! This constant should be set to 8. The
>> value 16 is correct for x86-64.
>>
>> So:
>>  1) can you confirm that NumberOfRegisters==16 in your setup that
>> reproduces the crash?
>>  2) let's find out why this NumberOfRegisters is miscomputed.
>>
>> If this is confirmed to be the cause of the crash, then we can be very
>> thankful that this crash occurred as having a wrong NumberOfRegisters
>> means inefficient code (accessing memory in inner loop).
>
> Fixed.
>
> The faulty code was:
>
> #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
> #if (defined __i386__)
> #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 8
> #else
> #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 16
> #endif
>
> apparently MSVC 2010 (and perhaps other versions of MSVC) doesn't
> define __i386__ so this was miscomputed as 16.
>
> Fixed by using sizeof(void*) as a much more universal way of telling
> whether we have a 32bit or 64bit arch.
>
> It would be useful now to scan Eigen's sources for other places where
> we'd be misusing __i386__. I think I've known for a long time that it
> wasn't universally supported, but this example shows how tricky this
> can be!
>
> Benoit
>
>>
>> Cheers,
>> Benoit
>>
>>
>>
>>
>> Benoit
>>
>>>
>>> So, we see a regression and that is happening since
>>>
>>> changeset:   3260:431547872cf7
>>> user:        Gael Guennebaud <g.gael@xxxxxxx>
>>> date:        Mon Jul 12 16:31:46 2010 +0200
>>> summary:     matrix product: move the alpha factor to gebp instead of
>>> the packing
>>>
>>> Since here, we are hitting a compiler bug and considering that
>>> everybody using CMake will run into it, it is definitely a show
>>> stopper for 3.0.
>>>
>>> I have still no idea for workarounds and just did not myself hit this
>>> bug earlier because I am working with 64bit builds only.
>>>
>>> - Hauke
>>>
>>>
>>>
>>
>
>
>



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/