Re: [eigen] Performance gap between gcc and msvc ?



ok, looks like what we need is just the CPUID instruction.

http://en.wikipedia.org/wiki/CPUID#EAX.3D2:_Cache_and_TLB_Descriptor_information
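
A minimal sketch of what such a query could look like, under some assumptions: it uses CPUID leaf 4 (Intel's "deterministic cache parameters") rather than the leaf 2 descriptors linked above, because leaf 4 reports sizes directly and needs no lookup table; it relies on the <cpuid.h> header shipped with GCC and Clang; and the names CpuidRegs, cache_type, cache_level, cache_size_bytes and print_caches are made up for illustration, not Eigen API. CPUs that do not implement leaf 4 would need a different leaf.

```cpp
#include <cstdint>
#include <cstdio>

// Assumption: GCC or Clang on x86. Elsewhere the query below finds nothing.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define CACHE_SKETCH_HAVE_CPUID 1
#endif

// Registers returned by one CPUID query (hypothetical helper type).
struct CpuidRegs { std::uint32_t eax, ebx, ecx, edx; };

// Leaf 4, EAX[4:0]: 0 = no more caches, 1 = data, 2 = instruction, 3 = unified.
inline unsigned cache_type(const CpuidRegs& r) { return r.eax & 0x1f; }

// Leaf 4, EAX[7:5]: cache level (1, 2, 3, ...).
inline unsigned cache_level(const CpuidRegs& r) { return (r.eax >> 5) & 0x7; }

// Size in bytes = ways * partitions * line size * sets.
// Each field is stored as "value minus one".
inline std::uint64_t cache_size_bytes(const CpuidRegs& r) {
  const std::uint64_t line_size  = (r.ebx & 0xfff) + 1;          // EBX[11:0]
  const std::uint64_t partitions = ((r.ebx >> 12) & 0x3ff) + 1;  // EBX[21:12]
  const std::uint64_t ways       = ((r.ebx >> 22) & 0x3ff) + 1;  // EBX[31:22]
  const std::uint64_t sets       = std::uint64_t(r.ecx) + 1;     // ECX[31:0]
  return ways * partitions * line_size * sets;
}

// Prints every cache reported by leaf 4 and returns how many were found.
inline int print_caches() {
  int found = 0;
#ifdef CACHE_SKETCH_HAVE_CPUID
  if (__get_cpuid_max(0, nullptr) < 4) return 0;  // leaf 4 not available
  for (unsigned i = 0; i < 16; ++i) {             // sub-leaf i = i-th cache
    CpuidRegs r{};
    __cpuid_count(4, i, r.eax, r.ebx, r.ecx, r.edx);
    if (cache_type(r) == 0) break;                // end of the list
    std::printf("L%u: %llu KB\n", cache_level(r),
                static_cast<unsigned long long>(cache_size_bytes(r) / 1024));
    ++found;
  }
#endif
  return found;
}
```

The decoding is kept separate from the CPUID call so it can be checked against known register values without depending on the machine it runs on.
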

apparently CPUID is what linux uses to fill in /proc/cpuinfo.
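
For comparison, the /proc/cpuinfo fallback discussed below could be sketched roughly as follows. This assumes the "cache size : N KB" line format quoted in this thread, and both function names are hypothetical. The file is read only once and the result is kept in a function-local static, in the spirit of the "static local vars in functions" approach mentioned further down. As noted below, this value says nothing about whether the cache is shared among cores.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the first "cache size" value found, in KB, or -1 if there is none.
// Takes any stream so the parsing can be exercised without /proc.
inline long parse_cache_size_kb(std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    // Only lines whose key is "cache size" are of interest.
    if (line.compare(0, 10, "cache size") != 0) continue;
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::istringstream fields(line.substr(colon + 1));
    long kb = -1;
    std::string unit;
    if ((fields >> kb >> unit) && unit == "KB") return kb;
  }
  return -1;
}

// Reads /proc/cpuinfo on the first call only; later calls reuse the value.
inline long cached_cache_size_kb() {
  static const long kb = [] {
    std::ifstream f("/proc/cpuinfo");
    return f.is_open() ? parse_cache_size_kb(f) : -1L;
  }();
  return kb;
}
```

On systems without /proc/cpuinfo, cached_cache_size_kb() returns -1, and a caller would fall back to a fixed default.
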

Benoit

2010/6/18 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> bah, cpuinfo has very little information. Here it just says "cache
> size 6144 KB" but it says that for each of my cores, without saying
> if that cache is shared among cores.
>
> hopefully there's something better under /sys ?
>
> Benoit
>
> 2010/6/18 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> on linux, actually, in the worst case, we can always
>> fopen("/proc/cpuinfo"). Will be slow, but any system call is slow, and
>> it's something we do once only.
>>
>> Benoit
>>
>> 2010/6/18 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>> yes, that would be very nice. Does anyone know how to query the cache sizes? One
>>> solution would be to write some assembly to query the CPUID and then
>>> manage our own table, but well, I hope something simpler exists!
>>>
>>> gael
>>>
>>> On Fri, Jun 18, 2010 at 9:42 PM, David Roundy
>>> <roundyd@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> It does seem like it'd be worth trying to come up with a good
>>>> heuristic.  Among other advantages, it'd be very nice to be able to
>>>> create a single binary that can effectively be run on a variety of
>>>> CPUs.  Of course, one could probably do all right by just picking the
>>>> smallest cache... but something picked at runtime ought to be able to
>>>> beat that pretty easily!
>>>>
>>>> (Speaking as someone who runs a pretty heterogeneous cluster of workstations...)
>>>>
>>>> David
>>>>
>>>> On Fri, Jun 18, 2010 at 12:30 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>>> 2010/6/18 David Roundy <roundyd@xxxxxxxxxxxxxxxxxxxxxxx>:
>>>>>> On Fri, Jun 18, 2010 at 12:09 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxxx> wrote:
>>>>>>> This smells like code tuned for larger CPU caches than your Core i5 has. Indeed:
>>>>>>>  - you are using very large matrices, so it's crucial that blocks fit
>>>>>>> in the cpu caches.
>>>>>>>  - Core i5 are mass market cpus with presumably not too big caches.
>>>>>>>
>>>>>>> Try finding out the size of your caches (e.g. cat /proc/cpuinfo on
>>>>>>> linux) and playing with Eigen's cache size settings (see recent thread
>>>>>>> here).
>>>>>>
>>>>>> Is this something that could be done automatically at runtime?
>>>>>
>>>>> It's nontrivial, but it's not unthinkable that we could get a sensible
>>>>> default computed automatically. We have solved the big problem of
>>>>> where to store state in a template library, by storing state as static
>>>>> local vars in functions. The next problem is how to find out the cache
>>>>> size on each platform we aim to support. Of course, at the very best,
>>>>> we could get a sensible default, but in many cases only the user can
>>>>> really know what value is right, since the cpu cache is going to be
>>>>> shared with other threads and processes.
>>>>>
>>>>> Benoit
>>>>>
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Roundy
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>


