|Re: [eigen] Performance gap between gcc and msvc ?|
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Performance gap between gcc and msvc ?
- From: David Roundy <roundyd@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 18 Jun 2010 12:42:34 -0700
It does seem like it'd be worth trying to come up with a good
heuristic. Among other advantages, it'd be very nice to be able to
create a single binary that can effectively be run on a variety of
CPUs. Of course, one could probably do all right by just picking the
smallest cache... but something picked at runtime ought to be able to
beat that pretty easily!
(Speaking as someone who runs a pretty heterogeneous cluster of workstations...)
On Fri, Jun 18, 2010 at 12:30 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2010/6/18 David Roundy <roundyd@xxxxxxxxxxxxxxxxxxxxxxx>:
>> On Fri, Jun 18, 2010 at 12:09 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>> This smells like code tuned for larger CPU caches than your Core i5 has. Indeed:
>>> - you are using very large matrices, so it's crucial that blocks fit
>>> in the CPU caches.
>>> - Core i5s are mass-market CPUs with presumably not too big caches.
>>> Try finding out the size of your caches (e.g. cat /proc/cpuinfo on
>>> linux) and playing with Eigen's cache size settings (see recent thread
>> Is this something that could be done automatically at runtime?
> It's nontrivial, but it's not unthinkable that we could get a sensible
> default computed automatically. We have solved the big problem of
> where to store state in a template library, by storing state as static
> local vars in functions. The next problem is how to find out the cache
> size on each platform we aim to support. Of course, at the very best,
> we could get a sensible default, but in many cases only the user can
> really know what value is right, since the cpu cache is going to be
> shared with other threads and processes.
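
To make the idea in the quoted message concrete, here is a minimal sketch of a runtime default that the user can override, with the state kept in a function-local static so that a header-only library needs no separate object file. It is not Eigen's actual code: the names queryL1CacheBytes, l1CacheBytes, setL1CacheBytes and the 8 KiB fallback are all hypothetical, and the detection assumes a glibc-style sysconf(_SC_LEVEL1_DCACHE_SIZE), which is an extension and not available on every platform. Where it is missing, or reports nothing useful, the sketch simply uses the fallback.

```cpp
#include <cstddef>
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>
#endif

// Hypothetical conservative fallback, used when detection is unavailable.
constexpr std::ptrdiff_t kFallbackL1Bytes = 8 * 1024;

// Ask the OS for the L1 data cache size in bytes; returns -1 if unknown.
// Assumption: _SC_LEVEL1_DCACHE_SIZE is a glibc extension, so it is
// compiled in only where the macro exists.
inline std::ptrdiff_t queryL1CacheBytes() {
#if defined(_SC_LEVEL1_DCACHE_SIZE)
    long bytes = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    if (bytes > 0) return static_cast<std::ptrdiff_t>(bytes);
#endif
    return -1;
}

// The state lives in a function-local static, initialized on first use
// with the detected size, or with the fallback if detection failed.
inline std::ptrdiff_t& l1CacheBytesState() {
    static std::ptrdiff_t value = [] {
        std::ptrdiff_t detected = queryL1CacheBytes();
        return detected > 0 ? detected : kFallbackL1Bytes;
    }();
    return value;
}

// Read the current setting.
inline std::ptrdiff_t l1CacheBytes() { return l1CacheBytesState(); }

// Let the user override the default, e.g. when the cache is shared with
// other threads or processes. Non-positive values are ignored.
inline void setL1CacheBytes(std::ptrdiff_t bytes) {
    if (bytes > 0) l1CacheBytesState() = bytes;
}
```

A caller would read l1CacheBytes() when choosing a block size, and a user who knows the machine better could call setL1CacheBytes(16 * 1024) once at startup. The override is not synchronized, so in this sketch it should happen before any worker threads start.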