|Re: [eigen] Performance gap between gcc and msvc ?|
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Performance gap between gcc and msvc ?
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Fri, 18 Jun 2010 22:51:20 +0200
Yes, that would be very nice. Does anyone know how to query the cache
sizes? One solution would be to write some assembly to query the CPUID
and then maintain our own table, but I hope there exists something simpler!
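On the "something simpler" question: on glibc-based Linux, sysconf() accepts the non-standard names _SC_LEVEL1_DCACHE_SIZE and _SC_LEVEL2_CACHE_SIZE, which may avoid hand-written CPUID code on that one platform. The sketch below is only an illustration under that assumption. The helper names and the fallback values are hypothetical, and other platforms (MSVC, BSD, OS X) would need their own code path.

```cpp
#include <cstddef>
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>  // sysconf
#endif

// Hypothetical helper: L1 data cache size in bytes.
// Returns `fallback` when the platform lacks the glibc extension or
// when sysconf reports 0 or -1 (which can happen in VMs and containers).
inline long queryL1DataCacheSize(long fallback = 32 * 1024) {
#if defined(_SC_LEVEL1_DCACHE_SIZE)
  const long size = sysconf(_SC_LEVEL1_DCACHE_SIZE);
  if (size > 0) return size;
#endif
  return fallback;
}

// Hypothetical helper: L2 cache size in bytes, same fallback policy.
inline long queryL2CacheSize(long fallback = 256 * 1024) {
#if defined(_SC_LEVEL2_CACHE_SIZE)
  const long size = sysconf(_SC_LEVEL2_CACHE_SIZE);
  if (size > 0) return size;
#endif
  return fallback;
}
```

Calling queryL1DataCacheSize() once at startup and feeding the result into the blocking parameters would give a single binary a per-machine default. The fallback values here are guesses, not measured numbers.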
On Fri, Jun 18, 2010 at 9:42 PM, David Roundy wrote:
> It does seem like it'd be worth trying to come up with a good
> heuristic. Among other advantages, it'd be very nice to be able to
> create a single binary that can effectively be run on a variety of
> CPUs. Of course, one could probably do all right by just picking the
> smallest cache... but something picked at runtime ought to be able to
> beat that pretty easily!
> (Speaking as someone who runs a pretty heterogeneous cluster of workstations...)
> On Fri, Jun 18, 2010 at 12:30 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> 2010/6/18 David Roundy <roundyd@xxxxxxxxxxxxxxxxxxxxxxx>:
>>> On Fri, Jun 18, 2010 at 12:09 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>> This smells like code tuned for larger CPU caches than your Core i5 has. Indeed:
>>>> - you are using very large matrices, so it's crucial that blocks fit
>>>> in the cpu caches.
>>>> - Core i5 are mass market cpus with presumably not too big caches.
>>>> Try finding out the size of your caches (e.g. cat /proc/cpuinfo on
>>>> linux) and playing with Eigen's cache size settings (see recent thread
>>> Is this something that could be done automatically at runtime?
>> It's nontrivial, but it's not unthinkable that we could get a sensible
>> default computed automatically. We have solved the big problem of
>> where to store state in a template library, by storing state as static
>> local vars in functions. The next problem is how to find out the cache
>> size on each platform we aim to support. Of course, at the very best,
>> we could get a sensible default, but in many cases only the user can
>> really know what value is right, since the cpu cache is going to be
>> shared with other threads and processes.
> David Roundy
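
To illustrate the point quoted above about storing state as static local variables in functions: a header-only library has no .cpp file in which to define a global, but an inline function can own a static local, and every translation unit then shares the same instance. The following is a minimal sketch of that pattern, not the library's actual code. The namespace, the function names, and the 8 KB default are all hypothetical.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical namespace, for illustration only

enum class Action { Get, Set };

// The single owner of the state. Because the function is inline, the
// static local is one object shared across all translation units.
inline void manageL1CacheSize(Action action, std::ptrdiff_t* value) {
  static std::ptrdiff_t storedL1 = 8 * 1024;  // assumed default, in bytes
  if (action == Action::Set) {
    storedL1 = *value;   // overwrite the stored setting
  } else {
    *value = storedL1;   // report the stored setting
  }
}

// Convenience wrappers around the state-owning function.
inline std::ptrdiff_t l1CacheSize() {
  std::ptrdiff_t v = 0;
  manageL1CacheSize(Action::Get, &v);
  return v;
}

inline void setL1CacheSize(std::ptrdiff_t bytes) {
  manageL1CacheSize(Action::Set, &bytes);
}

}  // namespace sketch
```

A user who knows the cache is shared with other threads or processes could call sketch::setL1CacheSize() with a smaller value to override the default. Note that this sketch does not synchronize concurrent calls to the setter.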