|Re: [eigen] Performance question|
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Performance question
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Mon, 23 Feb 2009 23:45:19 +0100
2009/2/23 Yves Bailly <yves.bailly@xxxxxxxxxxx>:
>> Second, you are seeing a 50% speed difference between floats and
>> doubles, even without vectorization. That's very unusual.
> To be honest, on the contrary, it's a behaviour I have noticed quite
> often in "real world" applications involving huge data sets,
> especially poorly structured ones.
Well, that probably means that these "real world" apps were also
memory-bound (see below).
>> the speed difference is much smaller or nonexistent, on both 32bit and
>> 64bit systems. So it sounds like your app is memory-bound, so any
>> vectorization won't help.
> Can you explain what you mean by "memory-bound"? or give me a reference,
> if you don't want to give a (probably long) lesson.
By "memory-bound" I meant that the CPU spends its time waiting for the
RAM, so any optimization of CPU usage (like vectorization) won't
improve performance: the code is waiting for the RAM anyway.
Sometimes this is inherent to a situation, and there's nothing to do
about it (just buy faster RAM...) but sometimes you can do something
to reduce the amount of memory accesses.
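
To make "reduce the amount of memory accesses" a bit more concrete,
here is a minimal sketch (not taken from your code; the function names
are made up for illustration). Both functions compute y = a*x + b, but
the first one streams y through the CPU twice while the second one
touches each element only once:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two passes: y is written in the first loop, then read and written
// again in the second loop. For large arrays, y travels between RAM
// and CPU twice.
void scale_shift_two_passes(const std::vector<float>& x,
                            std::vector<float>& y, float a, float b) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i];
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += b;
}

// One fused pass: each x[i] is read once and each y[i] is written once.
void scale_shift_fused(const std::vector<float>& x,
                       std::vector<float>& y, float a, float b) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + b;
}
```

Whether this helps in practice depends on whether the code really is
memory-bound, so it is worth measuring both versions.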
> Ok... I'll check on this then. I have precisely zero experience in
> this, any good reference I could read to learn on this matter?
I also wouldn't say I have experience with that; it's just a general
fact that each time you access the RAM the CPU has to wait many
cycles, because the RAM is much slower than the CPU. Another useful
fact is that CPUs have some "cache memory" into which they prefetch
data from RAM when they can foresee that the data is going to be
accessed. When that process is successful, the data is much faster to
access. However, the cache size is typically of the order of magnitude
of 1 megabyte. So in order for this to work well, your code should
avoid working on blocks of memory bigger than that at once.
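
As a rough sketch of what "working on smaller blocks" can look like
(again, made-up names, and the block size is only an assumption that
would need tuning for a given CPU): if several passes must be applied
to a big array, doing all the passes on one cache-sized block before
moving on to the next block lets later passes find their data already
in cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block size: 64Ki floats = 256 KiB, picked to stay well
// below a cache of roughly 1 MB. This is an assumption, not a measured
// optimum.
constexpr std::size_t kBlockElems = 64 * 1024;

// Reference version: every pass walks the whole array, so a large
// array is fetched from RAM once per pass.
void repeated_update_naive(std::vector<float>& v, int passes) {
    for (int p = 0; p < passes; ++p)
        for (float& x : v) x = 0.5f * x + 1.0f;
}

// Blocked version: all passes are applied to one block before moving
// to the next, so each block is fetched from RAM about once.
void repeated_update_blocked(std::vector<float>& v, int passes,
                             std::size_t block = kBlockElems) {
    assert(block > 0);
    for (std::size_t begin = 0; begin < v.size(); begin += block) {
        const std::size_t end = std::min(begin + block, v.size());
        for (int p = 0; p < passes; ++p)
            for (std::size_t i = begin; i < end; ++i)
                v[i] = 0.5f * v[i] + 1.0f;
    }
}
```

This only works as-is because each element is updated independently
of the others; operations that mix elements (like matrix products)
need more careful blocking.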
Sorry I can't give a reference, I'm really no expert (Gael is :) ).
A useful tool is "cachegrind":
valgrind --tool=cachegrind ./your_program
and then open the resulting file cachegrind.out.$PID in "kcachegrind".