- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] two things
- From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 26 Jun 2008 19:41:13 +0200
On Thu, Jun 26, 2008 at 7:26 PM, Benoît Jacob <jacob@xxxxxxxxxxxxxxx> wrote:
> On Thursday 26 June 2008 18:55:22 Gael Guennebaud wrote:
>> yes, exactly. but I'm still puzzled by these results since on a 2GHz
>> core2 we could expect a peak performance of 8 GFlops and we are far,
>> far away. I've also tried c = a + b; => even slower. On the other hand
>> with a += a; I could reach ~4.5 GFlops. For comparison, our
>> optimized matrix product on 1024x1024 matrices achieves ~9 GFlops! yes,
>> 9! this is because the CPU can do an "add" and a "mul" at the same
>> time... I guess the trick would be to do some prefetching but I have
>> not managed to get any improvement so far...
> I was thinking the same;
> Here is what the critical loop looks like in assembly:
> movaps (%edx,%eax,4), %xmm0
> addps (%esi,%eax,4), %xmm0
> movaps %xmm0, (%edx,%eax,4)
> addl $4, %eax
> cmpl %eax, %ecx
> jg .L68
> So, for one productive instruction (the addps) there are 2 mov instructions
> (and I don't count the last 3 instructions, which go away once we peel the
> loop). Could that somehow be improved?
the only way I know to improve that is to do loop peeling and to reduce
the dependency between the instructions... basically this is what I
tried to do in benchAddVec... but I only got an improvement from the loop
peeling.
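For reference, here is a minimal sketch of what I mean by peeling (the names
and the unroll factor are illustrative, not the actual benchAddVec code):

```cpp
#include <cstddef>

// Illustrative sketch only -- not the actual benchAddVec code.
// Unrolling by 8 amortizes the counter/compare/branch overhead that the
// asm above pays for every single addps, and the 8 adds are independent
// of each other, so the CPU is free to overlap them.
void add_in_place_unrolled(float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
        a[i + 4] += b[i + 4];
        a[i + 5] += b[i + 5];
        a[i + 6] += b[i + 6];
        a[i + 7] += b[i + 7];
    }
    for (; i < n; ++i)   // remaining tail elements
        a[i] += b[i];
}
```

Of course the compiler may vectorize this with packed SSE ops, but the
point is only to cut the per-add loop overhead, and even so the loop stays
bound by the two movs per add.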
> By the way, I tried this benchmark without vectorization, and got 0.4 GFlops
> at 400x400 size (where the cost of not linearizing is negligible) so the
> benefit of vectorization here is somewhere between +25% and +50%.
yes, the gain from vectorization is very low here... for me it
is ~25% :( This is because we are limited by memory accesses.
> By comparison, I made a simple benchmark for sum() of a big float vector
> (really just modifying vdw_new). There, vectorization speeds up by 4x; and
> when it is enabled I get 1.7 GFlop (counting 1G = 10^9) on my 1.66 GHz CPU.
> So, much better. Not the theoretical maximum, but since this benchmark is
> memory intensive, doing only one add per loaded number, I can believe that
> 1.7 GFlop is all my laptop's memory allows. Perhaps the better flops in
> the matrix product are because (especially with your cache-friendly code) it
> is more computation-intensive relative to the amount of memory accesses.
yes, this example is even more favorable than a += a; because there is
no store, only a single load per element. Actually the core of the matrix
product is quite similar to .sum(), which explains why it works much better.
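To illustrate what I mean (a hand-written sketch, not Eigen's actual .sum()
implementation): each element costs one load and one add, and splitting the
accumulator breaks the serial dependency of "s += x[i]", which is the same
kind of trick that keeps the product kernel busy:

```cpp
// Hand-written sketch, not Eigen's actual sum() code. One load feeds
// one add per element (no store in the loop), and the four partial
// sums are independent, so consecutive adds can overlap instead of
// waiting on a single accumulator.
float sum_unrolled(const float* x, int n)
{
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)   // tail
        s += x[i];
    return s;
}
```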
> Here is the performance-critical part of that sum() benchmark:
> addps (%ebx,%eax,4), %xmm1
> addl $4, %eax
> cmpl %eax, %edx
> jg .L18