Re: [eigen] a record for Eigen: 250 GFLOPS !!
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] a record for Eigen: 250 GFLOPS !!
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Wed, 23 Jun 2010 18:45:15 +0200
On Wed, Jun 23, 2010 at 6:12 PM, Manoj Rajagopalan <rmanoj@xxxxxxxxx> wrote:
>
> Congrats! :-)
>
> Quick question: in GEMM benchmarks, does one FLOP represent a * followed by a
> +, or does it mean just one of these? In the latter case, the FLOPS figure
> would be ~ (2-eps) times that from the former.
"one FLOP" == "one *" or "one +"
There is no "eps": the factor is exactly 2 because we are doing C += A * B,
so every multiplication is paired with exactly one addition.
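
To make the counting concrete, here is a minimal C++ sketch. It is not the actual benchmark code, and the names gemm_flops and gflops are made up for illustration. It converts a measured time into GFLOPS, assuming square n x n matrices and 2*n^3 operations for C += A * B:

```cpp
#include <cassert>

// Number of floating point operations for C += A * B with square
// n x n matrices: n^3 multiplications plus n^3 additions.
// (Hypothetical helper, not part of Eigen.)
double gemm_flops(double n) {
    return 2.0 * n * n * n;
}

// Throughput in GFLOPS for one product that took `seconds` of
// wall-clock time. (Hypothetical helper, not part of Eigen.)
double gflops(double n, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(n) / seconds / 1e9;
}
```

For the 32-thread AMD run quoted below, gflops(2048, 0.0686023) comes out at roughly 250, in line with the reported figure.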
gael
>
> thanks,
> Manoj
>
>
>
> On Wednesday 23 June 2010 06:04:58 am Gael Guennebaud wrote:
>> Hi,
>>
>> this morning I played with a 48-core AMD SMP server (8 AMD Opteron
>> 8439 SE processors, 6 cores each @ 2.8 GHz) and a dual-processor
>> machine with Intel X5570 @ 2.93 GHz (4 multithreaded cores each => a
>> total of 8 cores, 16 threads), and here are the results for a product
>> of 2048^2 matrices of floats:
>>
>> ** Intel **
>>
>> 16 threads (multi-threading)
>> eigen real 0.158446s 108.427 GFLOPS (2.22212s)
>> mt speed up x5.55349 => 34.7093%
>>
>> 8 threads
>> eigen real 0.125598s 136.785 GFLOPS (1.2581s)
>> mt speed up x7.0835 => 88.5438%
>>
>> 4 threads
>> eigen real 0.228977s 75.0287 GFLOPS (2.37034s)
>> mt speed up x3.88544 => 97.136%
>>
>> 2 threads
>> eigen real 0.449604s 38.2111 GFLOPS (4.72754s)
>> mt speed up x1.98317 => 99.1583%
>>
>> 1 thread
>> eigen mono cpu 0.891639s 19.2677 GFLOPS (8.9178s)
>>
>>
>> A speedup factor of ~7 on 8 cores is very nice scaling IMO.
>>
>>
>> ** AMD **
>>
>>
>> 1 thread
>> eigen mono cpu 1.54084s 11.1496 GFLOPS (15.4136s)
>>
>> 2 threads
>> eigen real 0.817967s 21.0031 GFLOPS (8.18607s)
>> mt speed up x1.88375 => 94.1874%
>>
>> 4 threads
>> eigen real 0.41879s 41.0226 GFLOPS (4.1911s)
>> mt speed up x3.73174 => 93.2936%
>>
>> 8 threads
>> eigen real 0.214083s 80.2485 GFLOPS (2.15697s)
>> mt speed up x7.49282 => 93.6602%
>>
>> 16 threads
>> eigen real 0.115521s 148.716 GFLOPS (1.26385s)
>> mt speed up x13.4568 => 84.1048%
>>
>> 24 threads
>> eigen real 0.168208s 102.135 GFLOPS (1.75357s)
>> mt speed up x9.55177 => 39.7991%
>>
>> 32 threads
>> eigen real 0.0686023s 250.427 GFLOPS (1.19708s)
>> mt speed up x23.001 => 71.8781%
>>
>> 42 threads
>> eigen real 0.0799503s 214.882 GFLOPS (0.938163s)
>> mt speed up x19.9015 => 47.3844%
>>
>> 48 threads
>> eigen real 0.143299s 119.888 GFLOPS (1.62653s)
>> mt speed up x11.2097 => 23.3536%
>>
>>
>> We can see that AMD's SSE implementation runs at roughly half the
>> speed of Intel's (11.1 vs 19.3 GFLOPS on one thread). This
>> architecture seems tricky to control: peak performance is reached
>> with 32 threads, with a speedup factor of x23, which is not bad.
>> With more threads the performance drops significantly. There is
>> also a slowdown with 24 threads.
>>
>> that's all folks.
>>
>> gael
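
For reading the "mt speed up" lines in the quoted results: the percentage is consistent with the speedup factor divided by the number of threads (e.g. 23.001 / 32 => 71.8781%). Below is a minimal C++ sketch of that arithmetic. The function names are hypothetical and this is not the benchmark's own code. Note that only some lines (e.g. Intel, 2 threads) reproduce exactly from the 1-thread time listed; the others differ slightly, presumably because the single-thread baseline was re-timed in each run, but that is a guess.

```cpp
#include <cassert>

// Speedup of a multi-threaded run relative to a single-threaded run,
// both given as wall-clock times in seconds.
double speedup(double t_single, double t_parallel) {
    assert(t_single > 0.0 && t_parallel > 0.0);
    return t_single / t_parallel;
}

// Parallel efficiency in percent: the speedup factor divided by the
// number of threads. 100% would mean perfectly linear scaling.
double efficiency_percent(double speedup_factor, int threads) {
    assert(threads > 0);
    return 100.0 * speedup_factor / threads;
}
```

For example, speedup(0.891639, 0.449604) gives about 1.983 for the 2-thread Intel run, and efficiency_percent(23.001, 32) gives about 71.9 for the 32-thread AMD run.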