Re: [eigen] SGEMM benchmark result against ATLAS
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] SGEMM benchmark result against ATLAS
- From: Rohit Garg <rpg.314@xxxxxxxxx>
- Date: Tue, 24 Aug 2010 15:39:23 +0530
On an i7, measuring the peak flop rate is not straightforward because of
turbo mode: the core clock can rise above its nominal frequency.
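
To make the arithmetic in the quoted estimate below concrete, here is a minimal sketch of it: clock x 4 floats per SSE register x 2 (one addps and one mulps issued per cycle), for a single core. The function names and the `report` helper are hypothetical, and the effective clock under turbo is left as an input because the thread does not say what it was.

```cpp
#include <cstdio>

// Theoretical single-core peak in GFlops:
// clock (GHz) x SIMD lanes x floating-point instructions issued per cycle.
constexpr double peak_gflops(double clock_ghz, int simd_lanes,
                             int fp_instr_per_cycle) {
    return clock_ghz * simd_lanes * fp_instr_per_cycle;
}

// Fraction of the theoretical peak that a measured rate reaches.
constexpr double efficiency(double measured_gflops, double peak) {
    return measured_gflops / peak;
}

// Hypothetical helper: prints the estimate for a given effective clock,
// assuming 4 float lanes (SSE) and 2 instructions (addps + mulps) per cycle.
inline void report(double clock_ghz, double measured_gflops) {
    const double peak = peak_gflops(clock_ghz, 4, 2);
    std::printf("clock %.3f GHz: peak %.2f GFlops, efficiency %.0f%%\n",
                clock_ghz, peak, 100.0 * efficiency(measured_gflops, peak));
}
```

With the nominal 1.66 GHz this gives about 13.28 GFlops (the quoted 13.33 corresponds to a clock of about 1.667 GHz), and 8 GFlops is roughly 60% of that. If turbo mode raises the effective clock, the peak rises in proportion and the real efficiency is lower than 60%.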
On Tue, Aug 24, 2010 at 9:15 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> Hi,
>
> Hearing from Keir that he saw untuned ATLAS outperform us by a 30% margin,
> which would be very unusual, I ran our benchBlasGemm a bit. By the way, I
> updated it to make it compile, which involved removing the eigen_..._normal
> path which didn't look useful (?), hope it's OK. Also, it was missing an
> extern "C" around the cblas #include.
>
> So I installed the most optimized ATLAS package that I could on Fedora,
> built with SSE3.
>
> I compiled our benchmark with:
>
> cd eigen/bench/
> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp -o benchBlasGemm -lrt -lcblas
>
> And ran it on some 4096x4096 matrices:
>
> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
> 4096 x 4096 x 4096
> cblas: 8.73982 (7.862 GFlops/s)
> eigen : 8.9491 (7.678 GFlops/s)
> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
> 4096 x 4096 x 4096
> cblas: 8.51913 (8.066 GFlops/s)
> eigen : 8.42922 (8.152 GFlops/s)
>
> So _my_ results show Eigen3 and ATLAS running at roughly the same speed,
> albeit with great variability.
>
> This is still perplexing for 2 reasons:
> - we used to beat ATLAS by a wide margin.
> - the roughly 8 GFlops here are not too good. My CPU is a Core i7 at 1.66
> GHz. So x4 (because of float) and x2 (pipelining of addps and mulps) we
> should aim at 13.33 GFlops. So we are running here at only 60% of the
> theoretical maximum; I think we used to do much better than that.
>
> So let me ask Gael and Keir:
> * Keir: what do you get on this benchmark? How did you get this result where
> ATLAS outperformed us by 30%?
> * Gael: suppose I want to get deeper into this, where do I start?
>
> Cheers,
> Benoit
>
--
Rohit Garg
http://rpg-314.blogspot.com/