Re: [eigen] a record for Eigen: 250 GFLOPS !!

>>>>> "GG" == Gael Guennebaud <gael.guennebaud@xxxxxxxxx> writes:

GG> Hi, this morning I played with a 48-core AMD SMP server (8
GG> processors AMD-Opteron-8439-SE, 6 cores each @ 2.8 GHz) and a
GG> bi-processor made of Intel X5570 @ 2.93GHz (4 multithreaded cores
GG> each => a total of 8 cores, 16 threads), and here are the results
GG> for a product of 2048^2 matrices of floats:

GG> We can see that AMD's SSE implementation is half the speed of Intel's one.

I believe the X5570 (Nehalem/Gainestown) is new enough to have 128-bit
fp adders and multipliers, whereas the 8439 (Istanbul) probably still
has 64-bit adders and multipliers.  That would explain the performance
difference you noticed with SSE floats.  Or perhaps Istanbul needs to
serialize the ops not just with doubles but even with floats?
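
To put rough numbers on that reasoning, here is a minimal C++ sketch of
the theoretical single-precision peak under an idealized model: each
core has one fp adder and one fp multiplier, both fully pipelined, and
each handles unit_bits of a 128-bit SSE register per cycle.  The model
and the names flops_per_cycle and peak_gflops are assumptions made for
illustration; only the core counts and clock speeds come from the
quoted message.

```cpp
// Idealized model (an assumption, not a measurement): one fp adder and
// one fp multiplier per core, each processing `unit_bits` bits of
// packed 32-bit floats per cycle.
constexpr int flops_per_cycle(int unit_bits) {
    return 2 * (unit_bits / 32);  // (add + mul) * floats handled per cycle
}

// Theoretical peak in GFLOPS for `cores` cores clocked at `ghz` GHz.
constexpr double peak_gflops(int cores, double ghz, int unit_bits) {
    return cores * ghz * flops_per_cycle(unit_bits);
}

// A 128-bit unit handles 4 floats per cycle, a 64-bit unit only 2.
static_assert(flops_per_cycle(128) == 8, "4 adds + 4 muls per cycle");
static_assert(flops_per_cycle(64) == 4, "2 adds + 2 muls per cycle");
```

Under these assumptions, peak_gflops(8, 2.93, 128) gives about 187.5
for the two X5570s, while the 48 Opteron cores at 2.8 GHz would come
out at about 537.6 with 64-bit units and twice that with 128-bit units.
The halving of the per-core figure is the point here; measured numbers
will sit below any of these peaks.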

I'm not sure whether AMD's Magny-Cours processors have 128-bit fp
units, but I've read that the upcoming Bulldozer arch definitely will.

James Cloos <cloos@xxxxxxxxxxx>         OpenPGP: 1024D/ED7DAEA6
