Re: [eigen] Automatic cache and block size determination



On Monday 21 June 2010 23:53:09, Gael Guennebaud wrote:
> You can test it by going into eigen/bench and running:
> 
> g++ -DNDEBUG -O2 -lrt bench_gemm.cpp -I .. -o gemm && ./gemm
> 
> At the beginning it reports the detected L1 and L2/L3 cache sizes and
> -1 if cpuid is not supported. Currently I've tested with core2
> processors only, so it would be great if someone could try it on other
> architectures and report failures (-1 and/or wrong cache sizes).

Great work :)
I think it works well here. Below is the output of gemm, followed by the /proc/cpuinfo entry for the first CPU (out of 4 identical ones). This all seems coherent.

I guess there is a very small runtime cost for checking, on every product, whether the cache sizes have already been computed, right?
Also, computations involving the cache sizes used to be done at compile time and no longer are. Is that anything to worry about?
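
To make the first question concrete, here is a minimal sketch of the "compute once, then reuse" pattern I have in mind. It is not Eigen's actual code: `CacheSizes`, `detect_cache_sizes()`, `cached_cache_sizes()` and the placeholder values are all hypothetical, and it assumes C++11 or later, where initialisation of a function-local static is guarded by a cheap check on every call.

```cpp
#include <cstddef>

// Hypothetical counter, only here to show that detection runs once.
static int g_detect_calls = 0;

// Hypothetical container for the detected sizes, in bytes.
struct CacheSizes {
    std::ptrdiff_t l1;
    std::ptrdiff_t l2;
};

// Hypothetical stand-in for the expensive query (e.g. via cpuid).
// The values are placeholders, not detected from the hardware.
static CacheSizes detect_cache_sizes() {
    ++g_detect_calls;
    return CacheSizes{64 * 1024, 512 * 1024};
}

// The function-local static is initialised on the first call only.
// Every later call pays just for the "already initialised?" check.
static const CacheSizes& cached_cache_sizes() {
    static const CacheSizes sizes = detect_cache_sizes();
    return sizes;
}
```

With this shape, calling `cached_cache_sizes()` once per product costs one guarded branch after the first call, which should be negligible next to the product itself for anything but very small matrices.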


orzel@berlioz hg/eigen/bench% g++ -DNDEBUG -O2 -lrt bench_gemm.cpp -I .. -o gemm && ./gemm
L1 cache size    = 65536 KB
L2/L3 cache size = 512 KB
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 2048
eigen cpu         4.33855s      3.95982 GFLOPS  (8.82266s)
eigen real        4.34305s      3.95571 GFLOPS  (8.84252s)


orzel@berlioz hg/eigen/bench% cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 5
model name      : AMD Athlon(tm) II X4 620 Processor
stepping        : 2
cpu MHz         : 2600.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips        : 5223.92
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


-- 
Thomas Capricelli <orzel@xxxxxxxxxxxxxxx>
http://www.freehackers.org/thomas



