Re: [eigen] Help needed to run a benchmark on many machines


2015-02-18 16:04 GMT-05:00 Ilja Honkonen <ilja.j.honkonen@xxxxxxxxx>:
Oh - I guess you mean in case a matrix got allocated at the same address
as a previous one, and not initialized. But the benchmark initializes
all the matrix coefficients anyway, so IIUC it should be pretty
deterministic in this respect.

For example this code:
    double starttime = time();
    for (int i = 0; i < iters_at_a_time; i++) {
      c = a * b;
    }
    double endtime = time();
is probably not the most representative one, or maybe it depends a lot on what a function does. I think that for one-time calculations the above loop won't tell much: a, b and c are already in cache for all but the first iteration, so memory transfers won't show up. Maybe this is why in some cases one doesn't see much difference between different block sizes.

Ah, good point. I guess I had in mind a situation where each of the matrices a, b, c would be bigger than the cache. For example, 1k x 1k float matrices (4 MB each) on a typical ARM CPU (1 MB cache). But indeed, my benchmark also aims to measure much smaller cases, so that's a problem. Any suggestions welcome on how to address this!
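One possible direction, as a rough sketch only and not what the benchmark currently does: instead of multiplying the same a, b, c on every iteration, rotate through a pool of (a, b, c) triples whose combined size is several times the cache size, so that a triple has probably been evicted by the time it is reused. Everything named below is hypothetical: Matrix and multiply are plain C++ stand-ins for an Eigen matrix type and its product, and pool_size_for, time_cold_products, the cache_bytes parameter and the factor of 4 are illustrative assumptions. Hardware prefetching and the cache replacement policy may still hide part of the memory traffic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense square float matrix (row-major).
struct Matrix {
  std::size_t n;
  std::vector<float> data;
  explicit Matrix(std::size_t n_, float v = 0.0f) : n(n_), data(n_ * n_, v) {}
};

// Naive c = a * b, standing in for the library's optimized product.
void multiply(const Matrix& a, const Matrix& b, Matrix& c) {
  const std::size_t n = a.n;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float s = 0.0f;
      for (std::size_t k = 0; k < n; ++k) s += a.data[i * n + k] * b.data[k * n + j];
      c.data[i * n + j] = s;
    }
  }
}

// Number of (a, b, c) triples needed so that the pool occupies at least
// `factor` times the assumed cache size. Always returns at least 1.
std::size_t pool_size_for(std::size_t n, std::size_t cache_bytes, std::size_t factor = 4) {
  const std::size_t triple_bytes = 3 * n * n * sizeof(float);
  assert(triple_bytes > 0);
  const std::size_t target = factor * cache_bytes;
  return std::max<std::size_t>(1, (target + triple_bytes - 1) / triple_bytes);
}

struct Result {
  double seconds_per_product;
  float checksum;  // read back so the products cannot be optimized away
};

// Time `iters` products, visiting the pool round-robin so that consecutive
// iterations touch different memory.
Result time_cold_products(std::size_t n, std::size_t cache_bytes, std::size_t iters) {
  assert(iters > 0);
  const std::size_t pool = pool_size_for(n, cache_bytes);
  std::vector<Matrix> as(pool, Matrix(n, 1.0f));
  std::vector<Matrix> bs(pool, Matrix(n, 2.0f));
  std::vector<Matrix> cs(pool, Matrix(n));

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iters; ++i) {
    const std::size_t k = i % pool;
    multiply(as[k], bs[k], cs[k]);
  }
  const auto end = std::chrono::steady_clock::now();

  float checksum = 0.0f;
  for (const Matrix& c : cs) checksum += c.data[0];
  const double total = std::chrono::duration<double>(end - start).count();
  return {total / static_cast<double>(iters), checksum};
}
```

With an assumed 1 MB cache, pool_size_for(8, 1 << 20) asks for a few thousand 8x8 triples, while for 1k x 1k matrices a single triple already exceeds the target, so the pool only matters for the small sizes. Comparing the per-product time from this loop with the original same-operands loop would give a rough idea of how much the warm cache flatters the small cases.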


