Re: [eigen] Automatic cache and block size determination

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


Well, let me give you some numbers.

The time to query the L1 and L2 cache sizes at runtime is 0.5ms. This
is done only once per execution of your software (in case you perform
matrix products on dynamic size matrices).

The overhead to test these queries have already been done is only one
"if(something==0)". This is completely negligible compared to all the
other computations which have to be carried out before doing the
actual matrix product. They include the computation of the block sizes
(which depend of the sizes of the matrices):

std::ptrdiff_t l1, l2;
ei_manage_caching_sizes(GetAction, &l1, &l2); // cost = 1 "if" (cheap)
k = std::min<std::ptrdiff_t>(k, l1/kdiv); // kdiv is a power of 2 => 1
bit shift (cheap)
std::ptrdiff_t _m = l2/(4 * sizeof(LhsScalar) * k); // this integer
division cannot be avoid even if L1 and L2 are known at compile time
if(_m<m) m = _m & mr_mask;

then we have several allocations of the blocks:

Scalar* blockA = ei_aligned_stack_new(Scalar, kc*mc);
std::size_t sizeB = kc*Blocking::PacketSize*Blocking::nr + kc*cols;
Scalar* allocatedBlockB = ei_aligned_stack_new(Scalar, sizeB);
Scalar* blockB = allocatedBlockB + kc*Blocking::PacketSize*Blocking::nr;

then the data are copied into these blocks,

etc.

So really, I think this little "if" is totally negligible.

What could be really useful, however, is a way to instantiate a
"matrix product object" with some information on the maximal and/or
typical matrix sizes we are considering such that all the above
initialization cost can be avoided when doing many matrix products on
matrices having the same sizes. For instance, this could be useful for
blocked decompositions.

gael


On Tue, Jun 22, 2010 at 9:39 AM,  <bernard.hugueney@xxxxxxxxxx> wrote:
>
> Hi,
>
>
>
> On Tue, 22 Jun 2010 00:46:22 +0200, Thomas Capricelli
>
>
>
>> I guess there is a very small runtime cost of 'checking if the cache
>
> sizes
>
>> have already been computed' for every product, right ?
>
>> And also, computation involving cache sizes were previously done at
>
>> compile time and not anymore.. ? Nothing to worry about ?
>
>
>
> If possible, it would be best to have a #define L2_CACHE_SIZE that would
>
> default to
>
> a runtime query at static initialization time but could be set when
>
> compiling.
>
>
>
> When set at compile time, a typed value (as in [0]) would enable meta
>
> programming unrolling,
>
> when set at runtime doing it at static initialization time would avoid
>
> polluting other code
>
>  with a check run only once.
>
>
>
> My .2€
>
>
>
> Best regards,
>
>
>
> Bernard
>
>
>
> [0] http://www.boost.org/doc/libs/1_43_0/libs/mpl/doc/refmanual/int.html
>
>
>



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/