Re: [eigen] Automatic cache and block size determination
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Automatic cache and block size determination
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Tue, 22 Jun 2010 11:26:41 +0200
Well, let me give you some numbers.
The time to query the L1 and L2 cache sizes at runtime is 0.5ms. This
is done only once per execution of your program (assuming you perform
matrix products on dynamic-size matrices).
The overhead of testing whether these queries have already been done is
a single "if(something==0)". That is completely negligible compared to
all the other computations which have to be carried out before doing the
actual matrix product. They include the computation of the block sizes
(which depend on the sizes of the matrices):
std::ptrdiff_t l1, l2;
ei_manage_caching_sizes(GetAction, &l1, &l2); // cost = 1 "if" (cheap)
k = std::min<std::ptrdiff_t>(k, l1/kdiv); // kdiv is a power of 2 => 1 bit shift (cheap)
std::ptrdiff_t _m = l2/(4 * sizeof(LhsScalar) * k); // this integer division
                            // cannot be avoided even if L1 and L2 are known at compile time
if(_m<m) m = _m & mr_mask;
then we have several allocations of the blocks:
Scalar* blockA = ei_aligned_stack_new(Scalar, kc*mc);
std::size_t sizeB = kc*Blocking::PacketSize*Blocking::nr + kc*cols;
Scalar* allocatedBlockB = ei_aligned_stack_new(Scalar, sizeB);
Scalar* blockB = allocatedBlockB + kc*Blocking::PacketSize*Blocking::nr;
then the data are copied into these blocks,
etc.
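To be concrete, the one-time query behind ei_manage_caching_sizes boils
down to something like the following (a rough sketch with illustrative
names and values, not the actual implementation):

#include <cstddef>

static void get_cache_sizes(std::ptrdiff_t* l1, std::ptrdiff_t* l2)
{
  // Cached results; 0 means "not queried yet".
  static std::ptrdiff_t s_l1 = 0, s_l2 = 0;
  if (s_l1 == 0)   // the single "if" paid on every product call
  {
    // The actual query (~0.5ms) runs only once per execution,
    // e.g. via CPUID on x86 or sysconf() on POSIX; values here are illustrative.
    s_l1 = 32 * 1024;
    s_l2 = 256 * 1024;
  }
  *l1 = s_l1;
  *l2 = s_l2;
}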
So really, I think this little "if" is totally negligible.
What could be really useful, however, is a way to instantiate a
"matrix product object" with some information about the maximal and/or
typical matrix sizes we are considering, so that all of the above
initialization cost can be avoided when doing many matrix products on
matrices having the same sizes. For instance, this could be useful for
blocked decompositions.
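Just to sketch the idea (hypothetical names, building on the internals
shown above; not an actual API proposal):

#include <algorithm>
#include <cstddef>
#include <vector>

template<typename Scalar>
struct ProductBlocking
{
  std::ptrdiff_t kc, mc;        // depth and row block sizes
  std::vector<Scalar> blockA;   // packing buffer for the lhs, allocated once
  std::vector<Scalar> blockB;   // packing buffer for the rhs, allocated once

  ProductBlocking(std::ptrdiff_t rows, std::ptrdiff_t depth, std::ptrdiff_t cols)
  {
    std::ptrdiff_t l1, l2;
    ei_manage_caching_sizes(GetAction, &l1, &l2);   // one-time query, as above
    kc = std::min<std::ptrdiff_t>(depth, l1 / 8);   // illustrative kdiv = 8
    mc = std::min<std::ptrdiff_t>(rows,
           l2 / (4 * (std::ptrdiff_t)sizeof(Scalar) * kc));
    blockA.resize(kc * mc);
    blockB.resize(kc * cols);
  }
};

A blocked LU, say, could build one such object and reuse it for all the
equally-sized panel products it performs.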
gael
On Tue, Jun 22, 2010 at 9:39 AM, <bernard.hugueney@xxxxxxxxxx> wrote:
> Hi,
>
> On Tue, 22 Jun 2010 00:46:22 +0200, Thomas Capricelli wrote:
>
>> I guess there is a very small runtime cost of 'checking if the cache
>> sizes have already been computed' for every product, right?
>> And also, computations involving cache sizes were previously done at
>> compile time and not anymore..? Nothing to worry about?
>
> If possible, it would be best to have a #define L2_CACHE_SIZE that would
> default to a runtime query at static initialization time but could be set
> when compiling.
>
> When set at compile time, a typed value (as in [0]) would enable
> meta-programming unrolling; when set at runtime, doing the query at static
> initialization time would avoid polluting other code with a check run only
> once.
>
> My .2€
>
> Best regards,
>
> Bernard
>
> [0] http://www.boost.org/doc/libs/1_43_0/libs/mpl/doc/refmanual/int.html
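For what it's worth, the compile-time override Bernard describes could be
sketched like this (EIGEN_L2_CACHE_SIZE and queryL2CacheSize() are
hypothetical names, not existing Eigen configuration):

#include <cstddef>

// Hypothetical runtime query (placeholder for a CPUID/sysconf based one).
static std::ptrdiff_t queryL2CacheSize() { return 256 * 1024; }

inline std::ptrdiff_t l2CacheSize()
{
#ifdef EIGEN_L2_CACHE_SIZE
  // Set at compile time: a constant the optimizer can fold into the
  // block-size computation (e.g. the integer division above).
  return EIGEN_L2_CACHE_SIZE;
#else
  // Otherwise queried once, at first use / static initialization.
  static const std::ptrdiff_t value = queryL2CacheSize();
  return value;
#endif
}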