Re: [eigen] runtime API to configure the cache/blocking parameters


On Mon, Jun 7, 2010 at 6:23 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx> wrote:
Hi Gael,

This looks like a good start.  If I ever get around to adding PPC-specific intrinsics, I would probably want to keep track of a few other things like the number of registers available and the size of the L3.  

Yes, the number of registers is specified at compile time by the macro EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS, which is currently set automatically for PPC, x87, 32-bit SSE, and 64-bit SSE.

One can also specialize the struct ei_product_blocking_traits for each scalar type to adjust the automatic register-level blocking parameters, though currently there is not much flexibility left. Those parameters really have to be set at compile time because they also control partial loop unrolling.
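
To illustrate why such parameters are tied to compile time, here is a minimal sketch. The struct blocking_traits, its member nr, and the function blocked_sum are hypothetical stand-ins invented for this example, not Eigen's actual ei_product_blocking_traits or its kernels. The point is only that a per-scalar specialization yields a constant trip count that the compiler can unroll.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical traits struct; the member name `nr` and its values are
// assumptions made for illustration only.
template<typename Scalar> struct blocking_traits { enum { nr = 4 }; };

// Per-scalar specialization: a different register block size for double.
template<> struct blocking_traits<double> { enum { nr = 2 }; };

// Sums n values using `nr` independent accumulators.
template<typename Scalar>
Scalar blocked_sum(const Scalar* x, std::ptrdiff_t n)
{
  assert(n >= 0);
  const std::ptrdiff_t nr = blocking_traits<Scalar>::nr;
  Scalar acc[blocking_traits<Scalar>::nr] = {};  // zero-initialized

  std::ptrdiff_t i = 0;
  for (; i + nr <= n; i += nr)
    for (std::ptrdiff_t j = 0; j < nr; ++j)  // trip count is a compile-time constant
      acc[j] += x[i + j];

  Scalar total = Scalar(0);
  for (std::ptrdiff_t j = 0; j < nr; ++j)
    total += acc[j];
  for (; i < n; ++i)  // leftover elements that do not fill a block
    total += x[i];
  return total;
}
```

If nr were a runtime variable, the accumulator array could not be sized statically and the inner loop could not be unrolled in the same way.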

For L2/L3, yes, we could add one more level of blocking along the "n" direction of the right-hand side matrix (see my reply to Benoit) to handle the cases where max_k x n is very large compared to the L2 cache size. So far, though, I have not seen the need for it.




On Mon, Jun 7, 2010 at 5:40 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

I've committed a change that allows configuring the cache size and the related blocking size parameters at runtime. It uses the trick of storing runtime variables as static variables inside a global function. This way there is no need for a shared binary library just for that.
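
A minimal sketch of that trick, for readers who have not seen it. The names Action, manage_cache_size, get_cache_size and set_cache_size, as well as the default value, are assumptions made for this example and are not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical action selector for the accessor below.
enum Action { GetAction, SetAction };

// The runtime variable is a function-local static inside an inline function.
// All translation units that include this header share a single instance,
// so no separately compiled library is needed to hold the state.
inline void manage_cache_size(Action action, std::ptrdiff_t* value)
{
  static std::ptrdiff_t cache_size = 2 * 1024 * 1024;  // assumed default, in bytes
  if (action == SetAction)
    cache_size = *value;
  else
    *value = cache_size;
}

// Thin wrappers giving a conventional getter/setter interface.
inline std::ptrdiff_t get_cache_size()
{
  std::ptrdiff_t v = 0;
  manage_cache_size(GetAction, &v);
  return v;
}

inline void set_cache_size(std::ptrdiff_t v)
{
  assert(v > 0);
  manage_cache_size(SetAction, &v);
}
```

A single function handles both reading and writing because the static variable is visible only inside that function's body.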

Here is the related "public" documented API:

/** \returns the currently set cpu cache size (in bytes) used to estimate the ideal blocking size parameters */
std::ptrdiff_t ei_cpuCacheSize();

/** Set the cpu cache size (in bytes) for blocking.
  * This function also automatically sets the blocking size parameters for each scalar type using the following formula:
  * \code
  *  max_k = 4 * sqrt(cache_size/(64*sizeof(Scalar)));
  *  max_m = 2 * max_k;
  * \endcode
  * \endcode
  * overwriting custom values set using the ei_setBlockingSizes function.
  * \sa ei_setBlockingSizes */
void ei_setCpuCacheSize(std::ptrdiff_t cache_size);

/** Set the blocking size parameters \a maxK and \a maxM for the scalar type \a Scalar.
  * Note that in practice there is no distinction between scalar types of same size.
  * \sa ei_setCpuCacheSize */
template<typename Scalar>
void ei_setBlockingSizes(std::ptrdiff_t maxK, std::ptrdiff_t maxM);

/** \returns in \a maxK, \a maxM the blocking size parameters for the scalar type \a Scalar.
  * \sa ei_setBlockingSizes */
template<typename Scalar>
void ei_getBlockingSizes(std::ptrdiff_t& maxK, std::ptrdiff_t& maxM);
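
As a worked example of the formula documented for ei_setCpuCacheSize, here is a small standalone sketch. BlockingSizes and blocking_sizes_from_cache are hypothetical names, max_m is taken to be twice max_k, and truncating the square root toward zero is an assumption about the rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical result type for this example.
struct BlockingSizes
{
  std::ptrdiff_t max_k;
  std::ptrdiff_t max_m;
};

// Evaluates max_k = 4 * sqrt(cache_size / (64 * sizeof(Scalar)))
// and max_m = 2 * max_k for a cache size given in bytes.
template<typename Scalar>
BlockingSizes blocking_sizes_from_cache(std::ptrdiff_t cache_size)
{
  assert(cache_size > 0);
  BlockingSizes s;
  const double ratio = double(cache_size) / (64.0 * double(sizeof(Scalar)));
  s.max_k = static_cast<std::ptrdiff_t>(4.0 * std::sqrt(ratio));  // truncation assumed
  s.max_m = 2 * s.max_k;
  return s;
}
```

With a 1 MiB cache (1048576 bytes) and a 4-byte scalar, the ratio is 4096, its square root is 64, so max_k = 256 and max_m = 512. With an 8-byte scalar the same cache gives max_k = 181 and max_m = 362.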

You can also see the complete diff there: to see how it is implemented in practice.

This API is just a proposal, if someone has better ideas...

