I'm currently using the Eigen::Tensor module on a relatively small processor that has a very limited cache: 16 KB of level 1 and no level 2 at all! I've been looking for a way to tune the blocking of the operations Eigen performs to a particular block size, but I haven't found anything so far.
Is there a way to optimise the Tensor operations for a cache this small?