Re: [eigen] Parallel matrix multiplication causes heap allocation
On Dec 18, 2016, at 2:46 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> You mean N threads calling 1 GEMM, N threads calling N GEMM or the truly general case (N and M, perhaps unevenly distributed)?
In any case, I don't see the problem: allocate a buffer large enough to cover the last-level cache and give 1/N of it to each thread, where N is the number of cores.
I will have to look at what various implementations actually do, but I don't see any fundamental issue here.
If you look at the 2014 paper I linked before, it may discuss this or at least provide the necessary details to derive a good strategy.
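To make the 1/N idea above concrete, here is a rough C++ sketch, not what Eigen or any particular BLAS actually does. `Workspace`, `slice()` and `default_thread_count()` are hypothetical names invented for illustration. The last-level cache size is assumed to be supplied by the caller, since standard C++ has no portable way to query it, and the 64-byte alignment is likewise an assumption about the cache-line size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: one up-front allocation sized to the last-level
// cache (LLC), split into equal, aligned slices, one per thread.
struct Workspace {
    std::vector<std::uint8_t> storage;  // the single heap allocation
    std::size_t slice_bytes = 0;        // bytes available to each thread
    std::size_t num_slices = 0;         // N, the number of threads
    std::size_t alignment = 64;         // assumed cache-line size

    Workspace(std::size_t llc_bytes, std::size_t n_threads,
              std::size_t align = 64)
        : num_slices(n_threads), alignment(align) {
        assert(n_threads > 0 && align > 0);
        // Round each slice down to a multiple of the alignment so that
        // no two threads share a cache line.
        slice_bytes = (llc_bytes / n_threads) / align * align;
        // Over-allocate by one alignment unit so the base can be aligned.
        storage.resize(slice_bytes * n_threads + align);
    }

    // Pointer to the start of thread i's private slice.
    std::uint8_t* slice(std::size_t i) {
        assert(i < num_slices);
        auto addr = reinterpret_cast<std::uintptr_t>(storage.data());
        std::uintptr_t aligned = (addr + alignment - 1) / alignment * alignment;
        return storage.data() + (aligned - addr) + i * slice_bytes;
    }
};

// hardware_concurrency() may return 0 when the value is unknown,
// so fall back to a single slice in that case.
inline std::size_t default_thread_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}
```

With something like this, the allocation happens once, for example `Workspace ws(llc_bytes, default_thread_count());` at startup, and thread `i` then uses `ws.slice(i)` as its packing buffer inside the GEMM, so no heap allocation occurs per multiplication. Whether equal slices are the right split for the uneven N and M case is left open here.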