Re: [eigen] Parallel matrix multiplication causes heap allocation
You mean N threads calling 1 GEMM, N threads calling N GEMMs, or the truly general case (N and M, perhaps unevenly distributed)? In any case, I don't see the problem: allocate a buffer large enough to cover the last-level cache and give 1/N of it to each thread, where N is the number of cores. I will have to look at what various implementations actually do, but I don't see any fundamental issue here. If you look at the 2014 paper I linked before, it may discuss this or at least provide the necessary details to derive a good strategy.

Jeff
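A minimal sketch of the "one cache-sized buffer, 1/N per thread" idea, for illustration only. This is not what Eigen or any particular BLAS does; the class name PartitionedScratch, its members, and the 64-byte alignment are all hypothetical. The last-level cache size is passed in by the caller because detecting it is platform-specific.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Eigen API): allocates one scratch buffer up
// front and hands each thread a fixed, equally sized, aligned slice, so no
// allocation is needed inside the multiplication itself.
class PartitionedScratch {
public:
    // llc_bytes: assumed last-level cache size, supplied by the caller.
    // num_threads: number of slices (N); 0 is treated as 1.
    // alignment: slice alignment in bytes (64 is an assumed cache-line size).
    PartitionedScratch(std::size_t llc_bytes, std::size_t num_threads,
                       std::size_t alignment = 64)
        : num_threads_(num_threads == 0 ? 1 : num_threads),
          // Each thread gets 1/N of the buffer, rounded down to the alignment.
          slice_bytes_((llc_bytes / num_threads_) / alignment * alignment),
          // Over-allocate by one alignment unit so the base can be aligned.
          storage_(slice_bytes_ * num_threads_ + alignment),
          base_(nullptr) {
        const auto addr = reinterpret_cast<std::uintptr_t>(storage_.data());
        const std::uintptr_t aligned =
            (addr + alignment - 1) / alignment * alignment;
        base_ = storage_.data() + (aligned - addr);
    }

    // base_ points into storage_, so copying would leave a dangling pointer.
    PartitionedScratch(const PartitionedScratch&) = delete;
    PartitionedScratch& operator=(const PartitionedScratch&) = delete;

    // Start of the slice owned by thread_id (must be < num_threads()).
    unsigned char* slice(std::size_t thread_id) const {
        return base_ + thread_id * slice_bytes_;
    }

    std::size_t slice_bytes() const { return slice_bytes_; }
    std::size_t num_threads() const { return num_threads_; }

private:
    std::size_t num_threads_;
    std::size_t slice_bytes_;
    std::vector<unsigned char> storage_;
    unsigned char* base_;
};
```

Each worker would call slice(thread_id) once and use that region for its packed blocks. The equal split fits the "N threads, N cores" case; the uneven N-and-M case would need a different partitioning policy.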