All good GEMM implementations copy into temporaries, but they should not have to dynamically allocate them on every call. Some BLAS libraries allocate the buffer only once, during the first call. GotoBLAS used to do this. I don't know whether OpenBLAS still does, or in which cases. At one time, BLIS (a cousin of OpenBLAS) used a static buffer that was part of the data segment, so malloc was not needed to get the temporary buffers. I think it can now allocate dynamically instead, but only once, during the first call.
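To make the allocate-once idea concrete, here is a minimal C++ sketch. It is not taken from GotoBLAS, OpenBLAS, or BLIS; the names get_pack_buffer, allocate_pack_buffer, toy_gemm, g_alloc_count and kPackBufferBytes, as well as the 1 MiB size, are made up for illustration, and the "packing" is just a transposed copy standing in for what a real library does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumed workspace size; a real library would derive this from cache sizes.
constexpr std::size_t kPackBufferBytes = std::size_t{1} << 20;

// Counts allocations, for demonstration only.
static int g_alloc_count = 0;

static void* allocate_pack_buffer() {
    ++g_alloc_count;
    return std::malloc(kPackBufferBytes);
}

// Returns the same buffer on every call. The function-local static is
// initialized on the first call only, so malloc runs exactly once.
// The buffer is intentionally never freed (it lives for the whole program).
void* get_pack_buffer() {
    static void* const buffer = allocate_pack_buffer();
    return buffer;
}

// Toy C += A * B for row-major n x n matrices. B is first copied into the
// shared buffer in transposed order so the inner loop reads contiguously.
void toy_gemm(const double* A, const double* B, double* C, std::size_t n) {
    double* packed = static_cast<double*>(get_pack_buffer());
    assert(packed != nullptr);
    assert(n * n * sizeof(double) <= kPackBufferBytes);

    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            packed[j * n + i] = B[i * n + j];

    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * packed[j * n + k];
            C[i * n + j] += sum;
        }
}
```

Calling toy_gemm repeatedly reuses the same buffer, so only the first call pays for malloc. Note that a single shared buffer like this is not safe when the routine is called from several threads at once.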

But what about multi-threading (GEMM being called from multiple threads)?
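One possible way to handle that case, sketched here purely as an assumption and not as what any particular BLAS library or Eigen does, is to give each calling thread its own buffer through C++ thread_local storage. The names get_thread_pack_buffer, scratch_sum and kPackBufferDoubles are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed per-thread workspace size, in doubles.
constexpr std::size_t kPackBufferDoubles = std::size_t{1} << 16;

// Each thread gets its own buffer. It is allocated the first time that
// thread calls this function and released when the thread exits.
double* get_thread_pack_buffer() {
    thread_local std::vector<double> buffer(kPackBufferDoubles);
    return buffer.data();
}

// Stand-in for a kernel that needs scratch space: copies x into the
// calling thread's buffer and sums it from there.
double scratch_sum(const double* x, std::size_t n) {
    assert(n <= kPackBufferDoubles);
    double* scratch = get_thread_pack_buffer();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = x[i];
        sum += scratch[i];
    }
    return sum;
}
```

With this layout, concurrent callers never share scratch memory, but the first call made on each new thread still allocates, so a check that forbids all allocation would still fire unless every thread has called the routine once beforehand.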

I’ve got a question about your parallelization routines. I want to compute a parallel (OpenMP-based) matrix multiplication (result: a 500 x 250 matrix) without allocating any new memory along the way. So I have activated „Eigen::internal::set_is_malloc_allowed(false)“ to check that nothing goes wrong. However, my program crashes with the error message „Assertion failed: (is_malloc_allowed() && "heap allocation is forbidden (EIGEN_RUNTIME_NO_MALLOC is defined and g_is_malloc_allowed is false)"), function check_that_malloc_is_allowed, file /Users/xxx//libs/eigen/Eigen/src/Core/util/Memory.h, line 143.“ Is this behaviour desired? Should there be an allocation before doing parallel calculations? Or am I doing something wrong?

