|Re: [eigen] Parallel matrix multiplication causes heap allocation|
If I read the source correctly, for parallel matrix products we always call aligned_new for blockA:
-> gemm_functor::initParallelSession(Index num_threads)
-> aligned_new(size_t size)
blockB is still stack-constructed. Only for non-parallel gemm are both stack-constructed (if possible).
Has anyone looked at what Blaze does? They seem to be pretty advanced regarding parallelism -- they also parallelize "simple" operations like vector addition, which, if we trust their benchmarks, seems to become beneficial at around 50,000 doubles.
(N.B. we really need to update our own benchmarks at some point ...)
On 2016-12-19 05:12, Gael Guennebaud wrote:
Sorry, my intervention was too brief to really make sense. I was thinking more about the practical difficulties of implementing it in a header-only way, without dependencies, with C++03 features, and in a portable way... With C++11 this should be doable, so why not give it a try, with the current strategy as a fallback.
BTW, I should add that we also make use of static stack allocation for
small enough matrices, as controlled by EIGEN_STACK_ALLOCATION_LIMIT.
On Mon, Dec 19, 2016 at 2:35 AM, Jeff Hammond <jeff.science@xxxxxxxxx> wrote:
On Dec 18, 2016, at 2:46 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
On Sun, Dec 18, 2016 at 11:15 PM, Jeff Hammond <jeff.science@xxxxxxxxx> wrote:
All good GEMM implementations copy into temporaries, but they should not have to dynamically allocate them on every call.
Some BLAS libraries allocate the buffer only once, during the first call. GotoBLAS used to do this; I don't know whether OpenBLAS still does, or in what form.
At one time, BLIS (a cousin of OpenBLAS) used a static buffer that was part of the data segment, so malloc was not necessary to get the temporary buffers. I think it can allocate dynamically instead now, but only once, during the first call.
But what about multi-threading (GEMM being called from multiple threads)?
Do you mean N threads calling 1 GEMM, N threads calling N GEMMs, or the truly general case (N and M, perhaps unevenly distributed)?
In any case, I don't see the problem: allocate a buffer sufficient to cover the last-level cache and give 1/N of it to each thread, where N is the number of threads.
I will have to look at what various implementations actually do, but I don't see any fundamental issue here.
If you look at the 2014 paper I linked before, it may discuss this or at
least provide the necessary details to derive a good strategy.
On 18 Dec 2016, at 01:06, Rene Ahlsdorf <ahlsdorf@xxxxxxxxxxxxxxxxxx> wrote:
Dear Eigen team,
first of all, thank you for all your effort in creating such a great math library. I really love using it.
I've got a question about your parallelization routines. I want to compute a parallel (OpenMP-based) matrix multiplication (result: a 500 x 250 matrix) without allocating any new memory in the meantime. So I have activated "Eigen::internal::set_is_malloc_allowed(false)" to check that nothing goes wrong. However, my program crashes with the error message:

"Assertion failed: (is_malloc_allowed() && "heap allocation is forbidden (EIGEN_RUNTIME_NO_MALLOC is defined and g_is_malloc_allowed is false)"), function check_that_malloc_is_allowed, file /Users/xxx//libs/eigen/Eigen/src/Core/util/Memory.h, line 143."

Is this behaviour desired? Should there be an allocation before doing parallel calculations? Or am I doing something wrong?
Thanks in advance.
Eigen Version: 3.3.1 (commit f562a193118d)
My code: https://gist.github.com/anonymous/d57c835171b2068817b9
*Attached*: Screenshot showing the last function calls
<Screenshot 2016-12-18 01.01.42.png>
Dipl. Inf., Dipl. Math. Christoph Hertzberg
FB 3 - Mathematics and Computer Science
28359 Bremen, Germany
Switchboard: +49 421 178 45-6611
Visiting address of the branch office:
28359 Bremen, Germany
Tel.: +49 421 178 45-4021
Reception: +49 421 178 45-6600
Fax: +49 421 178 45-4150
Further information: http://www.informatik.uni-bremen.de/robotik