Re: [eigen] Preventing Memory Allocation for Large GEMM



Hi Gael and Christoph,

Thanks a lot! I'll look into these solutions and consider the efficiency trade-offs in our setting.

Graham

On Thu, Jun 23, 2016 at 6:15 PM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On 2016-06-21 23:35, Graham Neubig wrote:
Hi All,

I have a question about GEMM in Eigen. We're creating a library in which we'd
like to prevent additional memory allocation, so we've been testing
with EIGEN_NO_MALLOC. However, whether Eigen allocates memory for GEMM seems
to be inconsistent: no memory is allocated for smaller multiplications, but
memory is allocated for larger ones. I'm running something similar to the
following statement, and once the size gets large enough (50 or 100 or so),
the program starts crashing.

x.noalias() += y.transpose() * z;

If you want to be sure that no malloc (or alloca) happens, you can try switching to:

  x.noalias() += y.transpose().lazyProduct(z);

lazyProduct will always evaluate the product coefficient-wise (or packet-wise). That means it will not be optimized with respect to blocking, i.e., parts of the sources will be read multiple times, and depending on your matrix and cache sizes, this could result in lots of cache misses.

Besides the "no malloc" guarantee, you will also likely get a smaller binary if you only use lazyProduct (which might be relevant for some use cases).

We could think about adding a compile-time flag that forces all products to be evaluated lazily, to make switching more convenient.


Christoph

Looking at the stack trace included at the end of this message, it seems that
Eigen is parallelizing the GEMM, which allocates memory. Is there any
way to prevent this parallelization, so that we work only within the
memory that has already been allocated?

Graham

#5  0x00000001002ae571 in Eigen::internal::aligned_malloc(unsigned long) ()
#6  0x0000000100139fb0 in
Eigen::internal::general_matrix_matrix_product<long, float, 1, false,
float, 0, false, 0>::run(long, long, long, float const*, long, float
const*, long, float*, long, float, Eigen::internal::level3_blocking<float,
float>&, Eigen::internal::GemmParallelInfo<long>*) ()
#7  0x0000000100139ae5 in Eigen::internal::gemm_functor<float, long,
Eigen::internal::general_matrix_matrix_product<long, float, 1, false,
float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1,
-1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const>,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0,
0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1,
false> >::operator()(long, long, long, long,
Eigen::internal::GemmParallelInfo<long>*) const ()
#8  0x0000000100139956 in void Eigen::internal::parallelize_gemm<true,
Eigen::internal::gemm_functor<float, long,
Eigen::internal::general_matrix_matrix_product<long, float, 1, false,
float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1,
-1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const>,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0,
0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1,
false> >, long>(Eigen::internal::gemm_functor<float, long,
Eigen::internal::general_matrix_matrix_product<long, float, 1, false,
float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1,
-1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const>,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0,
0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1,
false> > const&, long, long, bool) ()
#9  0x0000000100137b1d in void
Eigen::internal::generic_product_impl<Eigen::Transpose<Eigen::Map<Eigen::Matrix<float,
-1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const>,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >,
Eigen::DenseShape, Eigen::DenseShape,
8>::scaleAndAddTo<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0,
Eigen::Stride<0, 0> > >(Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>,
0, Eigen::Stride<0, 0> >&, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float,
-1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const> const&,
Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >
const&, float const&) ()


--
 Dipl. Inf., Dipl. Math. Christoph Hertzberg

 Universität Bremen
 FB 3 - Mathematik und Informatik
 AG Robotik
 Robert-Hooke-Straße 1
 28359 Bremen, Germany

 Main office: +49 421 178 45-6611

 Visitor address of the branch office:
 Robert-Hooke-Straße 5
 28359 Bremen, Germany

 Tel.:      +49 421 178 45-4021
 Reception: +49 421 178 45-6600
 Fax:       +49 421 178 45-4150
 E-Mail:    chtz@xxxxxxxxxxxxxxxxxxxxxxxx

 Further information: http://www.informatik.uni-bremen.de/robotik