Re: [eigen] Parallel matrix multiplication causes heap allocation |


*To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
*Subject*: Re: [eigen] Parallel matrix multiplication causes heap allocation
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Mon, 19 Dec 2016 19:24:41 +0100

On Mon, Dec 19, 2016 at 3:24 PM, François Fayard <fayard@xxxxxxxxxxxxx> wrote:

> Did someone have a look at what Blaze [1] does? They seem to be pretty advanced regarding parallelism -- they also parallelize "simple" things like vector addition, which, if we trust their benchmarks [2], seems to be beneficial starting at around 50,000 doubles.

You should not trust benchmarks, especially when they have been done by the people who wrote the software :-)

Here is my attempt at this game, multiplying 10 000 x 10 000 "double" matrices on a dual Xeon E5-2660 v4 (Broadwell), 2x14 cores, 2400 MHz memory, with the latest release of every single library (as of today):

- MKL: 740 GFlops

- OpenBLAS: 540 GFlops

- Eigen: 440 GFlops

- Blaze: 440 GFlops

This machine is capable of: 2 x 14 (cores) x 2 (FMA ports per core) x 2 (flops per FMA) x 4 (doubles per AVX2 register) x 2 (GHz) = 896 GFlops. Hyperthreading is turned off and the CPU frequency is pinned at 2 GHz.

On a MacBook Pro (2014), with a 4 core Haswell, I get:

- MKL: 170 GFlops

- Eigen: 130 GFlops

Regarding Blaze: for operations such as vector addition, they also use streaming (non-temporal) stores to speed things up.

Still speaking about Blaze: when benchmarking this library it is important to specify which BLAS backend is enabled (for large enough matrices you are mostly benchmarking the underlying BLAS), and whether padding is enabled. By default, Blaze matrices are padded so that each row (or column) is aligned on a 32-byte boundary. This makes a significant difference for some small to medium matrix sizes, but it also wastes memory. In my opinion, it is kind of cheating to compare "blaze+padding" versus "otherlib+no-padding", and this is what their benchmark reports.

gael


**References**:

- **[eigen] Parallel matrix multiplication causes heap allocation** *From:* Rene Ahlsdorf
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* François Fayard
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* Gael Guennebaud
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* Jeff Hammond
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* Gael Guennebaud
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* Jeff Hammond
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* Gael Guennebaud
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* Christoph Hertzberg
- **Re: [eigen] Parallel matrix multiplication causes heap allocation** *From:* François Fayard


Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/