Re: [eigen] Eigen 2 to Eigen 3 performance regressions with mapped matrices
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Eigen 2 to Eigen 3 performance regressions with mapped matrices
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 12 Jan 2012 08:42:53 +0100
- Cc: Sameer Agarwal <sameeragarwal@xxxxxxxxxx>, tucker@xxxxxxxxxx
Well, first, you should really use 1 instead of Dynamic for the number of
columns of the vectors, so that GEMV-like (matrix-vector) operations are
called instead of GEMM-like (matrix-matrix) ones.

Then, the main difference with Eigen 2 is that we no longer check the
sizes at runtime to fall back to a naive product implementation when
the objects are too small. Again, you can still enforce the naive
product with .lazyProduct() if you know that's best for you.

That said, I still plan to add such runtime tests to pick the right
algorithm. I think there is still room for designing even better
product algorithms for such small matrices and vectors. However, I
observed that for small objects the performance of a "naive" product
algorithm depends a lot on the architecture and compiler, so choosing
the thresholds is rather difficult.
I'll add an entry in our bug tracker.
On Wed, Jan 11, 2012 at 4:58 AM, Keir Mierle <mierle@xxxxxxxxx> wrote:
> I've attached a microbenchmark that is similar in spirit to what we are
> doing with Eigen, that illustrates slowdown from Eigen 2 to Eigen 3. In
> particular, the benchmark does y += A*x, for A, x, y mapped unaligned
> dynamic but small dimension matrices. It could be that I have not chosen
> appropriate compiler flags. I am seeing performance 2x to 3x worse. Take a
> look at the header comments in the attached benchmark for more numbers.