Re: [eigen] Significant perf regression probably due to bug 363 patches
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Significant perf regression probably due to bug 363 patches
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Sun, 6 Nov 2011 14:27:22 -0500
OK, now I can reproduce the big 50% perf regression that you observed.
Before, I was only looking at subtests instead of the final results.
The problem is just that, despite the inline keyword, the
check_rows_cols_for_overflow function is not getting inlined. Adding
__attribute__((always_inline)) on it fixes it. Patch coming up.
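
To illustrate the kind of fix described above, here is a minimal sketch. It assumes the check simply guards against rows * cols overflowing the index type; the macro name SKETCH_ALWAYS_INLINE, the function name check_rows_cols_for_overflow_sketch, and the function body are hypothetical stand-ins, not the actual Eigen code or the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical macro (not Eigen's own): request forced inlining where the
// compiler supports it, and fall back to a plain `inline` hint elsewhere.
#if defined(__GNUC__)
#  define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define SKETCH_ALWAYS_INLINE __forceinline
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

// Stand-in for an overflow check on rows * cols. The body is an assumption
// made for illustration; only the inlining attribute is the point here.
SKETCH_ALWAYS_INLINE void check_rows_cols_for_overflow_sketch(std::ptrdiff_t rows,
                                                              std::ptrdiff_t cols)
{
    // Assumed precondition of this sketch: sizes are non-negative.
    assert(rows >= 0 && cols >= 0);
    const std::ptrdiff_t max_index = std::numeric_limits<std::ptrdiff_t>::max();
    if (rows != 0 && cols != 0 && rows > max_index / cols)
        throw std::overflow_error("rows * cols overflows the index type");
}
```

With plain `inline`, the compiler is free to leave such a function out of line, which costs a call on every small fixed-size construction; the attribute removes that choice on GCC.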
Benoit
2011/11/6 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> Thanks, I can reproduce a 10% performance decrease on linux x86_64,
> g++ 4.6.1, core i7.
>
> investigating.
>
> Benoit
>
> 2011/11/6 Eamon Nerbonne <eamon@xxxxxxxxxxxx>:
>> Whoops, that's what you get for copy-and-paste coding; the ClassDiagram function
>> was never intended to be called for scenarios other than 2D (as the name
>> might suggest...). The perf degradation I previously mentioned was in the
>> 2D case (and that was a different benchmark anyhow), so this error doesn't
>> invalidate those results (the code works in 2D). I've updated the function
>> to just ignore excess dimensions. The Eigen/EigenValues vs.
>> Eigen/Eigenvalues issue is due to the fact that I'm running these tests on
>> Windows, where paths are case-insensitive.
>>
>>
>> While I was changing the benchmark anyhow, I've added an option to declare
>> matrix sizes as Dynamic and size matrices at run-time (the code wasn't
>> changed to avoid temporaries, so that'll be much slower due to memory
>> management). I also changed the code to scale the number of timings rather
>> than the workload per timing as dimensionality rises so that timings should
>> be comparable between different dimensions (of course, the task is harder at
>> higher dimensionality).
>>
>>
>> sample compile then:
>> g++ GgmLvqBench.cpp -std=c++0x -DNDEBUG -DLVQFLOAT=double -DLVQDIM=2 -O3
>> -march=native
>> and perhaps add -DLVQDYNAMIC or -DEIGEN_DONT_VECTORIZE
>>
>> At changeset 4284 (25822e1ace8d) update the decomposition catalogue:
>> EigenBenchNV on GCC with 2(2) dimensions of doubles: 0.168s 189KB
>> EigenBenchV on GCC with 2(2) dimensions of doubles: 0.148s 268KB
>> EigenBenchNV on GCC with 2(Dynamic) dimensions of doubles: 0.718s 198KB
>> EigenBenchV on GCC with 2(Dynamic) dimensions of doubles: 0.78s 275KB
>> EigenBenchNV on GCC with 8(8) dimensions of floats: 1.01s 225KB
>> EigenBenchV on GCC with 8(8) dimensions of floats: 0.661s 275KB
>> EigenBenchNV on GCC with 8(Dynamic) dimensions of floats: 1.44s 202KB
>> EigenBenchV on GCC with 8(Dynamic) dimensions of floats: 1.17s 264KB
>>
>> At tip changeset 4321 (47b90dc56ada) Add test for Matrix(x, y) ctor static
>> assert added in previous changeset:
>> EigenBenchNV on GCC with 2(2) dimensions of doubles: 0.258s 201KB
>> EigenBenchV on GCC with 2(2) dimensions of doubles: 0.23s 272KB
>> EigenBenchNV on GCC with 2(Dynamic) dimensions of doubles: 0.844s 205KB
>> EigenBenchV on GCC with 2(Dynamic) dimensions of doubles: 0.909s 290KB
>> EigenBenchNV on GCC with 8(8) dimensions of floats: 1.13s 227KB
>> EigenBenchV on GCC with 8(8) dimensions of floats: 0.802s 291KB
>> EigenBenchNV on GCC with 8(Dynamic) dimensions of floats: 1.58s 210KB
>> EigenBenchV on GCC with 8(Dynamic) dimensions of floats: 1.31s 283KB
>>
>>
>> MSC isn't affected, and performs roughly comparably regardless:
>> EigenBenchNV on MSC with 2(2) dimensions of doubles: 0.3s 149KB
>> EigenBenchV on MSC with 2(2) dimensions of doubles: 0.252s 169KB
>> EigenBenchNV on MSC with 2(Dynamic) dimensions of doubles: 1s 160KB
>> EigenBenchV on MSC with 2(Dynamic) dimensions of doubles: 0.938s 185KB
>> EigenBenchNV on MSC with 8(8) dimensions of floats: 1.27s 168KB
>> EigenBenchV on MSC with 8(8) dimensions of floats: 1.01s 185KB
>> EigenBenchNV on MSC with 8(Dynamic) dimensions of floats: 1.93s 159KB
>> EigenBenchV on MSC with 8(Dynamic) dimensions of floats: 1.61s 184KB
>>
>>
>> So for GCC there's a slowdown in all scenarios, although it's relatively more
>> significant in the cheapest (2D) cases.
>>
>> --eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163
>>
>>
>> On Sun, Nov 6, 2011 at 05:29, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>
>>> 2011/11/5 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>>> > So this is really a bug in your code; did this ever work? Meanwhile,
>>> > we should try to generate an error at compile time in this case. This
>>> > should be easy since x and y are of templated type so we get to know
>>> > (in frame 3) that they're floats which doesn't make sense for (rows,
>>> > cols) dimensions.
>>>
>>> Done in changeset c6f51dc87530, so your code now gives this compile
>>> error:
>>>
>>> In file included from eigen/Eigen/Core:289:0,
>>> from eigen/bench/BenchTimer.h:46,
>>> from Downloads/GgmLvqBench.cpp:25:
>>> eigen/Eigen/src/Core/PlainObjectBase.h: In member function ‘void
>>>
>>> Eigen::PlainObjectBase<Derived>::_init2(Eigen::PlainObjectBase<Derived>::Index,
>>> Eigen::PlainObjectBase<Derived>::Index, typename
>>> Eigen::internal::enable_if<(Eigen::PlainObjectBase<Derived>::Base::
>>> SizeAtCompileTime != 2), T0>::type*) [with T0 = float, T1 = float,
>>> Derived = Eigen::Matrix<float, 4, 1>,
>>> Eigen::PlainObjectBase<Derived>::Index = long int, typename
>>> Eigen::internal::enable_if<(Eigen::PlainObjectBase<Derived>::Base::
>>> SizeAtCompileTime != 2), T0>::type = float]’:
>>> eigen/Eigen/src/Core/Matrix.h:252:7: instantiated from
>>> ‘Eigen::Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows,
>>> _MaxCols>::Matrix(const T0&, const T1&) [with T0 = float, T1 = float,
>>> _Scalar = float, int _Rows = 4, int _Cols = 1, int _Options = 0, int
>>> _MaxRows = 4, int _MaxCols = 1]’
>>> Downloads/GgmLvqBench.cpp:386:76: instantiated from here
>>> eigen/Eigen/src/Core/PlainObjectBase.h:599:7: error: static assertion
>>> failed: "FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED"
>>>
>>> Benoit
>>>
>>>
>>
>>
>
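
The compile-time check discussed in the quoted messages (rejecting floating-point arguments where (rows, cols) integers are expected) can be sketched roughly as follows. This is an illustration under assumptions, not Eigen's implementation: MatrixSketch is a hypothetical class, and the sketch uses C++11 static_assert with std::is_integral rather than whatever mechanism Eigen uses internally. Only the assertion message is taken from the error output above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical miniature of a dynamically sized matrix; not Eigen's class.
class MatrixSketch {
public:
    // The argument types are template parameters, so they are known at
    // compile time and can be checked before any conversion happens.
    template <typename T0, typename T1>
    MatrixSketch(const T0& rows, const T1& cols)
        : rows_(0), cols_(0)
    {
        static_assert(std::is_integral<T0>::value && std::is_integral<T1>::value,
                      "FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED");
        // Assumed precondition of this sketch: sizes are non-negative.
        assert(rows >= 0 && cols >= 0);
        rows_ = static_cast<std::ptrdiff_t>(rows);
        cols_ = static_cast<std::ptrdiff_t>(cols);
    }

    std::ptrdiff_t rows() const { return rows_; }
    std::ptrdiff_t cols() const { return cols_; }

private:
    std::ptrdiff_t rows_;
    std::ptrdiff_t cols_;
};
```

With this sketch, MatrixSketch m(3, 4) compiles, while MatrixSketch m(3.0f, 4.0f) is rejected at compile time instead of silently truncating the floats to sizes.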