Re: [eigen] Performance question

Hi,

Here are just a few pointers on where to start looking.

First, not all operations are vectorized. Some operations can't be
efficiently vectorized. That includes small determinants, and also
much of the ParametrizedLine and Hyperplane stuff. So it sounds like
there's a possibility that your program is just spending most of its
time in operations that can't be vectorized.
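
To make the determinant point concrete, here is a minimal sketch of a
3x3 determinant written out by hand. It is plain C++, not Eigen's
actual implementation, and the names `Vec3` and `det3` are made up for
illustration. Each product mixes different components of different
vectors, so packing the work into SSE lanes would likely need extra
shuffles, which can cancel the gain for so little arithmetic:

```cpp
#include <array>

// Hypothetical stand-in for a 3-component point type (not an Eigen type).
using Vec3 = std::array<double, 3>;

// Determinant of the 3x3 matrix whose columns are a, b and c,
// i.e. the scalar triple product a . (b x c).
inline double det3(const Vec3& a, const Vec3& b, const Vec3& c) {
    return a[0] * (b[1] * c[2] - b[2] * c[1])
         - a[1] * (b[0] * c[2] - b[2] * c[0])
         + a[2] * (b[0] * c[1] - b[1] * c[0]);
}
```

The sign of such a determinant is the kind of quantity an "inside the
triangle" check can be built on; only nine multiplications are
involved, so there is little work to spread across SIMD lanes.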

Second, you are seeing a 50% speed difference between floats and
doubles, even without vectorization. That's very unusual. Normally,
the speed difference is much smaller or nonexistent, on both 32-bit
and 64-bit systems. That suggests your app is memory-bound, in which
case vectorization won't help. Typically, try to reorder the way you
process your data so that you minimize memory accesses and maximize
how many computations you do at once on a given set of data. If you
really need to load a large number of vectors at once, then at least
try to be cache-friendly: find a way to keep that number of vectors
low enough to fit in the CPU caches.
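
One way to act on the cache-friendliness advice is loop tiling. Below
is a minimal sketch in plain C++ (no Eigen), assuming the workload is
an "every vertex against every triangle" loop as described in the
original message. The `Point` and `Triangle` structs, the function
`count_hits_tiled`, and the 256 KB budget are all hypothetical choices
for illustration, not measured values; the predicate `test` stands in
for the real ray/plane/inside check:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };        // typically 12 bytes
struct Triangle { Point a, b, c; };     // typically 36 bytes

// Hypothetical budget: bytes of triangle data to keep hot at once.
constexpr std::size_t kCacheBudgetBytes = 256 * 1024;

// Counts the (vertex, triangle) pairs accepted by `test`. Triangles are
// visited in tiles small enough to fit the budget, and every vertex is
// tested against the current tile before moving on to the next one.
template <class Test>
std::size_t count_hits_tiled(const std::vector<Point>& vertices,
                             const std::vector<Triangle>& triangles,
                             Test test,
                             std::size_t budget = kCacheBudgetBytes) {
    // At least one triangle per tile, even for a tiny budget.
    const std::size_t tile =
        std::max<std::size_t>(1, budget / sizeof(Triangle));
    std::size_t hits = 0;
    for (std::size_t t0 = 0; t0 < triangles.size(); t0 += tile) {
        const std::size_t t1 = std::min(t0 + tile, triangles.size());
        for (const Point& v : vertices)
            for (std::size_t t = t0; t < t1; ++t)
                if (test(v, triangles[t])) ++hits;
    }
    return hits;
}
```

Compared with running the full triangle list for each vertex, the
triangles in a tile are reused for all vertices while they are still
in cache, at the cost of streaming the vertex array once per tile.
The right tile size depends on the CPU and has to be found by
measurement.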

Cheers,
Benoit

2009/2/23 Yves Bailly <yves.bailly@xxxxxxxxxxx>:
> Hello all,
>
> My first post on this list... so first of all, many thanks to all
> the developers (geniuses?) who produced Eigen.
>
> Now my question, which is about a "small" bench I made. I don't use
> very big vectors or matrices, just small ones, to perform some
> 3D geometry algorithms on triangular meshes. The thing is, I have
> millions of points and triangles.
>
> The core of the process involves two triangular meshes. One is
> transformed (actually translated), then each vertex of the second is
> tested against each triangle of the first one. Here, "tested" means
> a ray (Eigen::ParametrizedLine) is cast from the vertex, its
> intersection with the plane defined by the triangle (Eigen::Hyperplane)
> is computed, then a check is made to know if this intersection is
> inside the triangle or not (Eigen::Matrix::determinant). This is done
> over and over.
>
> I tried using Vector3f, Vector3d, Vector4f and Vector4d as base
> point type, the other types being adapted accordingly. And using
> (or trying to use) vectorization or not, i.e. giving -msse2 or
> -DEIGEN_DONT_VECTORIZE to g++. The test platform is Kubuntu 8.10
> (amd64) on AMD Athlon 64X2 5600+ (dual core), with 3GB of RAM.
>
> What I expected: using Vector4f or Vector4d with SSE2 should give
> better results (lower times).
>
> The actual results (each test done 3 times and averaged), times
> in seconds, lower is better:
>              +-----+-----+-----+-----+
>              | 3f  | 3d  | 4f  | 4d  |
> -------------+-----+-----+-----+-----+
> with sse2    | 213 | 314 | 272 | 397 |
> -------------+-----+-----+-----+-----+
> without sse2 | 220 | 307 | 272 | 400 |
> -------------+-----+-----+-----+-----+
>
> So... no performance gain when using 4x in place of 3x, no
> performance gain when using SSE2.
>
> I did my best to avoid creating temporaries, to use lazy eval,
> and so on... Any hint about a typical place I could check to
> find *why* I don't gain anything would be much appreciated.
>
> If needed, I can provide source code (less than 700 lines).
>
> Best regards,
>
> --
> (o< | Yves Bailly                          | -o)
> //\ | Linux Dijon  : http://www.coagul.org | //\
> \_/ |                                      | \_/`