Re: [eigen] Proposal to remove aligned alloc paths



Hi,

there are two bz entries on that issue (the latter is an example where the current code actually causes problems):

http://eigen.tuxfamily.org/bz/show_bug.cgi?id=761
http://eigen.tuxfamily.org/bz/show_bug.cgi?id=779

I agree on removing the EIGEN_MALLOC_ALREADY_ALIGNED path (at least for 3.3).

I'm not sure about the informative value of your benchmark. If you have lots of small allocations, it can also be important how they are arranged in memory (but that very much depends on how the data is used).

Overall, I think that from an optimization point of view it is much more important to reduce the number of allocations as much as possible (see bugs 51 and 789).


Christoph


On 13.12.2014 11:25, Benoit Jacob wrote:
Hi List,

This is to propose that we drop two paths from aligned_malloc: the "malloc
is already aligned" path, and the mm_malloc path.

Thus, we would be left with just these paths:
  1) The preferred system-specific aligned allocation (posix_memalign on
POSIX, _aligned_malloc on Windows).
  2) handmade_aligned_malloc as the only fallback.
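
To make the two remaining paths concrete, here is a minimal sketch of what they could look like. It is not the code from the attached patches or from Eigen itself; all function names prefixed with sketch_ are hypothetical, and the fallback assumes that malloc returns pointers aligned to at least sizeof(void*).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdlib.h>  // posix_memalign on POSIX systems
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Fallback path: over-allocate with malloc, round up to the next aligned
// address, and store the original pointer just below the returned block.
// Assumes alignment is a power of two and at least sizeof(void*).
inline void* sketch_handmade_aligned_malloc(std::size_t size, std::size_t alignment) {
  assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
  if (size > SIZE_MAX - alignment) return nullptr;  // size + alignment would overflow
  void* original = std::malloc(size + alignment);
  if (original == nullptr) return nullptr;
  std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
  // Round down, then step up by one alignment unit: always leaves a gap
  // of at least sizeof(void*) in front of the aligned address.
  std::uintptr_t aligned_addr = (raw & ~(std::uintptr_t(alignment) - 1)) + alignment;
  void* aligned = reinterpret_cast<void*>(aligned_addr);
  static_cast<void**>(aligned)[-1] = original;
  return aligned;
}

inline void sketch_handmade_aligned_free(void* ptr) {
  // Recover the pointer that malloc originally returned and free that one.
  if (ptr != nullptr) std::free(static_cast<void**>(ptr)[-1]);
}

// Preferred path: the system-specific aligned allocator where available,
// with the handmade version as the only fallback.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
#if defined(_WIN32)
  return _aligned_malloc(size, alignment);
#elif defined(__unix__) || defined(__APPLE__)
  void* p = nullptr;
  return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#else
  return sketch_handmade_aligned_malloc(size, alignment);
#endif
}

// Each path needs its matching deallocation function.
inline void sketch_aligned_free(void* ptr) {
#if defined(_WIN32)
  _aligned_free(ptr);
#elif defined(__unix__) || defined(__APPLE__)
  std::free(ptr);
#else
  sketch_handmade_aligned_free(ptr);
#endif
}
```

Note that memory from the handmade path must never be passed to plain free, since the pointer handed to the caller is not the one malloc returned.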

See attached patches.

Motivation for these changes, aside from removing code:

The "malloc is already aligned" path relies on the assumption that malloc's
nonstandard properties are only a function of the platform, which we tried
to detect using preprocessor tokens. But malloc is not a system feature,
it's a C library feature and many applications override it. Thus, we can't
know if malloc is already aligned based on platform checks.
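
As an illustration of why this is a run-time property rather than a platform property, the following sketch probes whatever malloc the process actually ended up with. The helper names are hypothetical, and a passing probe is only an observation about the sampled allocations, not a guarantee, which is exactly why it cannot replace a real aligned allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True if ptr is a multiple of alignment (alignment must be nonzero).
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Samples one allocation for every size in [1, max_size] and reports
// whether all of them happened to be aligned. An application that
// overrides malloc can give a different answer on the same platform.
inline bool malloc_looked_aligned(std::size_t alignment, std::size_t max_size) {
  bool all_aligned = true;
  for (std::size_t size = 1; size <= max_size; ++size) {
    void* p = std::malloc(size);
    if (p == nullptr) continue;  // allocation failure says nothing about alignment
    if (!is_aligned(p, alignment)) all_aligned = false;
    std::free(p);
  }
  return all_aligned;
}
```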

Part of the motivation for the "malloc is already aligned" path was that on
my Linux system at the time, malloc was faster than posix_memalign. If you
think that this is a reason to keep this path, please run the attached
benchmark and share results. On my current machine running Mac OS X, I get:

malloc+free for random sizes between 0 and 10...
Real:5.15e-08 s, CPU: 5.15e-08 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 10...
Real:5.34e-08 s, CPU: 5.34e-08 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 10...
Real:2.34e-07 s, CPU: 2.34e-07 s
malloc+free for random sizes between 0 and 1000...
Real:1.01e-07 s, CPU: 1.01e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 1000...
Real:1.06e-07 s, CPU: 1.06e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 1000...
Real:2.33e-07 s, CPU: 2.33e-07 s
malloc+free for random sizes between 0 and 100000...
Real:1.07e-07 s, CPU: 1.07e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 100000...
Real:1.13e-07 s, CPU: 1.13e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 100000...
Real:1.14e-07 s, CPU: 1.14e-07 s
malloc+free for random sizes between 0 and 10000000...
Real:5.9e-07 s, CPU: 5.9e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 10000000...
Real:6.08e-07 s, CPU: 6.08e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 10000000...
Real:5.78e-07 s, CPU: 5.78e-07 s

So for me, posix_memalign with 16-byte alignment is just 5% slower than
malloc. Would that be enough to bother?
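
For readers without the attachment, a measurement of this kind could be sketched roughly as follows. This is not the attached benchmark: the helper names, the fixed seed, and the iteration count are assumptions made for illustration, and the code only builds on POSIX systems because it calls posix_memalign.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>
#include <stdlib.h>  // malloc, free, posix_memalign (POSIX only)

// Storing each pointer here keeps the compiler from optimizing
// the allocate/free pairs away.
static void* volatile g_sink = nullptr;

// Returns the average wall-clock seconds per allocate+free pair for
// random sizes in [0, max_size]. Sizes are generated up front so that
// the random number generator is not part of the timed loop.
template <typename Alloc>
double seconds_per_alloc_free(Alloc alloc, std::size_t max_size, int iterations) {
  assert(iterations > 0);
  std::mt19937 rng(42);  // fixed seed: every allocator sees the same sizes
  std::uniform_int_distribution<std::size_t> dist(0, max_size);
  std::vector<std::size_t> sizes(static_cast<std::size_t>(iterations));
  for (std::size_t& s : sizes) s = dist(rng);

  auto start = std::chrono::steady_clock::now();
  for (std::size_t s : sizes) {
    void* p = alloc(s);
    g_sink = p;
    free(p);  // free(nullptr) is a no-op, so failed allocations are harmless
  }
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count() / iterations;
}

inline void* alloc_malloc(std::size_t size) { return malloc(size); }

inline void* alloc_memalign16(std::size_t size) {
  void* p = nullptr;
  return posix_memalign(&p, 16, size) == 0 ? p : nullptr;
}

inline void* alloc_memalign32(std::size_t size) {
  void* p = nullptr;
  return posix_memalign(&p, 32, size) == 0 ? p : nullptr;
}
```

Calling, for example, seconds_per_alloc_free(alloc_malloc, 1000, 1000000) and the same with alloc_memalign16 gives two numbers that can be compared the way the figures above are. Since each block is freed immediately, such a loop says little about fragmentation or memory layout.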

I added 32-byte alignment for completeness but of course malloc is not an
option for that.

Finally, there also was the point that it's better to use malloc rather
than other, more obscure functions because it works better with
instrumentation that people might use, which hooks into or overrides
standard allocation functions. That's a fair point, but such
instrumentation, in order to be correct, has to instrument the platform's
primary aligned-alloc function (posix_memalign, or _aligned_malloc on
Windows) anyway. Out of the functions that we used, only mm_malloc stood
out as relatively obscure. Since we shouldn't ever need it, and the
handmade fallback isn't too terrible anyway, here is a second patch that
removes it.

Thoughts?

Benoit



--
----------------------------------------------
Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
----------------------------------------------


