[eigen] Re: Proposal to remove aligned alloc paths

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


@Kaelin:

Like Mark says, I'm merely taking note of the fact that applications do override malloc in practice; in the case of cross-platform applications, they tend to do so on all platforms and can then rely on their custom allocator on all platforms, regardless of the merits of each platform's default allocator.

@ Christoph:

Thanks for these bug links. They show that the issue is affecting people in practice.
I totally agree that my benchmark is only covering one small usage pattern and results may vary a lot based on usage pattern. I just intended it as a sanity check, necessary but not sufficient, and as a basis if someone is concerned about another specific usage pattern and wants to adapt this benchmark to it.

Totally agree that if one were very concerned about alloc performance, one should focus on avoiding allocations rather than going for marginal improvements on the allocator.

Benoit

2014-12-13 2:25 GMT-08:00 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
Hi List,

This is to propose that we drop two paths from aligned_malloc: the "malloc is already aligned" path, and the mm_malloc path.

Thus, we would be left with just these paths:
 1) The preferred system-specific aligned allocation (posix_memalign on POSIX, _aligned_malloc on Windows).
 2) handmade_aligned_malloc as the only fallback.

See attached patches.

Motivation for these changes, aside from removing code:

The "malloc is already aligned" path relies on the assumption that malloc's nonstandard properties are only a function of the platform, which we tried to detect using preprocessor tokens. But malloc is not a system feature, it's a C library feature and many applications override it. Thus, we can't know if malloc is already aligned based on platform checks.

Part of the motivation for the "malloc is already aligned" path was that on my linux system at the time, malloc was faster than posix_memalign. If you think that this is a reason to keep this path, please run the attached benchmark and share results. On my current machine running Mac OSX, I get:

malloc+free for random sizes between 0 and 10...
Real:5.15e-08 s, CPU: 5.15e-08 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 10...
Real:5.34e-08 s, CPU: 5.34e-08 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 10...
Real:2.34e-07 s, CPU: 2.34e-07 s
malloc+free for random sizes between 0 and 1000...
Real:1.01e-07 s, CPU: 1.01e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 1000...
Real:1.06e-07 s, CPU: 1.06e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 1000...
Real:2.33e-07 s, CPU: 2.33e-07 s
malloc+free for random sizes between 0 and 100000...
Real:1.07e-07 s, CPU: 1.07e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 100000...
Real:1.13e-07 s, CPU: 1.13e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 100000...
Real:1.14e-07 s, CPU: 1.14e-07 s
malloc+free for random sizes between 0 and 10000000...
Real:5.9e-07 s, CPU: 5.9e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 10000000...
Real:6.08e-07 s, CPU: 6.08e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 10000000...
Real:5.78e-07 s, CPU: 5.78e-07 s

So for me, posix_memalign with 16-byte alignment is just 5% slower than malloc. Would that be enough to bother?

I added 32-byte alignment for completeness but of course malloc is not an option for that.

Finally, there also was the point that it's better to use malloc rather than obscure other functions because it works better with instrumentation that people might use, that hook into or override standard allocation functions. That's a fair point, but such instrumentation, in order to be correct, has to instrument the platform's primary aligned-alloc function (posix_memalign or aligned_malloc on Windows) anyway. Out of the functions that we used, only mm_malloc stood out as relatively obscure. Since we shouldn't ever need it, and the handmade fallback isn't too terrible anyway, here is a second patch that removes it.

Thoughts?

Benoit


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/