[eigen] Re: Proposal to remove aligned alloc paths |
[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]
Thoughts?Finally, there also was the point that it's better to use malloc rather than obscure other functions because it works better with instrumentation that people might use, that hook into or override standard allocation functions. That's a fair point, but such instrumentation, in order to be correct, has to instrument the platform's primary aligned-alloc function (posix_memalign or aligned_malloc on Windows) anyway. Out of the functions that we used, only mm_malloc stood out as relatively obscure. Since we shouldn't ever need it, and the handmade fallback isn't too terrible anyway, here is a second patch that removes it.Hi List,So for me, posix_memalign with 16-byte alignment is just 5% slower than malloc. Would that be enough to bother?This is to propose that we drop two paths from aligned_malloc: the "malloc is already aligned" path, and the mm_malloc path.Thus, we would be left with just these paths:1) The preferred system-specific aligned allocation (posix_memalign on POSIX, _aligned_malloc on Windows).2) handmade_aligned_malloc as the only fallback.See attached patches.Motivation for these changes, aside from removing code:The "malloc is already aligned" path relies on the assumption that malloc's nonstandard properties are only a function of the platform, which we tried to detect using preprocessor tokens. But malloc is not a system feature, it's a C library feature and many applications override it. Thus, we can't know if malloc is already aligned based on platform checks.Part of the motivation for the "malloc is already aligned" path was that on my linux system at the time, malloc was faster than posix_memalign. If you think that this is a reason to keep this path, please run the attached benchmark and share results. On my current machine running Mac OSX, I get:malloc+free for random sizes between 0 and 10...
Real:5.15e-08 s, CPU: 5.15e-08 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 10...
Real:5.34e-08 s, CPU: 5.34e-08 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 10...
Real:2.34e-07 s, CPU: 2.34e-07 s
malloc+free for random sizes between 0 and 1000...
Real:1.01e-07 s, CPU: 1.01e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 1000...
Real:1.06e-07 s, CPU: 1.06e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 1000...
Real:2.33e-07 s, CPU: 2.33e-07 s
malloc+free for random sizes between 0 and 100000...
Real:1.07e-07 s, CPU: 1.07e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 100000...
Real:1.13e-07 s, CPU: 1.13e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 100000...
Real:1.14e-07 s, CPU: 1.14e-07 s
malloc+free for random sizes between 0 and 10000000...
Real:5.9e-07 s, CPU: 5.9e-07 s
posix_memalign+free with 16-byte alignment for random sizes between 0 and 10000000...
Real:6.08e-07 s, CPU: 6.08e-07 s
posix_memalign+free with 32-byte alignment for random sizes between 0 and 10000000...
Real:5.78e-07 s, CPU: 5.78e-07 s
I added 32-byte alignment for completeness but of course malloc is not an option for that.
Benoit
Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |