|Re: [eigen] Clean aligned memory allocation|
[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]
Yes, I'm aware of all of that, but I also think that we should remove all paths which are not based on std::malloc so that users can replace it.
Then we can indeed discuss what should be the default for EIGEN_MALLOC_ALREADY_ALIGNED. According to my benchmarks, the difference between std::malloc and handmade_aligned_malloc is not on the speed level, but on the memory consumption overhead. Of course, this is marginal for large enough matrices, and my last suggestion would cancel it for extremely small ones. Still remains the case of intermediate ones, for which it is better to use a dedicated allocator anyway (but that's another topic: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=166).All in all, I'm OK with either default choice for EIGEN_MALLOC_ALREADY_ALIGNED.gaelOn Tue, Feb 2, 2016 at 1:59 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:2016-02-01 17:03 GMT-05:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:Hi,This email is mostly intended to Eigen's developers, but any users concerned about Eigen's dynamic aligned memory allocation mechanism is welcome to share its thoughts and expertise.As discussed in bugs 761 and 779, our dynamic memory allocation mechanism would deserve some cleaning.Wouldn't both of these bugs be averted if we dropped EIGEN_MALLOC_ALREADY_ALIGNED ?I understand that it tends to perform best because it makes us hit the most common case in the memory allocator.. But like I tried to express in this thread,
it relies on the unsafe assumption that the memory allocator is a property of the platform and can thus be inferred from preprocessor checks.
That's not the case: many large applications use their own memory allocator, and that's a runtime property, not something that Eigen could detect at compile time.
(Here I need to pause to acknowledge responsibility in this problem that I'm describing!)So while I understand that for many users EIGEN_MALLOC_ALREADY_ALIGNED is in fact safe and performs best, it is on the whole an unsafe optimization. Maybe it should not be the default? How about keeping it around (if you wish) as a non-default optimization option.Another thought: do any performance differences here actually matter outside of micro-benchmarks, and if yes, is that a sign of a problem that we could try to address at a different level, trying to change either implementation of interface to reduce allocator overhead?Cheers,BenoitAs you can see there , the current logic is rather complicated and thus prone to bugs and it is difficult to predict what will be used under the hood.I did some testing on both Linux and OSX, and I found that both posix_memalign and _mm_malloc are slightly slower than our handmade_aligned_malloc which is as fast as the non-aligned std::malloc (probably thanks to inlining). I observed almost a factor 2 for 32 bytes alignment.Regarding memory overhead, there is no winners for 32 bytes alignment, but for 16 byes alignment, std::malloc is clearly better as it is already aligned on most systems.With that in mind, we can remove both the posix_memalign and _mm_malloc paths without any regrets.This means that on Unix, std::malloc will never be bypassed anymore.It remains the question of windows and its _aligned_malloc routine that I did not benched. For consistency with unix systems, I would propose to remove that path too.Then, I also propose to keep EIGEN_MALLOC_ALREADY_ALIGNED as it is clearly better regarding memory consumption. In case of trouble, one can still compile with -DEIGEN_MALLOC_ALREADY_ALIGNED=0 to bypass the default assumptions (e.g. if using a custom non-aligned malloc). Nonetheless, to be future proof, I propose to extend this notion of EIGEN_MALLOC_ALREADY_ALIGNED to something more general as EIGEN_MALLOC_DEFAULT_ALIGNMENT whose value would be the default alignment in bytes (e.g., 8, 16, 32?).Finally, to reduce the overhead of handmade_aligned_malloc for tiny buffers, I propose to disable the alignment of buffers smaller than the requested alignment. This can easily be done by passing the buffer size to aligned_free().Those changes should be the latest major ones before Eigen 3.3 beta2 and the RCs...cheers,Gael.
|Mail converted by MHonArc 2.6.19+||http://listengine.tuxfamily.org/|