Re: [eigen] Clean aligned memory allocation





2016-02-02 9:51 GMT-05:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:


On Tue, Feb 2, 2016 at 2:46 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
Since this "slop" isn't directly visible from our side, measuring it is not trivial. One universal way to approximate it is to look at process stats such as resident set size (RSS) or virtual memory size (VSZ), say in `ps`. Specific allocators may also expose functions to query actually allocated sizes; IIRC, jemalloc does.
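
As a rough illustration of the allocator-specific query mentioned above, here is a minimal sketch. It assumes glibc's `malloc_usable_size` from `<malloc.h>` (a non-standard extension; other allocators may offer something similar or nothing at all), and the helper name `reported_slop` is made up for this example. It only shows what the allocator reports as usable for one block, so it does not capture bookkeeping overhead or page-level effects the way RSS does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size: glibc extension, assumed to be available

// Hypothetical helper: bytes the allocator reports as usable beyond the request.
// Returns 0 if the allocation itself fails.
std::size_t reported_slop(std::size_t requested) {
    void* p = std::malloc(requested);
    if (!p) return 0;
    // Usable size of the block as reported by the allocator.
    std::size_t usable = malloc_usable_size(p);
    assert(usable >= requested);
    std::free(p);
    return usable - requested;
}
```

Calling `reported_slop(2000)` and `reported_slop(2048)` side by side would give a first hint of how the allocator in use rounds requests, without having to go through `ps`.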

Yes, here are some numbers for different buffer sizes (first number), each allocated many times (second number), with either 16- or 32-byte alignment:

2000 Bytes x1000 /16B
handmade:   2625536
posix:      2621440
_mm_malloc: 4677632
malloc:     2629632

2048 Bytes x1000 /16B
handmade:   3137536
posix:      2621440
_mm_malloc: 4681728
malloc:     2621440

2050 Bytes x1000 /16B
handmade:   3137536
posix:      3137536
_mm_malloc: 5705728
malloc:     3141632

16 Bytes x100000 /16B
handmade:   3821568
posix:      2195456
_mm_malloc: 3833856
malloc:     2195456


2048 Bytes x100000 /32B
handmade:   257576960
posix:      206180352
_mm_malloc: 411779072

2000 Bytes x100000 /32B
handmade:   206176256
posix:      206176256
_mm_malloc: 411787264

Very interesting, thanks for the numbers!
Here, as expected, we see handmade perform worse than posix on exactly power-of-two (POT) sizes, though I have to admit that I expected it to be worse than that!
It would be interesting to test a few more orders of magnitude of size, representing more typical allocations that we make, say all powers of 16: 4k, 64k, 1M, 16M.
Note:
 - page size (4k) might be special.
 - above a certain size, allocators will typically stop rounding up to POT sizes, so for a given allocator, there will be a size threshold above which the handmade overhead gets smaller.
 - it would be interesting to compare different allocators.
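
For readers who do not have the "handmade" scheme in mind, here is a minimal sketch of the general technique: over-allocate with plain malloc, round up to the alignment, and stash the original pointer just before the aligned block. The function names and signatures are assumptions for illustration, not necessarily what Eigen ships. The point relevant to the POT discussion is that the request passed to malloc is `size + alignment`, so an exactly-POT size always spills past its boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Sketch only. Assumes alignment is a power of two and >= sizeof(void*),
// and that malloc returns pointers aligned to at least sizeof(void*).
void* handmade_aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    // Over-allocate: the real request is size + alignment bytes.
    void* original = std::malloc(size + alignment);
    if (!original) return nullptr;
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    // Round down to an alignment boundary, then move up by one full alignment.
    std::uintptr_t aligned = (raw & ~(static_cast<std::uintptr_t>(alignment) - 1)) + alignment;
    // Store the pointer returned by malloc right before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = original;
    return reinterpret_cast<void*>(aligned);
}

void handmade_aligned_free(void* ptr) {
    // Recover the pointer malloc returned and release it.
    if (ptr) std::free(reinterpret_cast<void**>(ptr)[-1]);
}
```

With this sketch, a 2048-byte buffer at 16-byte alignment becomes a 2064-byte malloc request; how much extra memory that costs depends entirely on the size classes of the underlying allocator.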

Benoit

 


_mm_malloc is definitely pretty bad, and indeed our handmade version can waste a significant amount of bytes in worst cases.

Unfortunately, posix_memalign is also pretty bad regarding realloc: since there is no "posix_realloc_align", we have to do it by hand, explicitly allocating a new buffer and copying, whereas in many(?) cases std::realloc is nearly a no-op. OK, I'll try to come up with some numbers...
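
To make the "by hand" path concrete, here is a minimal sketch of a realloc built on posix_memalign. It assumes a POSIX system, and the name `aligned_realloc_by_hand` and its signature (the caller has to pass the old size) are invented for this example. Unlike std::realloc, it always allocates a new block and copies, even when the old block could have been grown in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdlib.h>  // posix_memalign (POSIX, not standard C++)

// Hypothetical helper. alignment must be a power of two and a multiple of
// sizeof(void*), as posix_memalign requires. On failure, returns nullptr and
// leaves the old block untouched.
void* aligned_realloc_by_hand(void* old_ptr, std::size_t old_size,
                              std::size_t new_size, std::size_t alignment) {
    assert((alignment & (alignment - 1)) == 0);
    void* new_ptr = nullptr;
    if (posix_memalign(&new_ptr, alignment, new_size) != 0) return nullptr;
    if (old_ptr) {
        // Always a full copy of the surviving bytes: this is the cost realloc can avoid.
        std::size_t n = std::min(old_size, new_size);
        if (new_ptr && n > 0) std::memcpy(new_ptr, old_ptr, n);
        std::free(old_ptr);
    }
    return new_ptr;
}
```

Growing a 64-byte buffer to 4096 bytes with this helper costs one allocation, one copy, and one free every time, which is the overhead any comparison against std::realloc would need to measure.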

gael


