On Tue, Feb 2, 2016 at 2:46 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
wrote:
> Since this "slop" isn't measurable from our side, measuring it is not
> trivial. One universal way to approximate it is to look at process stats
> such as resident set size (RSS) or virtual set size, say in `ps`. Specific
> allocators may also expose functions to query actually allocated sizes;
> IIRC, jemalloc does.
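
For illustration, both approaches mentioned above could look roughly like
this. It is only a sketch under assumptions: malloc_usable_size is a glibc
extension (jemalloc documents a function of the same name, as far as I know),
/proc/self/statm is Linux-only, and the helper names slop_bytes and
resident_set_bytes are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <malloc.h>   // malloc_usable_size: glibc extension, not standard C++
#include <unistd.h>   // sysconf

// Hypothetical helper: usable bytes the allocator handed out for p beyond
// what was requested. Does not count the allocator's own bookkeeping.
std::size_t slop_bytes(void* p, std::size_t requested) {
    assert(p != nullptr);
    return malloc_usable_size(p) - requested;
}

// Hypothetical helper: resident set size of this process in bytes, read
// from /proc/self/statm (Linux only). Returns 0 if the file is unreadable.
std::size_t resident_set_bytes() {
    std::ifstream statm("/proc/self/statm");
    std::size_t total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages)) return 0;
    return resident_pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}
```

The per-pointer query only sees the usable size of one block, so the RSS
difference before and after a batch of allocations is the broader (if
noisier) measure.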
Yes, here are some numbers for different buffer sizes (first number),
each allocated many times (second number), with either 16- or 32-byte
alignment:
                          handmade      posix _mm_malloc     malloc
2000 Bytes x1000 /16B      2625536    2621440    4677632    2629632
2048 Bytes x1000 /16B      3137536    2621440    4681728    2621440
2050 Bytes x1000 /16B      3137536    3137536    5705728    3141632
16 Bytes x100000 /16B      3821568    2195456    3833856    2195456
2048 Bytes x100000 /32B  257576960  206180352  411779072          -
2000 Bytes x100000 /32B  206176256  206176256  411787264          -
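_mm_malloc is clearly the worst here, using roughly 1.75x to 2x as much as
posix in every case. Our handmade version can also waste a significant
number of bytes in the worst cases: about 20-25% more than posix for the
2048-byte buffers, and about 74% more for the 16-byte ones.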
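
To make the handmade overhead concrete, here is a minimal sketch of the usual
over-allocate-and-stash-the-pointer technique. It is not necessarily our
exact code: the names sketch_aligned_malloc and sketch_aligned_free are
hypothetical, and it assumes the alignment is a power of two no smaller than
sizeof(void*).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: request `alignment` extra bytes, move up to the next
// aligned address, and store the original pointer just before that address.
void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* original = std::malloc(size + alignment);  // extra bytes are the cost
    if (original == nullptr) return nullptr;
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    // Round down to a multiple of alignment, then step up by one alignment.
    std::uintptr_t aligned_addr =
        (raw & ~(static_cast<std::uintptr_t>(alignment) - 1)) + alignment;
    void* aligned = reinterpret_cast<void*>(aligned_addr);
    // Remember where the real block starts so it can be freed later.
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

// Hypothetical helper: free a block obtained from sketch_aligned_malloc.
void sketch_aligned_free(void* aligned) {
    if (aligned != nullptr) std::free(*(reinterpret_cast<void**>(aligned) - 1));
}
```

Every request grows by `alignment` bytes, so a 2048-byte buffer is really a
2064- or 2080-byte malloc. That would be consistent with the handmade figure
for 2048 Bytes x1000 matching the posix figure for 2050 Bytes x1000 above.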