Re: [eigen] Clean aligned memory allocation



On Tue, 2 Feb 2016, Gael Guennebaud wrote:

On Tue, Feb 2, 2016 at 2:46 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
wrote:

Since this "slop" isn't measurable from our side, measuring it is not
trivial. One universal way to approximate it is to look at process stats
such as resident set size (RSS) or virtual set size, say in `ps`. Specific
allocators may also expose functions to query actually allocated sizes;
IIRC, jemalloc does.
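
As a concrete illustration of both approaches, here is a minimal Linux-only
sketch (not from the original thread): it reads the resident set size from
/proc/self/statm and calls malloc_usable_size(), which both glibc and jemalloc
provide, to see how much a single allocation really received. The 2000-byte
request is just an example size.

    #include <cstdio>
    #include <cstdlib>
    #include <malloc.h>   // malloc_usable_size (glibc; jemalloc provides it too)
    #include <unistd.h>   // sysconf

    // Resident set size in bytes: second field of /proc/self/statm, in pages.
    static long rss_bytes()
    {
        long total = 0, resident = -1;
        if (FILE* f = std::fopen("/proc/self/statm", "r")) {
            if (std::fscanf(f, "%ld %ld", &total, &resident) != 2)
                resident = -1;
            std::fclose(f);
        }
        return resident < 0 ? -1 : resident * sysconf(_SC_PAGESIZE);
    }

    int main()
    {
        void* p = std::malloc(2000);   // example request size
        // The allocator may hand back more than requested; this shows how much.
        std::printf("requested 2000, usable %zu\n", malloc_usable_size(p));
        std::printf("current RSS: %ld bytes\n", rss_bytes());
        std::free(p);
        return 0;
    }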


Yes, here are some numbers for different buffer sizes (first number), each
allocated many times (second number), with either 16- or 32-byte alignment
(a rough sketch of such a measurement is given after the numbers):

2000 Bytes x1000 /16B
handmade:   2625536
posix:      2621440
_mm_malloc: 4677632
malloc:     2629632

2048 Bytes x1000 /16B
handmade:   3137536
posix:      2621440
_mm_malloc: 4681728
malloc:     2621440

2050 Bytes x1000 /16B
handmade:   3137536
posix:      3137536
_mm_malloc: 5705728
malloc:     3141632

16 Bytes x100000 /16B
handmade:   3821568
posix:      2195456
_mm_malloc: 3833856
malloc:     2195456


2048 Bytes x100000 /32B
handmade:   257576960
posix:      206180352
_mm_malloc: 411779072

2000 Bytes x100000 /32B
handmade:   206176256
posix:      206176256
_mm_malloc: 411787264
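
The exact harness behind these numbers is not shown in the thread; a rough
reconstruction, under the assumption that the figures are RSS growth in bytes
after allocating and touching all the buffers, might look like the sketch
below (size, count and alignment are set to the "2048 Bytes x1000 /16B" case,
using posix_memalign as the example strategy):

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <vector>
    #include <unistd.h>

    // Same /proc/self/statm helper as in the earlier sketch.
    static long rss_bytes()
    {
        long total = 0, resident = -1;
        if (FILE* f = std::fopen("/proc/self/statm", "r")) {
            if (std::fscanf(f, "%ld %ld", &total, &resident) != 2)
                resident = -1;
            std::fclose(f);
        }
        return resident < 0 ? -1 : resident * sysconf(_SC_PAGESIZE);
    }

    int main()
    {
        const std::size_t size = 2048, count = 1000, align = 16;
        const long before = rss_bytes();

        std::vector<void*> ptrs(count, nullptr);
        for (std::size_t i = 0; i < count; ++i) {
            // Swap this call for std::malloc, _mm_malloc (<mm_malloc.h>) or a
            // handmade scheme to compare strategies; free must match accordingly.
            if (posix_memalign(&ptrs[i], align, size) != 0) return 1;
            std::memset(ptrs[i], 1, size);   // touch the pages so they count in RSS
        }

        std::printf("RSS growth: %ld bytes\n", rss_bytes() - before);

        for (void* p : ptrs) std::free(p);
        return 0;
    }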


_mm_malloc is definitely pretty bad, and indeed our handmade version can
waste a significant number of bytes in the worst cases.
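
For reference, a "handmade" aligned allocator along these lines is usually the
over-allocate-and-round-up trick; the sketch below is a generic version of that
idea, not Eigen's exact code. The up-to-`align` extra bytes carried by every
allocation are what can push a request such as 2048 bytes past the underlying
allocator's size class.

    #include <cstdint>
    #include <cstdlib>

    // Assumes `align` is a power of two and at least sizeof(void*).
    void* handmade_aligned_malloc(std::size_t size, std::size_t align)
    {
        // Over-allocate by `align` bytes: room for the padding and for
        // stashing the pointer that malloc returned.
        void* original = std::malloc(size + align);
        if (original == nullptr) return nullptr;
        // Round up to the next multiple of `align` strictly above `original`.
        std::uintptr_t aligned =
            (reinterpret_cast<std::uintptr_t>(original) & ~std::uintptr_t(align - 1)) + align;
        // Remember the original pointer just below the aligned address.
        reinterpret_cast<void**>(aligned)[-1] = original;
        return reinterpret_cast<void*>(aligned);
    }

    void handmade_aligned_free(void* ptr)
    {
        if (ptr) std::free(static_cast<void**>(ptr)[-1]);
    }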

I find the bad result for _mm_malloc strange. Libstdc++ has two
implementations of _mm_malloc, depending on the platform. One forwards to
malloc or posix_memalign depending on the alignment (it is a bit too timid
with the threshold it uses), and the other is essentially the same as your
handmade version. Both might be slower because they have extra checks
(e.g. whether the alignment is a power of 2), but wasting more space is
unexpected.
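
The forwarding variant described here might dispatch roughly like the sketch
below; this is only an illustration of the idea, not the actual header, and
the alignof(std::max_align_t) threshold is an assumption. A "too timid"
threshold would be anything smaller than what malloc already guarantees, so
some requests go through posix_memalign even though plain malloc could serve
them.

    #include <cstddef>
    #include <cstdlib>

    void* dispatching_aligned_malloc(std::size_t size, std::size_t align)
    {
        // Reject alignments that are not powers of two (one of the extra
        // checks mentioned above).
        if (align == 0 || (align & (align - 1)) != 0) return nullptr;

        // malloc already guarantees this much alignment, so no padding is
        // needed; a "too timid" threshold would be smaller than this.
        if (align <= alignof(std::max_align_t)) return std::malloc(size);

        // Larger alignments go through posix_memalign; either branch is
        // released with plain free().
        void* ptr = nullptr;
        return posix_memalign(&ptr, align, size) == 0 ? ptr : nullptr;
    }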

--
Marc Glisse


