Re: [eigen] Re: platform detection for aligned malloc
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Re: platform detection for aligned malloc
- From: "Benoit Jacob" <jacob.benoit.1@xxxxxxxxx>
- Date: Fri, 9 Jan 2009 17:18:41 +0100
There was a little bug in b.cpp (the mask should be ~15, not ~16), but that
didn't make any difference.
Anyway, I attach the corrected version.
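To illustrate the ~15 vs ~16 point, here is a minimal sketch (the helper names round_down_16 and buggy_mask are made up for this example and are not part of b.cpp). Masking with ~15 clears the low four bits and so rounds down to a multiple of 16, whereas ~16 clears only the bit of value 16 and leaves the low four bits alone:

```cpp
// Hypothetical helpers, only to show what the two masks do.

// Clear the low four bits: rounds n down to a multiple of 16.
inline unsigned round_down_16(unsigned n) { return n & ~15u; }

// Clear only the bit of value 16: the low four bits survive,
// so the result is generally not a multiple of 16.
inline unsigned buggy_mask(unsigned n) { return n & ~16u; }
```

For example, round_down_16(31) gives 16 while buggy_mask(31) gives 15, and round_down_16(100) gives 96 while buggy_mask(100) leaves 100 unchanged.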
2009/1/9 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> So, I've benchmarked all three aligned malloc methods available on my
> system: my handmade func, posix_memalign, _mm_malloc.
>
> Executive summary:
> * I logged off KDE 4.2 to run the benchmark, but I didn't expect that
> it would be so useful: typical times went from 17 sec (run from a KDE
> session, all nontrivial apps closed) down to 10 sec (logged off KDE,
> run from a VT)! But the idle CPU usage of my KDE session is
> negligible, so I don't understand.
> * All methods are a few percent faster with sizes that are multiples
> of 16 bytes than with arbitrary sizes.
> * All methods are much faster for small sizes than for large sizes.
> * For large sizes, all three methods have the same speed (I could not
> get any meaningful difference).
> * For small sizes (less than 1 kB), my handmade function is almost 2x
> faster than the others! This can be explained by the fact that it's
> hardcoded for 16-byte alignment.
>
> b.cpp:6:2: warning: #warning allocating big blocks of all sizes
> b.cpp:28:2: warning: #warning using handmade alloc
>
> real 0m10.534s
> user 0m1.004s
> sys 0m9.529s
> b.cpp:6:2: warning: #warning allocating big blocks of all sizes
> b.cpp:40:2: warning: #warning using posix_memalign
>
> real 0m10.092s
> user 0m1.156s
> sys 0m8.937s
> b.cpp:6:2: warning: #warning allocating big blocks of all sizes
> b.cpp:34:2: warning: #warning using _mm_malloc
>
> real 0m11.902s
> user 0m1.080s
> sys 0m10.825s
> b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
> b.cpp:28:2: warning: #warning using handmade alloc
>
> real 0m9.877s
> user 0m0.968s
> sys 0m8.909s
> b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
> b.cpp:40:2: warning: #warning using posix_memalign
>
> real 0m10.417s
> user 0m1.104s
> sys 0m9.313s
> b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
> b.cpp:34:2: warning: #warning using _mm_malloc
>
> real 0m11.194s
> user 0m1.100s
> sys 0m10.097s
> b.cpp:16:2: warning: #warning allocating small blocks of all sizes
> b.cpp:28:2: warning: #warning using handmade alloc
>
> real 0m0.360s
> user 0m0.356s
> sys 0m0.004s
> b.cpp:16:2: warning: #warning allocating small blocks of all sizes
> b.cpp:40:2: warning: #warning using posix_memalign
>
> real 0m0.562s
> user 0m0.564s
> sys 0m0.000s
> b.cpp:16:2: warning: #warning allocating small blocks of all sizes
> b.cpp:34:2: warning: #warning using _mm_malloc
>
> real 0m0.590s
> user 0m0.592s
> sys 0m0.004s
> b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
> b.cpp:28:2: warning: #warning using handmade alloc
>
> real 0m0.303s
> user 0m0.304s
> sys 0m0.000s
> b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
> b.cpp:40:2: warning: #warning using posix_memalign
>
> real 0m0.607s
> user 0m0.600s
> sys 0m0.008s
> b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
> b.cpp:34:2: warning: #warning using _mm_malloc
>
> real 0m0.591s
> user 0m0.588s
> sys 0m0.004s
>
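#include <Eigen/Core>
#include <stdlib.h> // posix_memalign, free

using namespace Eigen;

// Define exactly one size macro and one allocator macro when compiling.

// Size selection: EIGEN_SIZE(i) maps the loop counter to a block size.
#ifdef EIGEN_BIGALLOC
#warning allocating big blocks of all sizes
#define EIGEN_SIZE(i) (i)
#endif

#ifdef EIGEN_BIGMULTIPLEALLOC
#warning allocating big blocks of sizes multiples of 16
// Round down to a multiple of 16 (0 for i < 16).
#define EIGEN_SIZE(i) ((i) & ~15)
#endif

#ifdef EIGEN_SMALLALLOC
#warning allocating small blocks of all sizes
// Sizes from 1 to 1024 bytes.
#define EIGEN_SIZE(i) (1 + ((i) & 1023))
#endif

#ifdef EIGEN_SMALLMULTIPLEALLOC
#warning allocating small blocks of sizes multiples of 16
// Multiples of 16 from 0 to 1024 bytes.
#define EIGEN_SIZE(i) ((1 + ((i) & 1023)) & ~15)
#endif

// Allocator selection: EIGEN_ALLOC declares p, EIGEN_FREE releases it.
#ifdef EIGEN_HANDMADE
#warning using handmade alloc
#define EIGEN_ALLOC void *p = ei_handmade_aligned_malloc(EIGEN_SIZE(i));
#define EIGEN_FREE ei_handmade_aligned_free(p);
#endif

#ifdef EIGEN_MM
#warning using _mm_malloc
#include <xmmintrin.h> // _mm_malloc, _mm_free
#define EIGEN_ALLOC void *p = _mm_malloc(EIGEN_SIZE(i), 16);
#define EIGEN_FREE _mm_free(p);
#endif

#ifdef EIGEN_POSIX
#warning using posix_memalign
// On failure p is reset to 0 so that free(p) stays well defined.
#define EIGEN_ALLOC void *p = 0; if (posix_memalign(&p, 16, EIGEN_SIZE(i)) != 0) p = 0;
#define EIGEN_FREE free(p);
#endif

#ifdef EIGEN_STACK
#define EIGEN_ALLOC void *p = ei_aligned_stack_alloc(EIGEN_SIZE(i));
#define EIGEN_FREE ei_aligned_stack_free(p,i);
#endif

int main()
{
  // Allocate and immediately free one block per iteration.
  for(int i = 1; i < 2000000; i++)
  {
    EIGEN_ALLOC
    EIGEN_FREE
  }
  return 0;
}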
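
For readers wondering what a "handmade" allocator hardcoded for 16-byte alignment might look like, here is a minimal sketch. It is an assumption for illustration, not necessarily what ei_handmade_aligned_malloc does, and the names handmade_aligned_malloc and handmade_aligned_free are hypothetical. The idea: over-allocate by 16 bytes with plain malloc, step forward to the next 16-byte boundary, and stash the original pointer just before the aligned block so it can be recovered on free. It relies on malloc returning memory aligned at least to sizeof(void*), so there is always room for the stashed pointer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdint.h>

// Hypothetical sketch of a 16-byte aligned malloc built on plain malloc.
inline void* handmade_aligned_malloc(std::size_t size)
{
    // 16 extra bytes: room to shift forward and to store the original pointer.
    void* original = std::malloc(size + 16);
    if (original == 0) return 0;

    // Next 16-byte boundary strictly after 'original' (shift is 1 to 16 bytes).
    uintptr_t addr = (reinterpret_cast<uintptr_t>(original) & ~uintptr_t(15)) + 16;
    void* aligned = reinterpret_cast<void*>(addr);

    // Remember the pointer malloc returned, just before the aligned block.
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

// Matching free: recover the original pointer and release it.
inline void handmade_aligned_free(void* ptr)
{
    if (ptr != 0)
        std::free(*(reinterpret_cast<void**>(ptr) - 1));
}
```

Because the alignment is a compile-time constant, there is no alignment argument to validate and the mask is a single instruction, which may be part of why such a function does well on small blocks.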