Re: [eigen] Re: platform detection for aligned malloc


So, I've benchmarked all three aligned malloc methods available on my
system: my handmade function, posix_memalign, and _mm_malloc.

Executive summary:
* I logged out of KDE 4.2 to run the benchmark, but I didn't expect
that to matter so much: typical times went from 17 sec (run from a KDE
session, all nontrivial apps closed) down to 10 sec (logged out of KDE,
run from a VT)!!! But the idle CPU usage of my KDE session is
negligible, so I don't understand why.
* all methods are a few percent faster with sizes that are multiples
of 16 bytes than with arbitrary sizes
* all methods are much faster for small sizes than for large sizes
* for large sizes, all three methods have the same speed (could not
get any meaningful difference)
* for small sizes (less than 1 kB) my handmade function is almost 2x
faster than the others! This can be explained by the fact that it's
hardcoded for 16-byte alignment.

b.cpp:6:2: warning: #warning allocating big blocks of all sizes
b.cpp:28:2: warning: #warning using handmade alloc

real	0m10.534s
user	0m1.004s
sys	0m9.529s
b.cpp:6:2: warning: #warning allocating big blocks of all sizes
b.cpp:40:2: warning: #warning using posix_memalign

real	0m10.092s
user	0m1.156s
sys	0m8.937s
b.cpp:6:2: warning: #warning allocating big blocks of all sizes
b.cpp:34:2: warning: #warning using _mm_malloc

real	0m11.902s
user	0m1.080s
sys	0m10.825s
b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
b.cpp:28:2: warning: #warning using handmade alloc

real	0m9.877s
user	0m0.968s
sys	0m8.909s
b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
b.cpp:40:2: warning: #warning using posix_memalign

real	0m10.417s
user	0m1.104s
sys	0m9.313s
b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
b.cpp:34:2: warning: #warning using _mm_malloc

real	0m11.194s
user	0m1.100s
sys	0m10.097s
b.cpp:16:2: warning: #warning allocating small blocks of all sizes
b.cpp:28:2: warning: #warning using handmade alloc

real	0m0.360s
user	0m0.356s
sys	0m0.004s
b.cpp:16:2: warning: #warning allocating small blocks of all sizes
b.cpp:40:2: warning: #warning using posix_memalign

real	0m0.562s
user	0m0.564s
sys	0m0.000s
b.cpp:16:2: warning: #warning allocating small blocks of all sizes
b.cpp:34:2: warning: #warning using _mm_malloc

real	0m0.590s
user	0m0.592s
sys	0m0.004s
b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
b.cpp:28:2: warning: #warning using handmade alloc

real	0m0.303s
user	0m0.304s
sys	0m0.000s
b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
b.cpp:40:2: warning: #warning using posix_memalign

real	0m0.607s
user	0m0.600s
sys	0m0.008s
b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
b.cpp:34:2: warning: #warning using _mm_malloc

real	0m0.591s
user	0m0.588s
sys	0m0.004s

#include <Eigen/Core>

using namespace Eigen;

#ifdef EIGEN_BIGALLOC
#warning allocating big blocks of all sizes
#define EIGEN_SIZE(i) (i)
#endif

#ifdef EIGEN_BIGMULTIPLEALLOC
#warning allocating big blocks of sizes multiples of 16
#define EIGEN_SIZE(i) ((i)&(~15)) // ~15 clears the low 4 bits: a multiple of 16
#endif

#ifdef EIGEN_SMALLALLOC
#warning allocating small blocks of all sizes
#define EIGEN_SIZE(i) (1+((i)&1023))
#endif

#ifdef EIGEN_SMALLMULTIPLEALLOC
#warning allocating small blocks of sizes multiples of 16
#define EIGEN_SIZE(i) ((1+((i)&1023))&(~15)) // ~15 clears the low 4 bits: a multiple of 16
#endif



#ifdef EIGEN_HANDMADE
#warning using handmade alloc
#define EIGEN_ALLOC void *p = ei_handmade_aligned_malloc(EIGEN_SIZE(i));
#define EIGEN_FREE ei_handmade_aligned_free(p);
#endif

#ifdef EIGEN_MM
#warning using _mm_malloc
#define EIGEN_ALLOC void *p = _mm_malloc(EIGEN_SIZE(i),16);
#define EIGEN_FREE _mm_free(p);
#endif

#ifdef EIGEN_POSIX
#warning using posix_memalign
#define EIGEN_ALLOC void *p = 0; if(posix_memalign(&p, 16, EIGEN_SIZE(i)) != 0) p = 0;
#define EIGEN_FREE free(p);
#endif

#ifdef EIGEN_STACK
#define EIGEN_ALLOC void *p = ei_aligned_stack_alloc(EIGEN_SIZE(i));
#define EIGEN_FREE ei_aligned_stack_free(p,i);
#endif


int main()
{
  for(int i = 1; i < 2000000; i++)
  {
    EIGEN_ALLOC
    EIGEN_FREE
  }
}
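
Since two of the configurations are meant to request sizes that are multiples of 16, here is a small sketch of the size arithmetic involved. The helper names are hypothetical and not part of b.cpp. Rounding to a multiple of 16 means clearing the low four bits, i.e. masking with ~15; masking with ~16 clears only bit 4 and in general does not give a multiple of 16.

```cpp
#include <cstddef>

// Hypothetical helpers, for illustration only.

// Largest multiple of 16 that is <= n: clear the low four bits.
constexpr std::size_t round_down_to_16(std::size_t n)
{
  return n & ~std::size_t(15);
}

// Smallest multiple of 16 that is >= n (assumes n + 15 does not overflow).
constexpr std::size_t round_up_to_16(std::size_t n)
{
  return (n + 15) & ~std::size_t(15);
}

static_assert(round_down_to_16(37) == 32, "37 rounds down to 32");
static_assert(round_up_to_16(37) == 48, "37 rounds up to 48");
static_assert((37 & ~16) == 37, "~16 leaves 37 unchanged, since bit 4 is already 0");
static_assert((23 & ~16) == 7, "~16 turns 23 into 7, which is not a multiple of 16");
```

Rounding down can yield 0 for sizes below 16, so a benchmark that wants only nonzero sizes would use the rounding-up variant.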

Attachment: bench.sh
Description: Bourne shell script


