Re: [eigen] Re: platform detection for aligned malloc
So, I've benchmarked all three aligned malloc methods available on my system: my handmade function, posix_memalign, and _mm_malloc.

Executive summary:

* I logged off KDE 4.2 to run the benchmark, but I didn't expect it to matter so much: typical times went from 17 sec (run from a KDE session, all nontrivial apps closed) down to 10 sec (logged off KDE, run from a VT)! Yet the idle CPU usage of my KDE session is negligible, so I don't understand.
* All methods are a few percent faster with sizes that are multiples of 16 bytes than with arbitrary sizes.
* All methods are much faster for small sizes than for large sizes.
* For large sizes, all three methods have the same speed (I could not measure any meaningful difference).
* For small sizes (less than 1 kB), my handmade function is almost 2x faster than the others! This can be explained by the fact that it's hardcoded for 16-byte alignment.

b.cpp:6:2: warning: #warning allocating big blocks of all sizes
b.cpp:28:2: warning: #warning using handmade alloc

real    0m10.534s
user    0m1.004s
sys     0m9.529s

b.cpp:6:2: warning: #warning allocating big blocks of all sizes
b.cpp:40:2: warning: #warning using posix_memalign

real    0m10.092s
user    0m1.156s
sys     0m8.937s

b.cpp:6:2: warning: #warning allocating big blocks of all sizes
b.cpp:34:2: warning: #warning using _mm_malloc

real    0m11.902s
user    0m1.080s
sys     0m10.825s

b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
b.cpp:28:2: warning: #warning using handmade alloc

real    0m9.877s
user    0m0.968s
sys     0m8.909s

b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
b.cpp:40:2: warning: #warning using posix_memalign

real    0m10.417s
user    0m1.104s
sys     0m9.313s

b.cpp:11:2: warning: #warning allocating big blocks of sizes multiples of 16
b.cpp:34:2: warning: #warning using _mm_malloc

real    0m11.194s
user    0m1.100s
sys     0m10.097s

b.cpp:16:2: warning: #warning allocating small blocks of all sizes
b.cpp:28:2: warning: #warning using handmade alloc

real    0m0.360s
user    0m0.356s
sys     0m0.004s

b.cpp:16:2: warning: #warning allocating small blocks of all sizes
b.cpp:40:2: warning: #warning using posix_memalign

real    0m0.562s
user    0m0.564s
sys     0m0.000s

b.cpp:16:2: warning: #warning allocating small blocks of all sizes
b.cpp:34:2: warning: #warning using _mm_malloc

real    0m0.590s
user    0m0.592s
sys     0m0.004s

b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
b.cpp:28:2: warning: #warning using handmade alloc

real    0m0.303s
user    0m0.304s
sys     0m0.000s

b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
b.cpp:40:2: warning: #warning using posix_memalign

real    0m0.607s
user    0m0.600s
sys     0m0.008s

b.cpp:21:2: warning: #warning allocating small blocks of sizes multiples of 16
b.cpp:34:2: warning: #warning using _mm_malloc

real    0m0.591s
user    0m0.588s
sys     0m0.004s
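The summary above attributes the small-size speedup to the handmade function being hardcoded for 16-byte alignment. Here is a minimal sketch of what such an allocator can look like. The names handmade_aligned_malloc_16 and handmade_aligned_free_16 are hypothetical, and this is not necessarily what Eigen ships. It assumes malloc returns memory aligned to at least sizeof(void*), so there is always room for one bookkeeping pointer just below the aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: over-allocate by 16 bytes, bump the address to the
// next 16-byte boundary, and stash the pointer malloc returned just below it.
inline void* handmade_aligned_malloc_16(std::size_t size)
{
    void* original = std::malloc(size + 16);
    if (original == nullptr) return nullptr;

    // Clear the low four bits, then add 16: the result is 16-byte aligned
    // and lies between 1 and 16 bytes above `original`.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(original);
    void* aligned = reinterpret_cast<void*>((addr & ~std::uintptr_t(15)) + 16);

    // Assumption: the gap below `aligned` is at least sizeof(void*) bytes.
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

inline void handmade_aligned_free_16(void* ptr)
{
    if (ptr == nullptr) return;
    // Only pointers produced by handmade_aligned_malloc_16 are valid here.
    assert((reinterpret_cast<std::uintptr_t>(ptr) & 15) == 0);
    std::free(*(reinterpret_cast<void**>(ptr) - 1));
}
```

Because the alignment is a compile-time constant, the fast path is one malloc call, one mask, one add and one store, with no validation of an alignment argument. That is one plausible reason for the difference at small sizes, where allocator overhead dominates.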
#include <Eigen/Core>
#include <cstdlib>  // posix_memalign, free
using namespace Eigen;

#ifdef EIGEN_BIGALLOC
#warning allocating big blocks of all sizes
#define EIGEN_SIZE(i) (i)
#endif

#ifdef EIGEN_BIGMULTIPLEALLOC
#warning allocating big blocks of sizes multiples of 16
#define EIGEN_SIZE(i) ((i) & ~15)  // ~15 clears the low four bits (a mask of ~16 would clear only bit 4)
#endif

#ifdef EIGEN_SMALLALLOC
#warning allocating small blocks of all sizes
#define EIGEN_SIZE(i) (1 + ((i) & 1023))  // 1 to 1024 bytes
#endif

#ifdef EIGEN_SMALLMULTIPLEALLOC
#warning allocating small blocks of sizes multiples of 16
#define EIGEN_SIZE(i) ((1 + ((i) & 1023)) & ~15)  // round down to a multiple of 16
#endif

// Define exactly one size macro above and exactly one allocator macro below.

#ifdef EIGEN_HANDMADE
#warning using handmade alloc
#define EIGEN_ALLOC void *p = ei_handmade_aligned_malloc(EIGEN_SIZE(i));
#define EIGEN_FREE ei_handmade_aligned_free(p);
#endif

#ifdef EIGEN_MM
#warning using _mm_malloc
#define EIGEN_ALLOC void *p = _mm_malloc(EIGEN_SIZE(i), 16);
#define EIGEN_FREE _mm_free(p);
#endif

#ifdef EIGEN_POSIX
#warning using posix_memalign
#define EIGEN_ALLOC void *p = 0; if (posix_memalign(&p, 16, EIGEN_SIZE(i)) != 0) p = 0;
#define EIGEN_FREE free(p);
#endif

#ifdef EIGEN_STACK
#define EIGEN_ALLOC void *p = ei_aligned_stack_alloc(EIGEN_SIZE(i));
#define EIGEN_FREE ei_aligned_stack_free(p,i);
#endif

int main()
{
  // Allocate and immediately free one block per iteration.
  for(int i = 1; i < 2000000; i++)
  {
    EIGEN_ALLOC
    EIGEN_FREE
  }
}
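The "multiples of 16" cases depend on turning an arbitrary size into a multiple of 16 with a bit mask. As a small sketch of that arithmetic, using hypothetical helper names that do not appear in the benchmark: masking with ~15 clears the low four bits, whereas masking with ~16 clears only bit 4 and leaves most sizes unaligned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, for illustration only.

// Largest multiple of 16 that is <= n.
constexpr std::size_t round_down_to_16(std::size_t n)
{
    return n & ~std::size_t(15);
}

// Smallest multiple of 16 that is >= n.
constexpr std::size_t round_up_to_16(std::size_t n)
{
    return (n + 15) & ~std::size_t(15);
}

// Clearing only bit 4 (mask ~16) does not yield a multiple of 16 in general.
constexpr std::size_t clear_bit_4(std::size_t n)
{
    return n & ~std::size_t(16);
}

static_assert(round_down_to_16(31) == 16, "31 rounds down to 16");
static_assert(round_up_to_16(17) == 32, "17 rounds up to 32");
static_assert(clear_bit_4(31) == 15, "31 with bit 4 cleared is 15");
```

Rounding down can produce a size of zero for inputs below 16, so a benchmark that wants every allocation to be non-empty may prefer the round-up variant.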
Attachment: bench.sh (Bourne shell script)