Thanks a lot for getting this data! Very interesting, as I expected the overhead of handmade_ to be larger. Also, you make a good point about the lack of a realloc equivalent for posix_memalign.
So I understand that this makes handmade_ a compelling solution.
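For readers following along, here is a minimal sketch of what a handmade scheme of this kind might look like, including the realloc path that posix_memalign has no equivalent for. This is only an illustration under stated assumptions, not the actual handmade_ code: the names `sketch_aligned_malloc`, `sketch_aligned_realloc`, `sketch_aligned_free` and the 16-byte `kAlign` are hypothetical. The idea is to over-allocate with plain malloc, round the pointer up to the alignment, and stash the original pointer just in front of the aligned block.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical constants for this sketch: 16-byte alignment, plus enough
// padding to round up and to store the pointer returned by malloc/realloc.
constexpr std::size_t kAlign = 16;
constexpr std::size_t kPadding = kAlign + sizeof(void*);

// Returns the offset from `original` to the first kAlign-aligned address
// that leaves at least sizeof(void*) bytes of room in front of it.
static std::size_t aligned_offset(const unsigned char* original) {
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(original);
    std::uintptr_t start = base + sizeof(void*);
    std::uintptr_t aligned = (start + kAlign - 1) & ~static_cast<std::uintptr_t>(kAlign - 1);
    return static_cast<std::size_t>(aligned - base);
}

void* sketch_aligned_malloc(std::size_t size) {
    if (size > SIZE_MAX - kPadding) return nullptr;  // avoid size overflow
    unsigned char* original = static_cast<unsigned char*>(std::malloc(size + kPadding));
    if (!original) return nullptr;
    unsigned char* result = original + aligned_offset(original);
    // Remember the original pointer right before the aligned block.
    std::memcpy(result - sizeof(void*), &original, sizeof(void*));
    return result;
}

void sketch_aligned_free(void* p) {
    if (!p) return;
    unsigned char* original;
    std::memcpy(&original, static_cast<unsigned char*>(p) - sizeof(void*), sizeof(void*));
    std::free(original);
}

void* sketch_aligned_realloc(void* p, std::size_t new_size) {
    if (!p) return sketch_aligned_malloc(new_size);
    if (new_size > SIZE_MAX - kPadding) return nullptr;
    unsigned char* old_original;
    std::memcpy(&old_original, static_cast<unsigned char*>(p) - sizeof(void*), sizeof(void*));
    // Record where the data sat relative to the old block before reallocating.
    std::size_t old_offset = static_cast<std::size_t>(static_cast<unsigned char*>(p) - old_original);
    unsigned char* original = static_cast<unsigned char*>(std::realloc(old_original, new_size + kPadding));
    if (!original) return nullptr;  // the old block is still valid in this case
    std::size_t new_offset = aligned_offset(original);
    unsigned char* result = original + new_offset;
    // realloc may return a block with different alignment, so the data may
    // need to shift by a few bytes inside the new block.
    if (new_offset != old_offset) std::memmove(result, original + old_offset, new_size);
    std::memcpy(result - sizeof(void*), &original, sizeof(void*));
    return result;
}
```

The per-allocation cost in this sketch is the extra `kPadding` bytes plus a little pointer arithmetic, and the realloc path can still benefit from an in-place `std::realloc`, with at most one small `memmove` when the alignment offset changes.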
On the other hand, I am still concerned about how dangerous MALLOC_ALREADY_ALIGNED is. But it seems that the C++11 rule that you found could make it safe, at least when compiling in C++11 mode, which is probably what the majority of users do these days (data point: both Google and Mozilla now build all their C++ in C++11 mode).