> Thanks! I'll try later.
>
OK, here are my results (output of 3 runs, formatted with the attached
script):
15Bits: 0.282..0.321 -> 0.244..0.257 (115..124%)
16Bits: 0.281..0.319 -> 0.244..0.256 (115..124%)
24Bits: 0.641..0.678 -> 0.443..0.474 (144..143%)
32Bits: 0.721..0.721 -> 0.687..0.699 (104..103%)
Just like in your case, for 32-bit the difference is very small (it may
well lie within the timing inaccuracy on my system, with just 3 runs).
For 24bit it is highest, but 24bit is the least useful format.
I'd really like to see an analysis of why the libc implementation is
faster than the MMX asm we use currently. I guess it's just the overhead
of looping over the line pointers. Doesn't matter though, it's faster,
so we should use it.
I also used Linux and a P4 - it would be interesting to see other results
(AMD, or P3 or something) before applying.
Then, as Peter said, there's the question of whether we can assume that
lines are contiguous. If we make this change, then we should specify that
is_memory_bitmap(bmp) means that there can be no line gaps (and also
check that Allegro does not currently create any memory bitmaps that
violate this).
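
To make the no-gaps condition concrete, here is a minimal sketch in C of
the check, and of the single-memset clear it would allow. The struct
fake_bitmap and both function names are made up for illustration; they
only mimic a bitmap with an array of line pointers and are not Allegro's
actual BITMAP or API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a memory bitmap: an array of row pointers. */
struct fake_bitmap {
    int w, h;               /* size in pixels */
    int bytes_per_pixel;    /* 2 for 15/16 bit, 3 for 24 bit, 4 for 32 bit */
    unsigned char **line;   /* line[y] points at the start of row y */
};

/* Returns 1 if every row starts exactly where the previous row ends. */
static int lines_are_contiguous(const struct fake_bitmap *bmp)
{
    size_t pitch = (size_t)bmp->w * (size_t)bmp->bytes_per_pixel;
    int y;

    for (y = 1; y < bmp->h; y++) {
        if (bmp->line[y] != bmp->line[y - 1] + pitch)
            return 0;
    }
    return 1;
}

/* Clears to 0: one memset if there are no gaps, else one memset per row. */
static void clear_sketch(struct fake_bitmap *bmp)
{
    size_t pitch = (size_t)bmp->w * (size_t)bmp->bytes_per_pixel;
    int y;

    if (bmp->h <= 0)
        return;

    if (lines_are_contiguous(bmp)) {
        memset(bmp->line[0], 0, pitch * (size_t)bmp->h);
    } else {
        for (y = 0; y < bmp->h; y++)
            memset(bmp->line[y], 0, pitch);
    }
}
```

If the guarantee were part of the is_memory_bitmap() contract, the check
and the fallback loop would not be needed at run time.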
Oh, and do you think something similar can be done for clear_to_color?
That's what really would bring an improvement to actual programs, since
normally you need to clear to some color, not just 0.
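
One way this might look for 16-bit, as a rough sketch only: with no line
gaps the whole bitmap is a single run of w*h pixels, so it can be filled
by writing one pixel and then doubling the filled region with memcpy. The
name fill16_flat is hypothetical, and I have not timed this against the
current code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills count 16-bit pixels starting at dst with color.
 * Assumes dst is one contiguous run (no gaps between lines). */
static void fill16_flat(uint16_t *dst, size_t count, uint16_t color)
{
    size_t filled, chunk;

    if (count == 0)
        return;

    dst[0] = color;
    filled = 1;

    /* Copy the already filled prefix onto the unfilled part, so the
     * filled region doubles each pass. Source and destination never
     * overlap because chunk <= filled. */
    while (filled < count) {
        chunk = filled;
        if (chunk > count - filled)
            chunk = count - filled;
        memcpy(dst + filled, dst, chunk * sizeof dst[0]);
        filled += chunk;
    }
}
```

For a bitmap this would be called once with count = w * h, instead of
once per line.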
--
Elias Pschernig