Elias Pschernig wrote:
> I'd really like to see an analysis of why the libc implementation is
> faster than the MMX asm we use currently.

I can't say, but here is the libc i386 implementation:

I also ran the test on XP on the same machine. The improvement is even more notable:
15&16-bit: 1.0705/1.672=0.6429
24-bit: 2.086/2.398=0.87
32-bit: 3.07/3.1955=0.96

Note that I get the biggest difference with 15- and 16-bit, not 24-bit. How would you explain that?

I forgot to include an 8-bit test, but it behaves like 32-bit: it makes no difference.

Milan Mimica

