[AD] masked_blit 106% faster !!


... in 16bpp, and only if the blit is memory->memory and you have a Pentium 3 or better system.

It uses SSE (or Enhanced 3DNow!) to do the job. On my system (P3 800/133), I get a 106% improvement in memory->memory masked_blits. However, there's a 2.5% drop in memory->screen masked_blits.
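
For anyone who wants a baseline to compare against while testing, here is a minimal scalar sketch of what a 16bpp masked blit does for one row of pixels. This is not the code from the patch and not Allegro's implementation: the function names are made up for illustration, and the mask value 0xF81F is assumed to match Allegro's 16bpp mask colour (bright pink). The second function shows the same operation done branch-free with a per-pixel select mask, which is the general shape SIMD versions of this kind of loop tend to take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match Allegro's 16bpp mask colour (bright pink). */
#define MASK16 0xF81Fu

/* Hypothetical helper: copy one row, skipping mask-coloured pixels. */
void masked_row_16(uint16_t *dst, const uint16_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] != MASK16)
            dst[i] = src[i];
    }
}

/* Hypothetical helper: same result, but without a branch per pixel. */
void masked_row_16_select(uint16_t *dst, const uint16_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        /* keep is 0xFFFF where the source pixel is transparent, else 0 */
        uint16_t keep = (uint16_t)-(src[i] == MASK16);
        /* keep the old destination pixel where transparent, else take source */
        dst[i] = (uint16_t)((dst[i] & keep) | (src[i] & (uint16_t)~keep));
    }
}
```

Both functions should leave the destination identical for any input, so running them side by side on the same sprite row is a cheap sanity check before timing anything.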

On Athlons, however, it seems to cut the speed in half (!) compared to the normal code. I can't confirm this since I don't have direct access to one (I had to rely on a non-technical friend via ICQ...), so can someone back up those numbers?

If the 3DNow! code really is half as fast as the non-3DNow! code, I'll remove it from the patch.


The patch also adds Enhanced 3DNow! detection.
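
As a rough illustration of what such detection involves (a sketch only, not the code in the patch), the relevant CPUID feature bits can be decoded as below. The function and the FEAT_* names are hypothetical. The bit positions are the documented ones: SSE is bit 25 of EDX from CPUID leaf 1, while 3DNow! and its extensions are bits 31 and 30 of EDX from extended leaf 0x80000001 on AMD processors. The sketch assumes the caller has already executed CPUID and confirmed that the extended leaf exists.

```c
#include <stdint.h>

/* Hypothetical capability flags, for illustration only. */
#define FEAT_SSE       0x1u
#define FEAT_3DNOW     0x2u
#define FEAT_ENH3DNOW  0x4u

/*
 * std_edx: EDX returned by CPUID leaf 1
 * ext_edx: EDX returned by CPUID leaf 0x80000001 (pass 0 if unsupported)
 */
unsigned decode_cpu_features(uint32_t std_edx, uint32_t ext_edx)
{
    unsigned caps = 0;

    if (std_edx & (1u << 25))   /* SSE */
        caps |= FEAT_SSE;
    if (ext_edx & (1u << 31))   /* 3DNow! */
        caps |= FEAT_3DNOW;
    if (ext_edx & (1u << 30))   /* extensions to 3DNow! (Enhanced 3DNow!) */
        caps |= FEAT_ENH3DNOW;

    return caps;
}
```

Keeping the decoding separate from the CPUID instruction itself makes it easy to test with made-up register values on any machine.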

In case I screwed up somewhere, can the patch be tested on other systems? I only had the chance to test it on an Athlon TBird C and a P3, and with only a limited number of sprites.


--
- Robert J Ohannessian
"Microsoft code is probably O(n^20)" (my CS prof)
http://pages.infinit.net/voidstar/

Attachment: sse16.zip
Description: Zip compressed data


