Re: [AD] Bug in Allegro's color convertors?
> Done. I made some changes to the MMX code to get better timing on the
> i686. I shaved another 2 cycles out of it.
Here are the new results (K6-2/333):
Comparing test profile logs 32to24_MMX.log and 32to24-3_MMX.log
DRAW_MODE_SOLID results:
 putpixel()                       = 103%
 hline()                          = 100%
 vline()                          = 103%
 line()                           = 106%
 rectfill()                       = 103%
 circle()                         = 103%
 circlefill()                     = 102%
 ellipse()                        = 104%
 ellipsefill()                    = 105%
 arc()                            = 104%
 triangle()                       = 101%
Other functions:
 textout()                        = 103%
 vram->vram blit()                = 104%
 aligned vram->vram blit()        = 104%
 blit() from memory               = 105%
 aligned blit() from memory       = 103%
 vram->vram masked_blit()         = N/A
 masked_blit() from memory        = 103%
 draw_sprite()                    = 104%
 draw_rle_sprite()                = 104%
 draw_compiled_sprite()           = 104%
 draw_trans_sprite()              = 103%
 draw_trans_rle_sprite()          = 104%
 draw_lit_sprite()                = 103%
 draw_lit_rle_sprite()            = 103%
Not a single loss anymore. Well done!
> I wasn't able to do anything for the non-MMX code however. It's very tight
> considering the Pentium pairing rules.
Yes, I don't think we can do much better. Did you use a simulator or
anything like that to schedule the instructions? Your code was fully
pairable right out of the box (I usually make mistakes related to AGI
stalls or cache lines, so I need to see the real cycle count).
> What's surprising is the lack of consistency between the various
> functions. For example, circlefill is slower, but hline has the same
> speed. Same for putpixel vs circle. This is probably due to random
> noise in the system (Windows).
I think that's due to cache misses/memory timing, which are clearly the
bottleneck here.
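
For readers following along, here is a minimal scalar sketch of a 32-to-24 bit
conversion loop. It is not the converter from the patch: the function name, the
0x00RRGGBB source layout and the B, G, R destination byte order are all
assumptions made for illustration. It only shows why the loop is dominated by
memory traffic: each pixel costs one 4-byte load and three 1-byte stores, with
almost no arithmetic in between.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT Allegro's actual converter.
 * Assumes each source pixel is a uint32_t holding 0x00RRGGBB (top byte
 * ignored) and each destination pixel is 3 bytes stored as B, G, R. */
static void convert_32_to_24(const uint32_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t p = src[i];                  /* one 4-byte load      */
        dst[0] = (uint8_t)(p & 0xFF);         /* blue                 */
        dst[1] = (uint8_t)((p >> 8) & 0xFF);  /* green                */
        dst[2] = (uint8_t)((p >> 16) & 0xFF); /* red                  */
        dst += 3;                             /* packed 24-bit output */
    }
}
```

Because the destination advances by 3 bytes per pixel, most stores are not
4-byte aligned, which is one plausible reason timings vary between callers
that touch memory in different patterns.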
> I wouldn't worry too much about it, especially since this is the worst
> combination of color depths, speedwise.
Agreed. Entire patch applied to trunk and branch.
--
Eric Botcazou
ebotcazou@xxxxxxxxxx