At 01:19 2000-01-05 -0500, you wrote:
>
>Ok, here we go again:
>
>I've managed to make clear work with MMX.
>
>Now for the benchmarks (all in clean DOS), running test's "time some stuff"
>
>On memory bitmaps, I get:
>
>on my PII 300 -> 337, 32MB SDRAM (100MHz), 75MHz FSB, Matrox Mystique
>I get a (sigh) 0.8% decrease in performance of clear()
Which means that you haven't optimised it enough. Alignment is a dangerous
pitfall: have you aligned everything properly? And have you unrolled enough?
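
To make the alignment and unrolling points concrete, here is a minimal
sketch in portable C rather than MMX assembly. It is not Allegro's clear():
clear_qwords is a hypothetical helper, and the four-way unroll is an
arbitrary choice for illustration. It writes single bytes until the
destination is 8-byte aligned, then stores one quadword at a time in an
unrolled loop, then finishes the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not Allegro's clear(): fill n bytes at dst with
   the byte value c, using aligned 8-byte stores for the bulk. */
static void clear_qwords(unsigned char *dst, unsigned char c, size_t n)
{
    /* Replicate the byte into all eight lanes of a quadword. */
    uint64_t pattern = 0x0101010101010101ULL * c;

    /* Head: byte stores until dst sits on an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = c;
        n--;
    }

    /* Bulk, unrolled four times: 32 bytes per iteration. A fixed-size
       8-byte memcpy is the portable way to spell one quadword store;
       compilers normally turn it into a single move. */
    while (n >= 32) {
        memcpy(dst,      &pattern, 8);
        memcpy(dst + 8,  &pattern, 8);
        memcpy(dst + 16, &pattern, 8);
        memcpy(dst + 24, &pattern, 8);
        dst += 32;
        n -= 32;
    }

    /* Remaining whole quadwords. */
    while (n >= 8) {
        memcpy(dst, &pattern, 8);
        dst += 8;
        n -= 8;
    }

    /* Tail: leftover bytes. */
    while (n > 0) {
        *dst++ = c;
        n--;
    }
}
```

In an MMX version the quadword stores would be movq instructions, and the
head loop is what keeps them from straddling an 8-byte boundary.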
>on my Celery 300a -> 450, 128MB SDRAM (100MHz), 100MHz FSB, RivaTNT
>(Diamond ViperV550),
>I get a 10.8 % increase in performance.
Which shows that moving one quadword at a time IS faster.
>That's not quite what I expected. I'll see if I can optimize the routine
>some more.
>
>But first:
>
>I can't test it on video bitmaps because for some reason screen->seg is
>invalid. The MMX routine locks up the computer when I try to clear a video
>bitmap (in VESA2L), or does absolutely nothing (but still uses the cpu
>cycles it should (???)) in VESA1 or VESA2B.
>Either the region isn't memory mapped (but then, how does the rest of
>Allegro work ?) or you can't use MMX's movq on video memory.
>If the latter, then a lot of the benefits of using MMX in the first place
>are gone.
How does the code look currently? Are you using %es? (This sounds like the
kind of problem that appears when %es isn't used.)
In any case, it could be worth writing two versions, one for memory
bitmaps and one for the screen, so the memory version avoids decoding the
%es prefix.
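
A rough sketch of the two-version idea, in portable C. Everything here is
an assumption made for illustration: struct surface, surface_init,
clear_memory and clear_video are hypothetical names, not Allegro's BITMAP
or its routines, and portable C cannot express a segment override, so
clear_video only marks where the %es-prefixed stores would go.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical surface descriptor; Allegro's real BITMAP differs. */
struct surface {
    unsigned char *data;   /* start of the pixel data */
    size_t size;           /* size in bytes */
    void (*clear)(struct surface *s, unsigned char c);
};

/* Memory bitmaps: plain stores, no segment override to decode. */
static void clear_memory(struct surface *s, unsigned char c)
{
    memset(s->data, c, s->size);
}

/* Video bitmaps: in a real routine each store would carry the %es
   override. This stand-in only shows where that version would live. */
static void clear_video(struct surface *s, unsigned char c)
{
    size_t i;
    for (i = 0; i < s->size; i++)
        s->data[i] = c;
}

/* Pick the routine once, when the surface is set up, so the clear
   itself never has to test which kind of bitmap it was given. */
static void surface_init(struct surface *s, unsigned char *data,
                         size_t size, int is_video)
{
    s->data = data;
    s->size = size;
    s->clear = is_video ? clear_video : clear_memory;
}
```

A caller would then just write s->clear(s, 0) and get whichever version
was selected at setup.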
Erik