Re: [AD] blit16 mmx end!!
JOSE ANTONIO LUQUE <skylord@xxxxxxxxxx> writes:
> I've written some code for iblit16.s (_linear_blit16 and _masked_blit16).
> I've tested the code in DJGPP and MSVC and it seems to work fine.
Wow, that's a spectacular improvement! Well done...
It would be great if you wanted to do the same thing for 8 and 32 bit
modes (the masked blit method obviously won't help in 32 bit, but the
blitting code might still be useful there).
One slight problem is that the mouse display code (which runs inside a
timer handler) calls blit() to display the cursor, but the interrupt
handlers don't save the FPU state, so it isn't safe to use MMX code
inside the handler. I fixed this by temporarily clearing cpu_mmx in the
mouse display code: it's ugly, but works.
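The workaround above can be sketched roughly as follows. This is only a model of the idea, not the actual Allegro source: `mmx_enabled` is a hypothetical stand-in for the cpu_mmx flag, and `sketch_blit()` is a hypothetical stand-in for blit() that just records which code path it would have taken.

```c
/* Hypothetical stand-ins (assumptions for this sketch, not Allegro code):
   mmx_enabled plays the role of the cpu_mmx flag, sketch_blit() the role
   of blit(), and last_path_used_mmx only exists so the effect is visible. */
static int mmx_enabled = 1;
static int last_path_used_mmx = -1;

static void sketch_blit(void)
{
    /* A real blitter would branch to MMX or plain integer code here. */
    last_path_used_mmx = mmx_enabled;
}

/* Models the cursor redraw that runs from inside the timer handler. */
static void draw_mouse_from_timer(void)
{
    int saved = mmx_enabled;  /* remember the detected capability */
    mmx_enabled = 0;          /* force the non-MMX path: FPU state is not saved here */
    sketch_blit();
    mmx_enabled = saved;      /* restore it for normal, non-interrupt callers */
}
```

The flag is only cleared for the duration of the cursor redraw, so ordinary calls made outside the handler still get the MMX path.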
> * Why is the DJGPP Allegro test faster than the MSVC Allegro test?
DOS programs always seem to run a bit faster than the same thing under
Windows. In my case this is partly because my version of MSVC doesn't
optimise as well as djgpp, but even if you have the full optimising
compiler, the DirectDraw surface locking takes a while, and Windows just
seems to have more general overhead, though I don't know exactly what.
Windows can often more than make up for that difference by having better
hardware drivers, but if all other things are equal (for example, running
the same software drawing code on both, or comparing hardware accelerated
DirectDraw with VBE/AF on DOS), there's always a slight performance edge
to the DOS code.
> * Then why does my double buffer report the same values for the two versions?
Probably because it's limited by bus bandwidth rather than processor
speed. Main memory (as soon as you access something not in the cache) is
many times slower than the processor, and video memory is even slower
than that. When drawing a smallish graphic like in the test program there
is a good chance that it will be at least reading (and maybe writing as
well) entirely through the cache, but when you blit an entire screen, it
falls back on the speed of raw memory accesses, which don't much care how
cleverly the code is written. This makes it almost pointless to optimise
blit() for double buffering: even a simple loop copying single bytes
in C can probably keep up with the video memory speed, and beyond that,
making the CPU code faster won't do anything to
speed up the vram. The end result is that optimisations can make a big
difference for smaller drawing operations (blitting a medium to small
image), and for operations where the CPU load is more significant (like
masked drawing), but don't help at all for brute-force copying of larger
areas.
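To put rough numbers on this, here is a small sketch in C. The byte-copy loop is the "simple loop" case mentioned above; the two helpers just do the arithmetic for how many full-screen copies a given memory bandwidth allows. All function names are hypothetical, and any bandwidth figure fed to them is an assumption for illustration, not a measurement.

```c
#include <stddef.h>

/* Plain byte-by-byte copy: the "simple loop" case. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Bytes moved by one full-screen blit at the given colour depth. */
static unsigned long frame_bytes(unsigned long w, unsigned long h,
                                 unsigned long bits_per_pixel)
{
    return w * h * (bits_per_pixel / 8);
}

/* Upper bound on full-screen blits per second when memory bandwidth
   is the only limit (integer division, so the result is rounded down). */
static unsigned long max_frames_per_second(unsigned long bandwidth_bytes_per_sec,
                                           unsigned long bytes_per_frame)
{
    return bandwidth_bytes_per_sec / bytes_per_frame;
}
```

For a 640x480 16 bit screen, frame_bytes() gives 614400 bytes per frame. If the video memory could only accept, say, 40,000,000 bytes per second (a made-up figure), max_frames_per_second() caps out at 65 full-screen blits per second however the copy loop is written.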
--
Shawn Hargreaves - shawn@xxxxxxxxxx - http://www.talula.demon.co.uk/
"A binary is barely software: it's more like hardware on a floppy disk."