Re: [AD] blit16 mmx end!!



JOSE ANTONIO LUQUE <skylord@xxxxxxxxxx> writes:
> I've do some code for iblit16.s, (_linear_blit16 and _masked_blit16).
> I've tested the code in DJGPP and MSVC and it seems work fine. 

Wow, that's a spectacular improvement! Well done...

It would be great if you wanted to do the same thing for 8 and 32 bit 
modes (the masked blit method obviously won't help in 32 bit, but the 
blitting code might still be useful there).

One slight problem is that the mouse display code (which runs inside a 
timer handler) calls blit() to display the cursor, but the interrupt 
handlers don't save the FPU state, so it isn't safe to use MMX code 
inside the handler. I fixed this by temporarily clearing cpu_mmx in the 
mouse display code: it's ugly, but works.
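
As a rough illustration of that workaround, here is a minimal sketch. It 
is not the actual library code: demo_blit(), draw_mouse_from_timer() and 
last_blit_used_mmx are hypothetical stand-ins, and cpu_mmx is assumed to 
be a plain global int that the blit routines test before taking the MMX 
path.

```c
#include <stdio.h>

/* Stand-in for the library's MMX capability flag (assumed to be a plain
   global int that the blit routines check before using MMX code). */
static int cpu_mmx = 1;

/* Records which path the last blit took; for demonstration only. */
static int last_blit_used_mmx = 0;

/* Hypothetical stand-in for blit(): takes the MMX path only if allowed. */
static void demo_blit(void)
{
    last_blit_used_mmx = cpu_mmx ? 1 : 0;
}

/* Hypothetical cursor drawing routine, called from the timer handler.
   It forces the non-MMX path for the duration of the blit, then puts
   the flag back to whatever it was before. */
static void draw_mouse_from_timer(void)
{
    int saved_mmx = cpu_mmx;   /* remember the real capability */

    cpu_mmx = 0;               /* FPU/MMX state is not saved in here */
    demo_blit();
    cpu_mmx = saved_mmx;       /* normal code gets the MMX path back */
}
```

Saving and restoring the flag, rather than just setting it back to 1, 
keeps things correct on machines that never had MMX in the first place.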

> * Why DJGPP allegro test is more fast than MSVC allegro test?

DOS programs always seem to run a bit faster than the same thing under 
Windows. In my case this is partly because my version of MSVC doesn't 
optimise as well as djgpp, but even if you have the full optimising 
compiler, the DirectDraw surface locking takes a while, plus Windows just 
seems to have more general overhead, though I don't know exactly where. 
Windows can often more than make up for that difference by having better 
hardware 
drivers, but if all other things are equal (like you are running the same 
software drawing code on both, or comparing hardware accelerated 
DirectDraw with VBE/AF on DOS), there's always a slight performance edge 
to the DOS code.

> * Then why my dobble buffer reports the same values for two versions?

Probably because it's limited by bus bandwidth rather than processor 
speed. Main memory (as soon as you access something not in the cache) is 
many times slower than the processor, and video memory is even slower 
than that. When drawing a smallish graphic like in the test program there 
is a good chance that it will be at least reading (and maybe writing as 
well) entirely through the cache, but when you blit an entire screen, it 
falls back on the speed of raw memory accesses, which don't much care how 
cleverly the code is written. This makes it almost pointless to optimise 
the blit() for double buffering: even a simple loop copying single bytes 
in C can probably keep up with the video memory speed, and beyond that, 
making the CPU code faster won't do anything to speed up the vram. The 
end result is that optimisations can make a big 
difference for smaller drawing operations (blitting a medium to small 
image), and for operations where the CPU load is more significant (like 
masked drawing), but don't help at all for brute-force copying of larger 
areas.
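
If you want to see the effect for yourself, something like the sketch 
below compares a naive byte-at-a-time loop against memcpy(). It is only 
an illustration under some assumptions: copy_bytes() and compare_copies() 
are names I made up, it copies between ordinary malloc'd buffers rather 
than real video memory, and an optimising compiler may well turn the byte 
loop into something cleverer, so treat the timings as a rough guide.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Deliberately naive copy: one byte per iteration. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copies n bytes reps times with each method and prints the CPU time
   taken by both. Returns 1 if both destination buffers end up matching
   the source, 0 on a mismatch or if memory could not be allocated. */
static int compare_copies(size_t n, int reps)
{
    unsigned char *src = malloc(n);
    unsigned char *a = malloc(n);
    unsigned char *b = malloc(n);
    clock_t t0, t1, t2;
    size_t i;
    int r, ok;

    if (!src || !a || !b) {
        free(src);
        free(a);
        free(b);
        return 0;
    }

    /* Fill the source with a non-trivial pattern. */
    for (i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u);

    t0 = clock();
    for (r = 0; r < reps; r++)
        copy_bytes(a, src, n);
    t1 = clock();
    for (r = 0; r < reps; r++)
        memcpy(b, src, n);
    t2 = clock();

    printf("%lu bytes x %d: byte loop %.3fs, memcpy %.3fs\n",
           (unsigned long)n, reps,
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);

    ok = memcmp(a, src, n) == 0 && memcmp(b, src, n) == 0;

    free(src);
    free(a);
    free(b);
    return ok;
}
```

Try it once with a buffer small enough to sit in the cache and once with 
something several megabytes in size, adjusting reps so the total amount 
copied is about the same, and compare how far apart the two methods are 
in each case.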


-- 
Shawn Hargreaves - shawn@xxxxxxxxxx - http://www.talula.demon.co.uk/
"A binary is barely software: it's more like hardware on a floppy disk."


