RE: [AD] minor issues

Our blit and clear loop overhead is horrible: each iteration takes
dozens of cycles to set itself up, not counting the virtual function
call that sets up the line pointers for video bitmaps (which, btw,
happens once per scanline).

If you tried with a really wide and short bitmap (1 million pixels wide
by 1 pixel tall, say), the two implementations would run at nearly the
same speed, bound only by the memory bus.

On small bitmaps (that fit in the cache), the MMX code would be a win if
and only if that useless loop overhead could be eliminated.

My vote is to:

- special-case memory bitmaps and use much simpler assembly code, or
even memset/memcpy
- special-case video bitmaps that are neither VESA nor Mode-X to lock
the whole region at once and save that indirect function call.

All other cases can use the old "slow" path.
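
A minimal sketch of what the memory-bitmap fast path could look like,
assuming a simplified bitmap that keeps one pointer per scanline. The
mem_bitmap struct and mem_blit function are hypothetical stand-ins for
illustration, not Allegro's actual BITMAP or blit code, and the caller
is assumed to have clipped the rectangle already:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a memory bitmap (not Allegro's BITMAP). */
typedef struct {
    int w, h;              /* size in pixels */
    int bytes_per_pixel;   /* 2 for 15/16 bit, 3 for 24 bit, 4 for 32 bit */
    unsigned char **line;  /* one pointer per scanline */
} mem_bitmap;

/* Copy a w x h rectangle with one memcpy per scanline.
 * There is no per-line function call and no per-line setup beyond
 * computing two addresses. The rectangle must already be clipped, and
 * source and destination regions must not overlap (memcpy requirement). */
static void mem_blit(const mem_bitmap *src, mem_bitmap *dst,
                     int sx, int sy, int dx, int dy, int w, int h)
{
    size_t bpp = (size_t)src->bytes_per_pixel;
    size_t row_bytes = (size_t)w * bpp;
    int y;

    assert(src->bytes_per_pixel == dst->bytes_per_pixel);
    assert(sx >= 0 && sy >= 0 && sx + w <= src->w && sy + h <= src->h);
    assert(dx >= 0 && dy >= 0 && dx + w <= dst->w && dy + h <= dst->h);

    for (y = 0; y < h; y++) {
        memcpy(dst->line[dy + y] + (size_t)dx * bpp,
               src->line[sy + y] + (size_t)sx * bpp,
               row_bytes);
    }
}
```

Anything that is not a plain memory bitmap would keep going through the
existing path, so only the dispatch at the top of blit would change.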


-----Original Message-----
From: alleg-developers-admin@xxxxxxxxxx
[mailto:alleg-developers-admin@xxxxxxxxxx] On Behalf Of Elias
Pschernig
Sent: Thursday, January 27, 2005 7:12 AM
To: alleg-developers
Subject: Re: [AD] minor issues

On Thu, 2005-01-27 at 12:50 +0100, Elias Pschernig wrote:

> Thanks! I'll try later.
> 

Ok, here's my results (output of 3 runs, formatted with the attached
script):

15Bits:  0.282..0.321  ->  0.244..0.257  (115..124%)
16Bits:  0.281..0.319  ->  0.244..0.256  (115..124%)
24Bits:  0.641..0.678  ->  0.443..0.474  (144..143%)
32Bits:  0.721..0.721  ->  0.687..0.699  (104..103%)

Just like in your case, for 32bit the difference is very small (it may
well lie within the timing inaccuracy on my system, with just 3 runs).
For 24bit it is highest, but 24bit is the least useful format.

I'd really like to see an analysis of why the libc implementation is
faster than the MMX asm we use currently. I guess it's just the overhead
of looping over the line pointers. Doesn't matter though, it's faster,
so we should use it.

I also used Linux and a P4 - it would be interesting to see other
results (AMD, or P3 or something) before applying.

Then, as Peter said, there's the question of whether we can assume that
lines are contiguous. If we make this change, then we should specify
that is_memory_bitmap(bmp) means that there can be no line gaps (and
also check that Allegro itself does not currently create any memory
bitmaps with gaps).
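
The "no line gaps" condition can be checked directly from the line
pointers. A minimal sketch, assuming the bitmap exposes an array of
per-scanline pointers; lines_are_contiguous is a hypothetical helper
name, not an existing Allegro function:

```c
#include <stddef.h>

/* Return 1 if every scanline starts exactly row_bytes after the
 * previous one (no padding or gaps between lines), 0 otherwise.
 * row_bytes is the width in pixels times the bytes per pixel. */
static int lines_are_contiguous(unsigned char *const *line, int h,
                                size_t row_bytes)
{
    int y;

    for (y = 0; y + 1 < h; y++) {
        if (line[y + 1] != line[y] + row_bytes)
            return 0;
    }
    return 1;
}
```

When the check passes, a full-bitmap copy or clear could in principle be
a single memcpy/memset over h * row_bytes bytes; when it fails, one call
per scanline is still possible.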

Oh, and do you think something similar can be done for clear_to_color?
That's what would really bring an improvement to actual programs, since
normally you need to clear to some color, not just 0.
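
One reason clear_to_color is harder than clearing to 0: memset only
writes a single repeated byte, so it is correct for a 16bit color only
when the high and low bytes happen to be equal. A minimal sketch of a
16bit line clear under that assumption; clear_line16 is a hypothetical
name, not part of Allegro:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill n 16bit pixels with color.
 * memset is used only when both bytes of the color are identical
 * (e.g. 0x0000 or 0x1F1F); otherwise fall back to a plain word loop. */
static void clear_line16(uint16_t *dst, size_t n, uint16_t color)
{
    size_t i;

    if ((color >> 8) == (color & 0xFF)) {
        memset(dst, color & 0xFF, n * sizeof *dst);
        return;
    }
    for (i = 0; i < n; i++)
        dst[i] = color;
}
```

Whether the plain loop beats the current asm for arbitrary colors would
have to be measured the same way as the blit numbers above.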

-- 
Elias Pschernig



