Re: [AD] Using memmove in blit()?



Evert Glebbeek wrote:

On Monday 30 January 2006 23:49, Evert Glebbeek wrote:
On Sunday 22 January 2006 21:03, George Foot wrote:
On the ARM architecture (GP2X), memcpy is about a million times
faster than Allegro's normal blit (ok, maybe only about 2 times
faster...). This is for screen-to-screen blitting, which is just
a normal linear bitmap but in uncached memory, and accessed
through mmapping.
It seems that on the Mac, the memcpy version is also noticeably
faster [http://www.allegro.cc/forums/thread/562923#target]. So as
far as I can see that brings the score to `some systems benefit a
lot' and `some systems don't care at all.' Anyone opposed to
applying (a cleaned up version of) the patch?

Apparently no one objected, but I'm reattaching the patch anyway
before I apply it.

Serge Semashko: please check how this patch fares on your Nokia. Does
it offer a speed increase there too?

Yes, it definitely increases speed compared to the current
straightforward implementation in Allegro. But memmove is not the best
option: performance can be improved even further with ARM-optimized
asm, and I am working on such a patch, as I announced here:
http://www.allegro.cc/forums/thread/571721

[cut]
Fine, I'm OK with keeping my own patchset for now. More changes are to
follow. I have written an ARM-optimized version of the memset function
(with 16-bit and 32-bit variants); it beats the standard memset on the
Nokia 770, providing about twice the bandwidth! I started playing with
these optimizations because the standard _linear_clear_to_color() in
Allegro is too slow (several times slower than the standard memset): it
initializes data one byte at a time in 8bpp modes and 16 bits at a time
in 16bpp modes. With an ARM assembler optimized function it is possible
to boost clear_to_color performance many times! Now I'm working on an
optimized version of memcpy(), which should speed up _linear_blit()
significantly :) If anybody is interested in a copy of my code, just
let me know.

To optimize the blit and clear_to_color functions it is important to
have direct access to bitmap memory; bmp_readXX() and bmp_writeXX() are
inefficient on ARM (they are OK on x86, by the way, as x86 probably has
a more advanced cache). The only platform that implements them as
anything other than a simple pointer-dereferencing macro is DJGPP.
Maybe it is worth having a define like NEED_NONTRIVIAL_BMP_READ for
such platforms, so that everywhere else optimized variants of the
blitting functions can skip bmp_readXX() and bmp_writeXX() and use
something like memset/memcpy instead?
[/cut]
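
To illustrate the point about _linear_clear_to_color() writing 16 bits
at a time in 16bpp modes, here is a minimal portable C sketch of the
wider-store idea. It is not the ARM assembler code mentioned above, and
the name fill16_wide is hypothetical (it does not exist in Allegro). It
only assumes a plain linear buffer of 16-bit pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Allegro: fill `count` 16-bit pixels
 * with `color`, storing two pixels (32 bits) per step where possible. */
static void fill16_wide(uint16_t *dst, uint16_t color, size_t count)
{
    /* Two copies of the pixel packed into one 32-bit word. Both halves
     * are identical, so byte order does not matter. */
    uint32_t pair = ((uint32_t)color << 16) | color;

    assert(count == 0 || dst != NULL);

    /* Write a single pixel first if that brings us to 4-byte alignment. */
    if (count > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = color;
        count--;
    }

    /* Main loop: one 32-bit store per two pixels. memcpy of a fixed
     * small size is normally compiled to a single store and avoids
     * aliasing problems. */
    while (count >= 2) {
        memcpy(dst, &pair, sizeof pair);
        dst += 2;
        count -= 2;
    }

    /* Trailing odd pixel, if any. */
    if (count > 0)
        *dst = color;
}
```

A hand-written ARM version would presumably go further (for example
storing several registers per instruction), but the structure of
head, wide body and tail stays the same.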

There are still open issues in the quoted block: what about the
bmp_readXX() and bmp_writeXX() macros on DJGPP and possibly other
platforms?
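
One way the proposed define could be used is sketched below. This is
only an assumption about how such a switch might look:
NEED_NONTRIVIAL_BMP_READ is the name suggested in the quoted block, not
an existing Allegro define, and copy_line, sketch_read8 and
sketch_write8 are hypothetical stand-ins rather than real Allegro
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a port that needs special bitmap accessors (DJGPP, for
 * example) would define this in its platform header. */
/* #define NEED_NONTRIVIAL_BMP_READ */

#ifdef NEED_NONTRIVIAL_BMP_READ
/* Stand-ins for the platform's real accessors; a real port would use
 * its own bmp_readXX()/bmp_writeXX() implementation here. */
static unsigned char sketch_read8(const unsigned char *p) { return *p; }
static void sketch_write8(unsigned char *p, unsigned char v) { *p = v; }
#endif

/* Hypothetical helper: copy one bitmap line of `bytes` bytes. */
static void copy_line(unsigned char *dst, const unsigned char *src,
                      size_t bytes)
{
    assert(bytes == 0 || (dst != NULL && src != NULL));
    if (bytes == 0)
        return;

#ifdef NEED_NONTRIVIAL_BMP_READ
    /* Slow but safe path: every access goes through the accessors.
     * This simple forward loop does not handle overlapping lines. */
    for (size_t i = 0; i < bytes; i++)
        sketch_write8(dst + i, sketch_read8(src + i));
#else
    /* Bitmap memory is directly addressable: let the C library (or an
     * optimized replacement) move the whole line at once. */
    memmove(dst, src, bytes);
#endif
}
```

With this layout the platforms that need the accessors keep their
current behaviour, and all others get the memmove path without any
run-time check.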

By the way, I have already written that optimized function and
submitted the code to the Nokia 770 developers mailing list:
http://maemo.org/pipermail/maemo-developers/2006-March/003269.html

Could anybody test these optimized functions on GP2X as well?

Now I'm working on integrating this code into Allegro, and the initial
benchmarks are very good (ufo2000 FPS is boosted to 20 now, which is
much closer to comfortable play).

One more issue with memmove is that it degrades performance when
blitting very small bitmaps, because of function call overhead.
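
A possible workaround for the small-bitmap case is sketched here: copy
short rows with an inline loop and call memmove only for longer ones.
The names copy_row and SMALL_COPY_BYTES are hypothetical, and the
cutoff of 16 bytes is just a placeholder that would have to be measured
on each platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, not a measured value. */
#define SMALL_COPY_BYTES 16

/* Hypothetical helper: copy one row of `bytes` bytes, avoiding the
 * memmove call for short rows. */
static void copy_row(unsigned char *dst, const unsigned char *src,
                     size_t bytes)
{
    assert(bytes == 0 || (dst != NULL && src != NULL));

    if (bytes >= SMALL_COPY_BYTES) {
        /* Long row: the call overhead is small relative to the copy. */
        memmove(dst, src, bytes);
        return;
    }

    /* Short row: copy inline. Pick the direction so that overlapping
     * rows are still copied correctly, as memmove would do. */
    if ((uintptr_t)dst <= (uintptr_t)src) {
        for (size_t i = 0; i < bytes; i++)
            dst[i] = src[i];
    } else {
        for (size_t i = bytes; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}
```

Whether this actually helps depends on the compiler and C library, so
it would need benchmarking before being used in the blit path.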

PS. I registered a gmail account; hopefully it will work better with
the allegro mailing list. I will see after posting this message :)





