Re: [AD] Using memmove in blit()?



Siarhei Siamashka wrote:

On Monday 30 January 2006 23:49, Evert Glebbeek wrote:
On Sunday 22 January 2006 21:03, George Foot wrote:
On the ARM architecture (GP2X), memcpy is about a million times
faster than Allegro's normal blit (ok, maybe only about 2 times
faster...). This is for screen-to-screen blitting, which is just
a normal linear bitmap but in uncached memory, and accessed
through mmapping.
It seems that on the Mac, the memcpy version is also noticeably
faster [http://www.allegro.cc/forums/thread/562923#target]. So as
far as I can see that brings the score to `some systems benefit a
lot' and `some systems don't care at all.' Anyone opposed to
applying (a cleaned up version of) the patch?

Apparently no one objected, but I'm reattaching the patch anyway
before I apply it.

Serge Semashko: please check how this patch fares on your Nokia. Does
it offer a speed increase there too?

Yes, it certainly increases speed compared to the current straightforward
implementation in Allegro. But memmove is not the best option: performance
can be improved even further with ARM-optimized asm, and I'm actually
working on such a patch, as I announced here:
http://www.allegro.cc/forums/thread/571721

Some issues from the quoted block are still open: what about the
bmp_readXX() and bmp_writeXX() macros on DJGPP and probably other platforms?

By the way, I have already written that optimized function and submitted
the code to the Nokia 770 developers mailing list:
http://maemo.org/pipermail/maemo-developers/2006-March/003269.html

Could anybody test these optimized functions on GP2X as well?

Now I'm working on integrating this code into Allegro, and initial
benchmarks are very good (ufo2000 FPS is boosted to 20 now, which is
much closer to comfortable play).

One more issue with memmove is that it degrades performance when blitting
very small bitmaps, because of the function call overhead.
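To illustrate the small-bitmap concern, here is a minimal sketch (not the actual patch) of a row copy that calls memmove only above a size cutoff and uses an inline loop below it. The names copy_row and SMALL_ROW_BYTES and the cutoff value of 16 are assumptions made for this example; a real threshold would have to be measured on each target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff in bytes; the right value must be benchmarked per platform. */
#define SMALL_ROW_BYTES 16

/* Copy n bytes of one bitmap row. src and dst may overlap. */
static void copy_row(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    if (n >= SMALL_ROW_BYTES) {
        /* Large rows: the call overhead is negligible, let libc do the work. */
        memmove(dst, src, n);
    } else if ((uintptr_t)dst <= (uintptr_t)src) {
        /* Small rows, destination before source: copy forwards. */
        for (i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        /* Small rows, destination after source: copy backwards so
           overlapping bytes are read before they are overwritten. */
        for (i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}
```

A blit would call copy_row once per scanline with the row width in bytes. Whether the inline path actually wins depends on the compiler and on how good the platform's memmove is.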

More news here: the optimal memcpy implementation depends heavily on the
CPU architecture:
http://maemo.org/pipermail/maemo-developers/2006-March/003373.html

So I guess it would be a good idea to use the memmove patch and rely on
the target system's implementation. If that implementation is not good, it
is better to submit a bug report asking for it to be improved; that has
nothing to do with Allegro.

On the other hand, there are good results for the optimized memset function,
and it seems to be good for any CPU. Also, considering that the standard
memset is useless for clearing/filling 16bpp and 32bpp bitmaps, having an
inline assembler implementation in Allegro makes sense.
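The reason standard memset does not help here is that it writes one byte value repeatedly, so it can only produce a 16bpp or 32bpp color whose bytes are all identical (0 or all-ones, for example). Below is a portable C sketch of the fallback, not the optimized assembler discussed above; the names fill16 and fill32 are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill count 16bpp pixels with color. */
static void fill16(uint16_t *dst, uint16_t color, size_t count)
{
    size_t i;

    /* memset is usable only when both bytes of the color are equal. */
    if ((color >> 8) == (color & 0xFF)) {
        memset(dst, color & 0xFF, count * sizeof *dst);
        return;
    }
    for (i = 0; i < count; i++)
        dst[i] = color;
}

/* Fill count 32bpp pixels with color. */
static void fill32(uint32_t *dst, uint32_t color, size_t count)
{
    size_t i;

    /* memset is usable only when all four bytes of the color are equal. */
    if (color == (color & 0xFF) * 0x01010101u) {
        memset(dst, (int)(color & 0xFF), count * sizeof *dst);
        return;
    }
    for (i = 0; i < count; i++)
        dst[i] = color;
}
```

For a typical color such as 0xF800 (pure red in RGB565) the memset path is never taken, which is why a dedicated fill routine is needed.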

Could anybody clarify the current status of the memmove patch?




