Re: [AD] Asm optimized version of _linear_draw_trans_rgba_rle_sprite24 is buggy?



Evert Glebbeek wrote:

I see, I'll try to fix it myself, though I'm not familiar with AT&T
assembly style.

Brave man!

A patch fixing this bug is attached. It looks like it was just a typo in
the code; I found it by comparing the functions
_linear_draw_trans_rgba_rle_sprite24 and _linear_draw_trans_rle_sprite24
line by line.

Macro preprocessor is an evil thing :)
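
For illustration, here is a minimal C sketch of this kind of typo: a
multi-line macro in which one line is missing its trailing backslash.
The names SETUP_OK, SETUP_BUGGY, definition_site, diff_with_ok and
diff_with_buggy are hypothetical and are not part of Allegro; the
arithmetic only loosely mirrors the scale-by-3, add, subtract sequence
visible in the patch.

```c
/* Hypothetical stand-in for the patched macro: scale x by 3 (bytes per
 * 24-bit pixel), add base, then subtract base from diff. */

/* Intended form: every line except the last ends with a backslash. */
#define SETUP_OK(x, base, diff) \
   (x) = (x) * 3;               \
   (x) += (base);               \
   (diff) -= (base)

void definition_site(void)
{
   int base = 0, diff = 0;

   /* Typo: no backslash after the second statement, so the macro ends
    * there. The third statement is ordinary code that runs once, right
    * here, and is never part of any later expansion. */
#define SETUP_BUGGY(x, base, diff) \
   (x) = (x) * 3;                  \
   (x) += (base)
   (diff) -= (base);

   (void)diff;
}

/* Uses the intended macro: diff is reduced by base. */
int diff_with_ok(int x, int base, int diff)
{
   SETUP_OK(x, base, diff);
   (void)x;
   return diff;
}

/* Uses the truncated macro: diff is silently left unchanged. */
int diff_with_buggy(int x, int base, int diff)
{
   SETUP_BUGGY(x, base, diff);
   (void)x;
   return diff;
}
```

Both versions compile without complaint, which is what makes the typo
hard to spot: diff_with_ok(10, 4, 100) returns 96, while
diff_with_buggy(10, 4, 100) returns 100.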

As I see it, the asm code uses segment override prefixes extensively
(the es: segment, for example). On modern CPUs, instructions using
segment override prefixes suffer a performance penalty,

Aha, that might be why the performance appears to have gone down recently.

I looked through the allegro code. Maybe I missed something, but it looks
like only the DJGPP port and the VBE/AF driver really use the 'seg' member
of the BITMAP structure for accessing video memory. All the other ports
running on x86 hardware initialize it with the ds: segment value.

On the other hand, I experimented with removing all use of the es:
segment from the RLE sprite drawing functions, but it did not improve
performance at all. The compiler seems to do a better job anyway.

I also tried to investigate what prevents the asm version of allegro from
running under valgrind. It looks like valgrind gets stuck in various places.
Version 2.2.0 reports 'invalid instruction' on memory accesses using the es:
segment prefix. Version 3.0.1 complains about the 'lahf' instruction.
Version 2.4.1 starts, then dies in 'stretch_blit', probably because of
dynamically generated code. After replacing 'stretch_blit' with the C
version, valgrind gets stuck on MMX code in 'clear_to_color'. After
disabling MMX and SSE, it begins to die inside the 'putpixel' function
for no visible reason. I gave up after that...

diff -r -u allegro-4.2.0/src/i386/ispr24.s allegro-4.2.0-fixed/src/i386/ispr24.s
--- allegro-4.2.0/src/i386/ispr24.s	Wed Oct  2 21:29:58 2002
+++ allegro-4.2.0-fixed/src/i386/ispr24.s	Sun Nov 13 19:13:16 2005
@@ -830,7 +830,7 @@
       WRITE_BANK()                  /* select write bank */                ; \
       movl R_X, %edi                                                       ; \
       leal (%edi, %edi, 2), %edi                                           ; \
-      addl %eax, %edi
+      addl %eax, %edi                                                      ; \
       subl %eax, R_TMP              /* calculate read/write diff */
 
 



