Re: [AD] Color convertors

allegro-developers mailing list (lists.liballeg.org)


> Bob, I also did some preliminary testing with your optimizations: some are
> worth implementing, some aren't. We can discuss them further after the
> feature freeze.

I've applied the two instruction-length optimizations you suggested:
- xorl %reg, %reg instead of movl $0, %reg (2 bytes instead of 5)
You were right: xor is fully pairable even on the P1, so this is a valid
optimization in most cases there. However, unlike mov, xor modifies the
flags, so it cannot be used between a flag-setting instruction and the
conditional jump that tests it, as in:
    decl %edx
    movl $0, %eax
    jnz loop
I'm not sure this optimization still pays off on newer processors with
more than two pipelines, since xorl %reg, %reg introduces a false read
dependency on %reg.

- %ecx instead of %edx in inner loops
Valid in all cases.


I've also done some testing on my old P200, and this time I was right:
shl/shr are not fully pairable on the P1 (they pair only in the U pipe):

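_timed_func1:

   movl $10000, %ecx

   .balign 4, 0x90
   loop1:
       shll $8, %eax         # U pipe (shll is U-only)
       addl $4, %ebx         # pairs with shll in the V pipe

       decl %ecx             # U pipe
       jnz loop1             # pairs with decl in the V pipe

   ret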

takes ~20050 cycles to complete (about 2 cycles per iteration), while

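_timed_func2:

   movl $10000, %ecx

   .balign 4, 0x90
   loop2:
       addl $4, %ebx         # U pipe; shll cannot follow into V, so addl issues alone
       shll $8, %eax         # U pipe, pairs with decl

       decl %ecx             # V pipe
       jnz loop2             # left without a partner, issues alone

   ret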

takes ~30050 cycles to complete (about 3 cycles per iteration).


Of course, on today's out-of-order processors these problems have
largely gone away...

--
Eric Botcazou
ebotcazou@xxxxxxxxxx


