Re: [AD] Color convertors
> Bob, I did also some preliminary testing with your optimizations: some are
> worth implementing, some aren't. We could further discuss them after the
> feature freeze.
I've applied the two instruction-length optimizations you suggested:
- xorl %reg,%reg instead of movl $0, %reg
You were right: xor is fully pairable even on the P1, so this is a valid
optimization in most cases there. However, unlike mov, xor modifies the
flags, so it cannot be used in cases like:
    decl %edx
    movl $0, %eax
    jnz loop
I'm not sure this optimization is still valid on newer processors with more
than two pipelines, since xor %reg,%reg adds a false read dependency
on %reg.
- %ecx instead of %edx in inner loops
Valid in all cases.
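The flag problem with xor in the decl/jnz case above can be illustrated with a small model. This is only a sketch in Python, not Allegro code: `run_loop` is a hypothetical helper that tracks nothing but %edx and the zero flag, assuming decl sets ZF from its result, mov leaves the flags alone, and xorl %eax,%eax always sets ZF because its result is zero.

```python
def run_loop(zeroing, count):
    """Model 'decl %edx; <zero %eax>; jnz loop' and return the iterations run.

    zeroing is "mov" (flags untouched) or "xor" (ZF set, since the result is 0).
    Hypothetical helper for illustration only.
    """
    edx = count
    iterations = 0
    while True:
        iterations += 1
        edx = (edx - 1) & 0xFFFFFFFF   # decl %edx
        zf = (edx == 0)                # decl sets ZF from its result
        if zeroing == "xor":
            zf = True                  # xorl %eax,%eax overwrites ZF
        if zf:                         # jnz falls through when ZF is set
            return iterations


print(run_loop("mov", 5))
print(run_loop("xor", 5))
```

With mov the loop runs the full count; with xor the branch sees the flags from the xor rather than from the decl, so the loop exits after the first pass. Moving the xor ahead of the decl would avoid this, at the cost of changing the instruction order.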
I've also done some testing on my old P200 and this time I was right:
shl/shr are not fully pairable on P1 (only in the U pipe):
_timed_func1:
    movl $10000, %ecx
    .balign 4, 0x90
loop:
    shll $8, %eax
    addl $4, %ebx
    decl %ecx
    jnz loop
    ret
takes ~20050 cycles to complete while
_timed_func2:
    movl $10000, %ecx
    .balign 4, 0x90
loop:
    addl $4, %ebx
    shll $8, %eax
    decl %ecx
    jnz loop
    ret
takes ~30050 cycles to complete.
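The two timings are consistent with a simple in-order pairing model. The following is a rough Python sketch, not a cycle-accurate simulator: the function name, the tuple layout and the pairing classes ("UV" for either pipe, "PU" for pairable only in the U pipe, "PV" for pairable only in the V pipe) are assumptions made for illustration. It only checks for a shared register between the two instructions and deliberately ignores flags, on the assumption that decl followed by jnz is allowed to pair.

```python
def cycles_per_iteration(body):
    """Count issue cycles for one pass over a loop body.

    body is a list of (name, pairing_class, register) in program order.
    Simplified model: one instruction issues in the U pipe per cycle, and
    the next one joins it in the V pipe only if the pairing rules allow it.
    """
    cycles = 0
    i = 0
    while i < len(body):
        cycles += 1
        first = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else None
        can_pair = (
            nxt is not None
            and first[1] in ("UV", "PU")    # first instruction sits in U
            and nxt[1] in ("UV", "PV")      # second must be allowed in V
            and (nxt[2] is None or nxt[2] != first[2])  # no shared register
        )
        i += 2 if can_pair else 1
    return cycles


# Loop bodies of _timed_func1 and _timed_func2; shll is assumed to be "PU".
FUNC1_LOOP = [("shll", "PU", "eax"), ("addl", "UV", "ebx"),
              ("decl", "UV", "ecx"), ("jnz", "PV", None)]
FUNC2_LOOP = [("addl", "UV", "ebx"), ("shll", "PU", "eax"),
              ("decl", "UV", "ecx"), ("jnz", "PV", None)]

print(10000 * cycles_per_iteration(FUNC1_LOOP))
print(10000 * cycles_per_iteration(FUNC2_LOOP))
```

Under these assumptions the first loop issues as two pairs (shll+addl, decl+jnz) and the second as addl alone, shll+decl, then jnz alone, which gives 2 and 3 cycles per iteration and matches the measured ~20050 and ~30050 once call overhead is set aside.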
Of course nowadays, with those modern out-of-order processors, these
problems have gone away...
--
Eric Botcazou
ebotcazou@xxxxxxxxxx