Re: [AD] Color convertors



Gaah! Mozilla deleted my e-mail. Ok, I have to write it again :/

Eric Botcazou wrote:

I can't believe it ;-) You swapped %ecx and %edx in the whole non-MMX code !
What for ?


Many instructions have shorter encodings when they use %ecx (shifts and, AFAIK, decl) or %eax (just about everything). decl %ecx/jnz is also a combination the CPU can optimize internally. The same goes for movl vs xorl: smaller opcodes mean less pressure on the instruction cache, decoders and memory bandwidth, especially since the 686 decoders can only work on 16 bytes at a time (and you need to fit 3 or 4 instructions in there).
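To make the size argument concrete, here is a small sketch in C (not code from the patch) that spells out the IA-32 machine-code bytes for the same addl with %eax and with %ebx as the destination. The array names and the helper accumulator_form_saving() are made up for illustration, and the immediate 0x12345678 is an arbitrary value chosen so that it does not fit in a sign-extended byte.

```c
#include <stddef.h>

/* 32-bit protected mode encodings of "addl $0x12345678, reg".
   The immediate is arbitrary; it only needs to be too large for
   the sign-extended 8-bit form. */

/* addl $0x12345678, %eax : short accumulator form, opcode 05 + imm32 */
static const unsigned char ADDL_IMM32_EAX[] = {
    0x05, 0x78, 0x56, 0x34, 0x12
};

/* addl $0x12345678, %ebx : general form, opcode 81 /0 + ModRM + imm32 */
static const unsigned char ADDL_IMM32_EBX[] = {
    0x81, 0xC3, 0x78, 0x56, 0x34, 0x12
};

/* Hypothetical helper: bytes saved per instruction by using %eax. */
size_t accumulator_form_saving(void)
{
    return sizeof ADDL_IMM32_EBX - sizeof ADDL_IMM32_EAX;
}
```

One byte per instruction is not much on its own, but it adds up when several instructions have to fit in the same 16-byte decode window.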



IMO it's not the smartest thing to do:
- it's obviously error-prone: the crash in the 32->24 code comes from there,

The *->24 bit code should have been tested (I didn't do it though). I guess the tests weren't properly done then. Sorry about that.


- the diff file is literally illegible,

I know =)


- in several places the Pentium pairing is broken, I want to fix it but now
my own material is useless.

It shouldn't be too hard.
%ecx <=> %edx
%esi <=> %eax
%edi <=> %ebx

That's almost all that has changed in the main loops.
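If it helps with carrying older notes or patches across the rename, the mapping above can be applied mechanically. The following is only a sketch in C: swap_registers() and REG_PAIRS are hypothetical names, not anything in the Allegro tree, and the code assumes only the full 32-bit register names appear (it does not touch sub-registers such as %cl or %ax).

```c
#include <string.h>

/* The three symmetric swaps listed above. */
static const char *const REG_PAIRS[3][2] = {
    { "%ecx", "%edx" },
    { "%esi", "%eax" },
    { "%edi", "%ebx" },
};

/* Rewrites every listed register name in `line` to its partner, in place.
   Every name is 4 characters long, so the string length never changes. */
void swap_registers(char *line)
{
    size_t len = strlen(line);
    size_t i = 0;

    while (i + 4 <= len) {
        int matched = 0;
        for (size_t p = 0; p < 3 && !matched; p++) {
            for (int side = 0; side < 2; side++) {
                if (memcmp(line + i, REG_PAIRS[p][side], 4) == 0) {
                    memcpy(line + i, REG_PAIRS[p][1 - side], 4);
                    matched = 1;
                    break;
                }
            }
        }
        /* Skip past a rewritten name so it is not swapped back. */
        i += matched ? 4 : 1;
    }
}
```

Because each swap is symmetric, running the function twice on the same line gives back the original text, which is a quick sanity check.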



[snip]
I'm going to partially revert the changes you made to the non-MMX code:
- the %ecx vs %edx swap,

AFAIK only the *->24 bit code exhibits the bug, but that's because it wasn't properly tested (finding someone knowledgeable who can also set a 24bpp display is not as easy as it sounds - of course, you seem to be able to do it :)

- the xorl vs movl changes,

Like I said, xorl reg, reg is interchangeable with movl $0, reg as far as the register contents go. xorl is shorter, but it also sets the flags (which may not be wanted).
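For reference, the size difference can be written out as bytes. This is a sketch in C with made-up names (XORL_EAX_EAX, MOVL_0_EAX, xorl_saving()), showing the IA-32 encodings of the two ways to zero %eax. The flag behaviour is not visible in the bytes: xorl clears CF and OF and sets ZF, while movl leaves all flags untouched.

```c
#include <stddef.h>

/* xorl %eax, %eax : opcode 31 /r plus one ModRM byte.
   Side effect: modifies the flags (ZF set, CF and OF cleared). */
static const unsigned char XORL_EAX_EAX[] = { 0x31, 0xC0 };

/* movl $0, %eax : opcode B8 plus a full 32-bit immediate.
   Leaves the flags untouched. */
static const unsigned char MOVL_0_EAX[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* Hypothetical helper: bytes saved each time xorl replaces movl $0. */
size_t xorl_saving(void)
{
    return sizeof MOVL_0_EAX - sizeof XORL_EAX_EAX;
}
```

So the swap is only safe where no later instruction depends on flags set before the zeroing.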


- the modifications in the 8->24 code I don't understand and that break the
pairing,


AFAIK, you only need to move addl $4, %esi three lines up, to just after the lookup movl. Sorry I didn't double check the pairings. I'll go back over them during this week.



Moreover, I see you did the same thing for the MMX code: the permutation
between registers is even more complicated ! What for ?

See above.


--
- Robert J Ohannessian
"Microsoft code is probably O(n^20)" (my CS prof)
http://pages.infinit.net/voidstar/


