Re: [AD] Color convertors
Gaah! Mozilla deleted my e-mail. Ok, I have to write it again :/
Eric Botcazou wrote:
> I can't believe it ;-) You swapped %ecx and %edx in the whole non-MMX code !
> What for ?
Many opcodes are smaller if they use %ecx (shifts and (AFAIK) decl) or %eax
(just about everything). decl %ecx/jnz is also a combination the CPU can
optimize internally.
The same goes for movl vs xorl: xorl is the smaller opcode, so there's less
pressure on the instruction cache, decoders and memory bandwidth. Especially
since the 686 decoders can only work on 16 bytes at a time (and you need to
fit 3 or 4 instructions in there).
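
To put rough numbers on the size argument, here is a small C sketch that tabulates the usual 32-bit-mode encodings of a few instructions and counts how many copies of each fit in a 16-byte window. The table, `DECODE_WINDOW`, `fits_in_window` and `print_encoding_table` are illustrative names, not anything from the Allegro sources, and the count ignores alignment and decoder restrictions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DECODE_WINDOW 16  /* bytes the decoders see at once, per the text above */

/* One row per instruction: AT&T syntax plus its 32-bit-mode encoding. */
struct encoding {
    const char *text;
    unsigned char bytes[6];
    size_t length;
};

const struct encoding encodings[] = {
    { "xorl %eax, %eax",        { 0x31, 0xC0 },                         2 },
    { "movl $0, %eax",          { 0xB8, 0x00, 0x00, 0x00, 0x00 },       5 },
    { "addl $0x12345678, %eax", { 0x05, 0x78, 0x56, 0x34, 0x12 },       5 },
    { "addl $0x12345678, %ebx", { 0x81, 0xC3, 0x78, 0x56, 0x34, 0x12 }, 6 },
};

/* How many back-to-back copies of an instruction fit in one window. */
size_t fits_in_window(size_t length)
{
    assert(length > 0);
    return DECODE_WINDOW / length;
}

/* Print each instruction with its size and per-window count. */
void print_encoding_table(void)
{
    size_t i;
    for (i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
        printf("%-26s %zu bytes, %zu per window\n", encodings[i].text,
               encodings[i].length, fits_in_window(encodings[i].length));
}
```

Calling `print_encoding_table()` from a `main` lists the sizes side by side: the 2-byte xorl against the 5-byte movl, and the 5-byte %eax form of addl with a 32-bit immediate against the 6-byte %ebx form.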
> IMO it's not the smartest thing to do:
> - it's obviously error-prone: the crash in the 32->24 code comes from there,
The *->24 bit code should have been tested (I didn't do it though). I guess
the tests weren't properly done then. Sorry about that.
> - the diff file is literally illegible,
I know =)
> - in several places the Pentium pairing is broken, I want to fix it but now
> my own material is useless.
It shouldn't be too hard.
%ecx <=> %edx
%esi <=> %eax
%edi <=> %ebx
That's almost all that has changed in the main loops.
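
Since the three swaps form a fixed permutation, older notes or patches can in principle be translated mechanically. The following is a minimal sketch of such a lookup, assuming only the three pairs listed above; `reg_swaps` and `swap_register` are hypothetical names, and sub-registers such as %cl or %dl are not handled.

```c
#include <stddef.h>
#include <string.h>

/* The swaps listed above; each pair is exchanged in both directions. */
static const char *const reg_swaps[][2] = {
    { "%ecx", "%edx" },
    { "%esi", "%eax" },
    { "%edi", "%ebx" },
};

/* Return the register that `reg` was renamed to, or `reg` itself if it
 * is not part of any swap (for example %ebp or %esp). */
const char *swap_register(const char *reg)
{
    size_t i;
    for (i = 0; i < sizeof reg_swaps / sizeof reg_swaps[0]; i++) {
        if (strcmp(reg, reg_swaps[i][0]) == 0)
            return reg_swaps[i][1];
        if (strcmp(reg, reg_swaps[i][1]) == 0)
            return reg_swaps[i][0];
    }
    return reg;
}
```

Because every pair is swapped symmetrically, applying `swap_register` twice gives back the original name, so the same table works for translating in either direction.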
[snip]
> I'm going to partially revert the changes you made to the non-MMX code:
> - the %ecx vs %edx swap,
AFAIK only the *->24 bit code exhibits the bug, but that's because it wasn't
properly tested. Finding someone knowledgeable who can also set a 24bpp
display is not as easy as it sounds - of course, you seem to be able to do it :)
> - the xorl vs movl changes,
Like I said, xorl reg, reg is completely interchangeable with movl $0, reg,
except that xorl is shorter but also sets the flags (which may not be wanted).
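
The flags caveat can be illustrated with a toy C model. This is only a sketch, not an emulator: `struct cpu`, `model_movl_zero` and `model_xorl_self` are made-up names, and only the zero and carry flags are tracked. It reflects the documented behaviour that MOV leaves the flags untouched, while XOR clears CF and sets ZF according to the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU state: one register plus the two flags that matter here. */
struct cpu {
    uint32_t reg;
    bool zf;   /* zero flag */
    bool cf;   /* carry flag */
};

/* movl $0, reg: writes the register, leaves every flag alone. */
void model_movl_zero(struct cpu *c)
{
    c->reg = 0;
}

/* xorl reg, reg: the result is 0, so ZF is set; XOR always clears CF. */
void model_xorl_self(struct cpu *c)
{
    c->reg = 0;
    c->zf = true;
    c->cf = false;
}
```

So if a zeroing instruction sits between something that produces flags (say a decl or an addl) and the jnz or adcl that consumes them, only the movl form is safe there.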
> - the modifications in the 8->24 code I don't understand and that break the
> pairing,
AFAIK, you only need to move addl $4, %esi three lines up to after the
lookup movl. Sorry I didn't double check the pairings. I'll go back over
them during this week.
> Moreover, I see you did the same thing for the MMX code: the permutation
> between registers is even more complicated ! What for ?
See above.
--
- Robert J Ohannessian
"Microsoft code is probably O(n^20)" (my CS prof)
http://pages.infinit.net/voidstar/