Re: [AD] Color convertors
> Many opcodes are smaller if they use %ecx (shifts and (AFAIK) decl) or
> %eax (just about everything).
Ok, I wasn't aware of that.
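
To make the size difference concrete, here is a minimal sketch in C with two IA-32 encodings written out by hand as byte arrays. The array and function names are made up for illustration, and the immediate 0x100 is an arbitrary value chosen only because it does not fit the 8-bit immediate form:

```c
#include <assert.h>
#include <stddef.h>

/* addl $0x100, %eax: dedicated one-byte opcode 0x05 for %eax, then imm32. */
static const unsigned char add_imm32_eax[] = { 0x05, 0x00, 0x01, 0x00, 0x00 };

/* addl $0x100, %ebx: generic opcode 0x81 plus ModR/M byte 0xC3, then imm32. */
static const unsigned char add_imm32_ebx[] = { 0x81, 0xC3, 0x00, 0x01, 0x00, 0x00 };

/* Encoded length in bytes of the %eax form. */
size_t eax_form_size(void)
{
    return sizeof add_imm32_eax;
}

/* Encoded length in bytes of the %ebx form. */
size_t ebx_form_size(void)
{
    return sizeof add_imm32_ebx;
}

/* Bytes saved by using the accumulator form instead of the generic one. */
size_t eax_form_bytes_saved(void)
{
    assert(ebx_form_size() >= eax_form_size());
    return ebx_form_size() - eax_form_size();
}
```

The saving here is the ModR/M byte, which the accumulator-specific opcode does not need.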
> decl %ecx/jnz is also a combination the CPU can optimize internally.
decl %edx/jnz takes exactly one cycle on the Pentium when the branch
prediction is correct; I don't think it can do much better.
> The same about movl vs xorl; smaller opcodes, so there's less pressure on
> the instruction cache, decoders and memory bandwidth.
We already talked about the xor vs mov optimization. I'll really check my
docs before further arguing.
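
For reference while checking, a minimal sketch in C of the two encodings under discussion, again with the IA-32 bytes written out by hand (the array and function names are hypothetical, picked for this illustration):

```c
#include <assert.h>
#include <stddef.h>

/* movl $0, %eax: opcode 0xB8 followed by a full 32-bit immediate. */
static const unsigned char mov_zero_eax[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* xorl %eax, %eax: opcode 0x31 plus ModR/M byte 0xC0, no immediate. */
static const unsigned char xor_eax_eax[] = { 0x31, 0xC0 };

/* Encoded length in bytes of the movl form. */
size_t mov_zero_size(void)
{
    return sizeof mov_zero_eax;
}

/* Encoded length in bytes of the xorl form. */
size_t xor_zero_size(void)
{
    return sizeof xor_eax_eax;
}

/* Bytes saved per register clear when xorl replaces movl $0. */
size_t xor_bytes_saved(void)
{
    assert(mov_zero_size() >= xor_zero_size());
    return mov_zero_size() - xor_zero_size();
}
```

Note that the two are not interchangeable everywhere: xorl modifies the flags while movl leaves them untouched, so the substitution is only safe where no later instruction depends on the old flags.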
> > - the diff file is literally illegible,
>
> I know =)
I think the solution for this kind of complicated change is clearly to
split the diff file: a first diff could show only the register swaps and
a few related things, while a second diff could show the new code.
> It shouldn't be too hard.
> %ecx <=> %edx
> %esi <=> %eax
> %edi <=> %ebx
>
> That's almost all that has changed in the main loops.
Yes, but I have one file per routine, and I have to do at least two swap
operations per routine, and so on.
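
If it helps, the swaps could be applied mechanically. Below is a minimal sketch in C, assuming the three register pairs quoted above and one line of AT&T-syntax assembly at a time. The helper names are hypothetical and this is not an existing tool. It rewrites every register in a single pass, so a swapped name is never swapped back, which is what goes wrong with sequential search-and-replace:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Register pairs to exchange, mirroring the list quoted in this thread. */
static const char *const reg_pairs[][2] = {
    { "%ecx", "%edx" },
    { "%esi", "%eax" },
    { "%edi", "%ebx" },
};

/* Return the swap partner of the 4-character register name at s, or NULL. */
static const char *swap_partner(const char *s)
{
    size_t i;

    for (i = 0; i < sizeof reg_pairs / sizeof reg_pairs[0]; i++) {
        if (strncmp(s, reg_pairs[i][0], 4) == 0)
            return reg_pairs[i][1];
        if (strncmp(s, reg_pairs[i][1], 4) == 0)
            return reg_pairs[i][0];
    }
    return NULL;
}

/* Copy the line `in` to `out`, exchanging all listed registers in one pass.
 * The output has the same length as the input, so `out_size` must exceed
 * strlen(in). */
void swap_registers(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(out_size > strlen(in));
    while (*in != '\0') {
        const char *partner = (*in == '%') ? swap_partner(in) : NULL;

        if (partner != NULL) {
            memcpy(out + o, partner, 4);  /* emit the partner register */
            o += 4;
            in += 4;
        } else {
            out[o++] = *in++;             /* copy everything else verbatim */
        }
    }
    out[o] = '\0';
}
```

Running each routine's file through something like this would give the "register swap only" diff, leaving the second diff to show just the new code.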
> > I'm going to partially revert the changes you made to the non-MMX code:
I'll postpone my modifications until we have a clear view of the situation.
> AFAIK, you only need to move addl $4, %esi three lines up to after the
> lookup movl. Sorry I didn't double check the pairings. I'll go back over
> them during this week.
You replaced one 'andl $FF000000' with two 'shll': that's fatal for pairing
on the Pentium because 'shll' is not fully pairable (it can only go into
the U pipe).
That was it for the non-MMX code; let's continue with the MMX code (next
message).
---
Eric Botcazou
ebotcazou@xxxxxxxxxx