Re: [AD] Color convertors



> Many opcodes are smaller if they use %ecx (shifts and (AFAIK) decl) or
> %eax (just about everything).

Ok, I wasn't aware of that.

> decl %ecx/jnz is also a combination the CPU can optimize internally.

decl %edx/jnz takes exactly one cycle on Pentium when the branch
prediction is correct, so I don't think it can do much better.

> The same about movl vs xorl; smaller opcodes, so there's less pressure on
> the instruction cache, decoders and memory bandwidth.

We already talked about the xor vs mov optimization. I really will check my
docs before arguing further.
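
For reference, here is a minimal sketch of the size difference being
discussed. It assumes the two instructions in question are the usual ways of
zeroing %eax in 32-bit mode; the byte arrays are the IA-32 encodings written
out by hand, and the helper name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* IA-32 encodings in 32-bit mode, written out by hand (illustrative). */
static const unsigned char xorl_eax_eax[] = { 0x31, 0xC0 };                   /* xorl %eax, %eax */
static const unsigned char movl_0_eax[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* movl $0, %eax   */

/* Hypothetical helper: bytes saved each time xorl replaces movl $0. */
static size_t zeroing_bytes_saved(void)
{
    assert(sizeof movl_0_eax > sizeof xorl_eax_eax);
    return sizeof movl_0_eax - sizeof xorl_eax_eax;
}
```

Note that the two forms are not strictly interchangeable: xorl modifies the
flags while movl leaves them alone, so the swap is only safe where no later
instruction depends on the old flags.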

> > - the diff file is literally illegible,
>
> I know =)

I think the solution for this kind of complicated change is clearly to
split the diff file: a first diff file could show only the register swaps
and a few related things, while a second diff file could show the new code.

> It shouldn't be too hard.
> %ecx <=> %edx
> %esi <=> %eax
> %edi <=> %ebx
>
> That's almost all that has changed in the main loops.

Yes, but I have one file per routine, and I have to do at least two swap
operations per routine, and so on.
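
The mechanical part (the register renaming quoted above) could in principle
be done by a small filter, so that the first diff contains nothing else.
What follows is only a sketch under assumptions: swap_registers and
swap_partner are hypothetical helpers, not part of Allegro, and they only
handle the three 32-bit register pairs from the quoted list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Register pairs from the quoted list; each pair is swapped both ways. */
static const char *const swap_pairs[][2] = {
    { "ecx", "edx" },
    { "esi", "eax" },
    { "edi", "ebx" },
};

/* Return the partner of a 3-letter register name, or NULL if not listed. */
static const char *swap_partner(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof swap_pairs / sizeof swap_pairs[0]; i++) {
        if (strncmp(name, swap_pairs[i][0], 3) == 0) return swap_pairs[i][1];
        if (strncmp(name, swap_pairs[i][1], 3) == 0) return swap_pairs[i][0];
    }
    return NULL;
}

/* Copy src to dst, renaming every listed %reg in one pass, so that
   %ecx -> %edx and %edx -> %ecx do not undo each other.
   dst must have room for strlen(src) + 1 bytes. */
static void swap_registers(const char *src, char *dst)
{
    while (*src) {
        const char *partner = (*src == '%') ? swap_partner(src + 1) : NULL;
        if (partner) {
            *dst++ = '%';
            memcpy(dst, partner, 3);
            dst += 3;
            src += 4;   /* skip '%' plus the 3-letter name */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Doing the rename in a single pass is the point: two successive
search-and-replace runs would swap the registers back. The sketch leaves
sub-registers such as %cl or %dx untouched, so those would still need to be
fixed by hand.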

> > I'm going to partially revert the changes you made to the non-MMX code:

I'll postpone my modifications until we have a clear view of the situation.

> AFAIK, you only need to move addl $4, %esi three lines up to after the
> lookup movl. Sorry I didn't double check the pairings. I'll go back over
> them during this week.

You replaced one 'andl $0xFF000000' with two 'shll': that's fatal for
pairing on the Pentium because 'shll' is not fully pairable (it can only go
into the U pipe).
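
To make the trade-off concrete, here is a sketch of the two ways of keeping
only the top byte of a 32-bit value. The exact shift sequence is not shown
in this message, so the shift-right/shift-left pair below is an assumption,
and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One instruction: andl $0xFF000000, reg (can issue in either pipe). */
static uint32_t top_byte_and(uint32_t x)
{
    return x & 0xFF000000u;
}

/* Two shifts (assumed sequence: shift right by 24, then left by 24).
   Same result, but each shift can only issue in the U pipe. */
static uint32_t top_byte_shifts(uint32_t x)
{
    return (x >> 24) << 24;
}
```

Both functions return the same value for every input; the difference is
purely in how the instructions can be scheduled, which is why the single
'andl' is the better choice for the Pentium code path.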


That was for the non-MMX code; let's continue with the MMX code (next
message).

---
Eric Botcazou
ebotcazou@xxxxxxxxxx


