Re: [AD] Color convertors



Eric Botcazou wrote:
[snip - tons of numbers !!!]
       step0  step1  step2  step3  step4  step5
 put - 12230  12336  12198  12144  12056  12393
 hli - 30665  31115  30603  30470  30319  31234
 vli - 23041  23326  22899  22813  22767  23123


Why did the register permutation cause such a slowdown? step2 should have been faster than step1 because the code is smaller; at worst, it should give the same numbers (within a reasonable range).

The same goes for step2->step3: step3 should be faster (one fewer instruction!), although the numbers are within the 1% margin of error. The same can be said of step4, which has fewer instructions (no more stalling nops, smaller code).

And finally, step5 is the one -least- likely to have any bearing on speed, yet it turns out to be faster than all the rest. I'm at a loss to explain it :/

Final state of the code:

[snip - code]

Yes, this is what my code looks like (minus some instruction swapping, and the non-width-of-4 code).

Here are my plans:
- revert the hideous big patch that was applied to the conversion code,
- first fix some formatting issues,
- fix the bug in the three non-MMX routines,
- gradually apply your modifications to the MMX routines (step0->step5),
- add your two new MMX routines,
- add the new code to support the non multiple of 4 widths; the Windows port
won't use it for the time being because of the alignment issue so it will be
#ifndef'ed,
- add the remaining stuff needed by the BeOS port (#ifndef'ed too for
Windows).


Ok, it'll give me time to refine the steps too and see if I can come up with something better.


--
- Robert J Ohannessian
"Microsoft code is probably O(n^20)" (my CS prof)
http://pages.infinit.net/voidstar/


