Re: [AD] Color convertors
Eric Botcazou wrote: [snip - tons of numbers !!!]
       step0   step1   step2   step3   step4   step5
put -  12230   12336   12198   12144   12056   12393
hli -  30665   31115   30603   30470   30319   31234
vli -  23041   23326   22899   22813   22767   23123
How come the register permutation gave such a slowdown? step2 should have been better than step1 due to smaller code, or at worst, it should give the same numbers (within a reasonable range).
Same for step2->step3: step3 should be faster (one less instruction!). The numbers are within the 1% margin of error, though. The same can be said about step4, which has fewer instructions (no more stalling nops, smaller code).
And finally, step5 is the one -least- likely to have any bearing on speed, yet it turns out to be faster than all the rest. I'm at a loss to explain it :/
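For anyone who wants to check the step-to-step differences against the 1% margin, here is a rough sketch in Python. The figures are copied from the table in this mail; the helper name `step_changes` and the `NOISE` constant are made up for illustration, and the script assumes that a higher number means faster, which is what the discussion implies but the table does not state.

```python
# Benchmark figures copied from the table in this mail.
# Assumption: higher means faster (step1->step2 is described as a slowdown).
results = {
    "put": [12230, 12336, 12198, 12144, 12056, 12393],
    "hli": [30665, 31115, 30603, 30470, 30319, 31234],
    "vli": [23041, 23326, 22899, 22813, 22767, 23123],
}

NOISE = 1.0  # percent; the margin of error quoted in this mail


def step_changes(values):
    """Return the percent change of each step relative to the previous one."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]


for name, values in results.items():
    for step, change in enumerate(step_changes(values), start=1):
        verdict = "within noise" if abs(change) <= NOISE else "outside noise"
        print(f"{name} step{step - 1}->step{step}: {change:+.2f}% ({verdict})")
```

With these numbers, the step4->step5 change for `put` works out to roughly +2.8%, which is outside the 1% margin, so the surprise about step5 is not just measurement noise.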
Final state of the code:
[snip - code]
Yes, this is what my code looks like (minus some instruction swapping, and the non-width-of-4 code).
Here are my plans:
- revert the hideous big patch that was applied to the conversion code,
- first fix some formatting issues,
- fix the bug in the three non-MMX routines,
- gradually apply your modifications to the MMX routines (step0->step5),
- add your two new MMX routines,
- add the new code to support the non multiple of 4 widths; the Windows port won't use it for the time being because of the alignment issue so it will be #ifndef'ed,
- add the remaining stuff needed by the BeOS port (#ifndef'ed too for Windows).
Ok, it'll give me time to refine the steps too and see if I can come up with something better.
--
- Robert J Ohannessian
"Microsoft code is probably O(n^20)" (my CS prof)
http://pages.infinit.net/voidstar/