Re: [AD] Color convertors

Here are some results for the MMX code:

[Warning: tons of numbers !!!]

Graphics driver: DirectDraw window
Description: DirectDraw, in color conversion, 16 bpp window

Screen size: 640x480
Virtual screen size: 640x480
Color depth: 32 bpp

SOLID mode:

       step0  step1  step2  step3  step4  step5
 put - 12230  12336  12198  12144  12056  12393
 hli - 30665  31115  30603  30470  30319  31234
 vli - 23041  23326  22899  22813  22767  23123
 lin - 417    414    413    423    417    423
 rec - 2691   2727   2664   2591   2682   2762
 cir - 277    285    280    276    276    285
 cir - 564    578    561    566    564    577
 ell - 277    284    279    282    274    283
 ell - 577    591    572    567    571    581
 arc - 498    496    471    476    477    501
 tri - 584    588    589    589    587    587

Other functions:

 tex - 1947   1976   1952   1966   1946   1988
 vra - 4657   4702   4621   4540   4621   4682
 ali - 4859   4911   4841   4735   4821   4901
 bli - 565    582    565    568    565    583
 ali - 561    574    561    564    566    577
 vra - 4306   4357   4315   4224   4291   4344
 mas - 564    574    559    562    560    578
 dra - 555    565    553    558    553    566
 dra - 562    568    561    562    557    567
 dra - 559    569    559    559    557    570
 dra - 543    548    541    542    539    550
 dra - 549    554    547    546    543    557
 dra - 543    551    541    540    537    553
 dra - 547    551    544    547    545    554


step0: Isaac's original code
step0->step1: added _align_ at jump points
step1->step2: register permutation
step2->step3: replaced incl %ebp        by  decl %ecx
                       cmpl %ebp, %ecx      jnz
                       jb ...
step3->step4: removed the three 'nop' instructions
step4->step5: replaced '%ebp' by '%mm7'

Final state of the code:

   _align_
   next_line_32_to_16:
      movd %mm7, %ecx

      _align_
      next_block_32_to_16:
         movq (%esi), %mm0
         movq %mm0, %mm1
         movq %mm0, %mm2
         PAND (5, 0)        /* pand %mm5, %mm0 */
         psrld $3, %mm0
         PAND (3, 1)        /* pand %mm3, %mm1 */
         psrld $5, %mm1
         por %mm1, %mm0
         addl $8, %esi
         PAND (4, 2)        /* pand %mm4, %mm2 */
         psrld $8, %mm2
         por %mm2, %mm0
         movq %mm0, %mm6
         psrlq $16, %mm0
         por %mm0, %mm6
         movd %mm6, (%edi)
         addl $4, %edi

         decl %ecx
         jnz next_block_32_to_16

      addl %eax, %esi
      addl %ebx, %edi
      decl %edx
      jnz next_line_32_to_16


That's basically your code, isn't it (modulo some line swaps)?
I'm not sure the variations are very significant, but at least one run shows
an improvement on my machine and, since you see one on yours too, I'm OK
with the changes.
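
For anyone reading along, here is a rough C sketch of what the final loop above appears to compute: two 32 bpp pixels are loaded as one 64-bit word, masked, shifted, and packed into two 16 bpp pixels. The mask values are an assumption (the contents of %mm3, %mm4 and %mm5 are not shown in this mail); they are chosen to match the shift counts 3, 5 and 8 for an RGB 5-6-5 target. The names pack_two_pixels and convert_32_to_16, and the pitch parameters, are made up for illustration, and the sketch assumes a little-endian machine as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed masks, replicated in both 32-bit halves of the 64-bit word. */
#define MASK_BLUE  0x000000F8000000F8ULL   /* assumed %mm5, shifted by 3 */
#define MASK_GREEN 0x0000FC000000FC00ULL   /* assumed %mm3, shifted by 5 */
#define MASK_RED   0x00F8000000F80000ULL   /* assumed %mm4, shifted by 8 */

/* Hypothetical helper: pack two 32 bpp pixels into two 16 bpp pixels.
   Because each mask is applied before its shift, a plain 64-bit shift
   gives the same result as the per-dword psrld in the MMX code. */
static uint32_t pack_two_pixels(uint64_t two)
{
    uint64_t b = (two & MASK_BLUE)  >> 3;
    uint64_t g = (two & MASK_GREEN) >> 5;
    uint64_t r = (two & MASK_RED)   >> 8;
    uint64_t m = r | g | b;   /* one 16-bit pixel in the low half of each dword */

    m |= m >> 16;             /* like psrlq $16 + por: second pixel moves next to the first */
    return (uint32_t)m;       /* like movd: keep the low 32 bits */
}

/* Hypothetical driver: pitches are in bytes, width is in pixels. */
static void convert_32_to_16(const uint8_t *src, size_t src_pitch,
                             uint8_t *dst, size_t dst_pitch,
                             size_t width, size_t height)
{
    assert(width % 2 == 0);   /* each block consumes two source pixels */

    for (size_t y = 0; y < height; y++) {
        const uint8_t *s = src + y * src_pitch;
        uint8_t *d = dst + y * dst_pitch;

        for (size_t x = 0; x < width; x += 2) {
            uint64_t two;
            uint32_t packed;

            memcpy(&two, s, sizeof two);         /* movq (%esi), %mm0 */
            packed = pack_two_pixels(two);
            memcpy(d, &packed, sizeof packed);   /* movd %mm6, (%edi) */
            s += 8;                              /* addl $8, %esi */
            d += 4;                              /* addl $4, %edi */
        }
    }
}
```

With these assumed masks, a pure red source pixel (0x00FF0000) becomes 0xF800 and a pure blue one (0x000000FF) becomes 0x001F. The per-line address computation stands in for the "addl %eax, %esi" / "addl %ebx, %edi" adjustments at the end of each line.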

Here are my plans:
- revert the hideous big patch that was applied to the conversion code,
- first fix some formatting issues,
- fix the bug in the three non-MMX routines,
- gradually apply your modifications to the MMX routines (step0->step5),
- add your two new MMX routines,
- add the new code to support widths that are not a multiple of 4; the
Windows port won't use it for the time being because of the alignment issue,
so it will be #ifndef'ed,
- add the remaining stuff needed by the BeOS port (#ifndef'ed for Windows
too).

This will temporarily break the BeOS port, but I'll do it as quickly as
possible (a few hours, I think). Note that I won't apply your optimizations
to the non-MMX code until I have evaluated their impact on the Pentium I.

If you want to make additional tweaks to the MMX code, you'll be able to do
it after that.

---
Eric Botcazou
ebotcazou@xxxxxxxxxx


