Re: [AD] Color convertors
Here are some results for the MMX code:
[Warning: tons of numbers !!!]
Graphics driver: DirectDraw window
Description: DirectDraw, in color conversion, 16 bpp window
Screen size: 640x480
Virtual screen size: 640x480
Color depth: 32 bpp
SOLID mode:
       step0  step1  step2  step3  step4  step5
put -  12230  12336  12198  12144  12056  12393
hli -  30665  31115  30603  30470  30319  31234
vli -  23041  23326  22899  22813  22767  23123
lin -    417    414    413    423    417    423
rec -   2691   2727   2664   2591   2682   2762
cir -    277    285    280    276    276    285
cir -    564    578    561    566    564    577
ell -    277    284    279    282    274    283
ell -    577    591    572    567    571    581
arc -    498    496    471    476    477    501
tri -    584    588    589    589    587    587
Other functions:
tex -   1947   1976   1952   1966   1946   1988
vra -   4657   4702   4621   4540   4621   4682
ali -   4859   4911   4841   4735   4821   4901
bli -    565    582    565    568    565    583
ali -    561    574    561    564    566    577
vra -   4306   4357   4315   4224   4291   4344
mas -    564    574    559    562    560    578
dra -    555    565    553    558    553    566
dra -    562    568    561    562    557    567
dra -    559    569    559    559    557    570
dra -    543    548    541    542    539    550
dra -    549    554    547    546    543    557
dra -    543    551    541    540    537    553
dra -    547    551    544    547    545    554
step0: Isaac's original code
step0->step1: added _align_ at jump points
step1->step2: register permutation
step2->step3: replaced  incl %ebp          by  decl %ecx
                        cmpl %ebp, %ecx        jnz
                        jb ...
step3->step4: removed the three 'nop'
step4->step5: replaced '%ebp' by '%mm7'
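As far as the step2->step3 entry can be read, it trades a count-up loop, which needs an explicit compare against a limit on every pass, for a count-down loop that ends when the counter hits zero. A rough C analogue is sketched below. The function names are made up for illustration, and a modern compiler may well emit the same code for both forms, so this only shows the idea, not the actual routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Count-up form: the index is compared against the limit on every pass. */
static uint32_t sum_count_up(const uint32_t *src, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += src[i];
    return total;
}

/* Count-down form: the loop ends when the counter reaches zero,
   so no separate compare against a limit is needed. */
static uint32_t sum_count_down(const uint32_t *src, size_t n)
{
    uint32_t total = 0;
    while (n != 0) {
        total += *src++;
        n--;
    }
    return total;
}
```

Both functions return the same result for any input; only the shape of the loop control differs.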
Final state of the code:
   _align_
next_line_32_to_16:
   movd %mm7, %ecx

   _align_
next_block_32_to_16:
   movq (%esi), %mm0
   movq %mm0, %mm1
   movq %mm0, %mm2
   PAND (5, 0)        /* pand %mm5, %mm0 */
   psrld $3, %mm0
   PAND (3, 1)        /* pand %mm3, %mm1 */
   psrld $5, %mm1
   por %mm1, %mm0
   addl $8, %esi
   PAND (4, 2)        /* pand %mm4, %mm2 */
   psrld $8, %mm2
   por %mm2, %mm0
   movq %mm0, %mm6
   psrlq $16, %mm0
   por %mm0, %mm6
   movd %mm6, (%edi)
   addl $4, %edi
   decl %ecx
   jnz next_block_32_to_16

   addl %eax, %esi
   addl %ebx, %edi
   decl %edx
   jnz next_line_32_to_16
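For readers less used to MMX, the inner block can be mirrored in plain C. This is only a sketch: the contents of %mm3, %mm4 and %mm5 are not shown in this message, so the constants below are assumed to be the usual 8-8-8 to 5-6-5 masks, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed mask values; the message does not show what %mm3-%mm5 hold. */
#define MASK_BLUE   UINT64_C(0x000000F8000000F8)  /* assumed %mm5 */
#define MASK_GREEN  UINT64_C(0x0000FC000000FC00)  /* assumed %mm3 */
#define MASK_RED    UINT64_C(0x00F8000000F80000)  /* assumed %mm4 */

/* Emulates psrld: shifts each 32-bit half of a 64-bit value independently. */
static uint64_t psrld(uint64_t v, unsigned n)
{
    uint64_t lo = (uint32_t)v >> n;
    uint64_t hi = (uint32_t)(v >> 32) >> n;
    return (hi << 32) | lo;
}

/* Converts two packed 32 bpp pixels into two packed 16 bpp pixels. */
static uint32_t block_32_to_16(uint64_t two_pixels)
{
    uint64_t b = psrld(two_pixels & MASK_BLUE, 3);
    uint64_t g = psrld(two_pixels & MASK_GREEN, 5);
    uint64_t r = psrld(two_pixels & MASK_RED, 8);

    /* After the ORs, each 32-bit half holds one 16-bit pixel in its low word. */
    uint64_t packed = b | g | r;

    /* psrlq $16 + por: the second pixel slides next to the first one. */
    packed |= packed >> 16;

    /* movd stores only the low 32 bits. */
    return (uint32_t)packed;
}
```

Under these assumed masks, a white pixel followed by a pure red one packs to 0xF800FFFF: the first pixel lands in the low 16 bits and the second in the high 16 bits.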
That's basically your code, isn't it (modulo some line swaps)?
I'm not sure the variations are very significant, but in at least one run
there is an improvement on my machine and, since there is one on yours too,
I'm OK with the changes.
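To put a number on how small the variations are, the relative change between two columns of the table can be computed directly. A minimal sketch, with a made-up helper name:

```c
/* Relative change, in percent, between two benchmark scores. */
static double percent_change(double before, double after)
{
    return (after - before) * 100.0 / before;
}
```

For the put row, going from step0 (12230) to step5 (12393) is a change of roughly 1.3 %; for hli (30665 to 31234) it is roughly 1.9 %.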
Here are my plans:
- revert the hideous big patch that was applied to the conversion code,
- first fix some formatting issues,
- fix the bug in the three non-MMX routines,
- gradually apply your modifications to the MMX routines (step0->step5),
- add your two new MMX routines,
- add the new code to support widths that are not a multiple of 4; the
Windows port won't use it for the time being because of the alignment issue,
so it will be #ifndef'ed,
- add the remaining stuff needed by the BeOS port (#ifndef'ed too for
Windows).
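One common way to cope with widths that are not a multiple of 4 is to run the fast path over whole groups of 4 pixels and finish the row one pixel at a time. The sketch below illustrates that general idea only; it is not the code referred to in the plan, which is not shown in this message. The function names and the 8-8-8 to 5-6-5 pixel layout are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar conversion of one pixel, assuming 8-8-8 input and 5-6-5 output. */
static uint16_t pixel_32_to_16(uint32_t c)
{
    return (uint16_t)(((c & 0x00F80000u) >> 8) |
                      ((c & 0x0000FC00u) >> 5) |
                      ((c & 0x000000F8u) >> 3));
}

/* Converts one row: whole groups of 4 pixels first, then the remainder. */
static void row_32_to_16(const uint32_t *src, uint16_t *dest, size_t width)
{
    size_t blocks = width / 4;   /* whole groups of 4 pixels */
    size_t rest = width % 4;     /* 0 to 3 leftover pixels */

    while (blocks-- > 0) {
        /* Stand-in for a fast path that handles 4 pixels per iteration. */
        for (int i = 0; i < 4; i++)
            dest[i] = pixel_32_to_16(src[i]);
        src += 4;
        dest += 4;
    }

    /* Tail: convert the leftover pixels one by one. */
    while (rest-- > 0)
        *dest++ = pixel_32_to_16(*src++);
}
```

With a width of 6, for instance, the first 4 pixels go through the block loop and the last 2 through the tail loop.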
This will temporarily break the BeOS port, but I'll do it as quickly as
possible (a few hours, I think). Note that I won't apply your optimizations to
the non-MMX code until I have evaluated their impact on the Pentium I.
If you want to make additional tweaks to the MMX code, you'll be able to do
it after that.
---
Eric Botcazou
ebotcazou@xxxxxxxxxx