Re: [AD] Color convertors
[about the MMX code:]
I can't believe it (2) ;-) You rewrote the MMX code from scratch! Why? The
former code performed well. Here are some tests I ran with the former code
(Isaac's) vs. the new code (basically yours, but I stripped out the whole
one-pixel/two-pixel handling to keep only the core loop); higher numbers are
better:
K6-III 400 MHz, MMX enabled, Matrox G200
Win95 OSR2, 800x600x16 desktop:
640x480x8 window:
SOLID results:              former code   new code
putpixel()                        13924      12727
hline()                           32795      31432
vline()                           24546      23588
line()                              509        443
rectfill()                         3290       2997
circle()                            336        304
circlefill()                        604        576
ellipse()                           347        301
ellipsefill()                       602        586
arc()                               583        522
triangle()                          620        592
Other functions:
textout()                          2346       2118
vram->vram blit()                  5647       5052
aligned vram->vram blit()          5843       5267
blit() from memory                  688        610
aligned blit() from memory          691        611
vram->vram masked_blit()           5436       4878
masked_blit() from memory           679        605
draw_sprite()                       682        609
draw_rle_sprite()                   688        611
draw_compiled_sprite()              568        612
draw_trans_sprite()                 679        599
draw_trans_rle_sprite()             683        606
draw_lit_sprite()                   677        605
draw_lit_rle_sprite()               686        609
640x480x32 window:
SOLID results:              former code   new code
putpixel()                        11985      12158
hline()                           30793      30617
vline()                           23243      22545
line()                              430        409
rectfill()                         2681       2728
circle()                            277        274
circlefill()                        563        573
ellipse()                           279        279
ellipsefill()                       565        578
arc()                               482        489
triangle()                          586        586
Other functions:
textout()                          1944       1950
vram->vram blit()                  4700       4642
aligned vram->vram blit()          4866       4850
blit() from memory                  571        561
aligned blit() from memory          570        564
vram->vram masked_blit()           4366       4293
masked_blit() from memory           569        563
draw_sprite()                       559        550
draw_rle_sprite()                   559        556
draw_compiled_sprite()              564        556
draw_trans_sprite()                 544        539
draw_trans_rle_sprite()             552        546
draw_lit_sprite()                   540        537
draw_lit_rle_sprite()               550        540
Win95 OSR2, 800x600x32 desktop:
640x480x8 window:
SOLID results:              former code   new code
putpixel()                         8340       8118
hline()                           26514      26084
vline()                           18594      18288
line()                              267        236
rectfill()                         1739       1575
circle()                            173        159
circlefill()                        503        498
ellipse()                           170        165
ellipsefill()                       521        503
arc()                               291        291
triangle()                          532        523
Other functions:
textout()                          1259       1206
vram->vram blit()                  3018       2869
aligned vram->vram blit()          3132       2989
blit() from memory                  343        326
aligned blit() from memory          345        327
vram->vram masked_blit()           2947       2832
masked_blit() from memory           341        325
draw_sprite()                       343        326
draw_rle_sprite()                   343        326
draw_compiled_sprite()              343        326
draw_trans_sprite()                 341        322
draw_trans_rle_sprite()             343        324
draw_lit_sprite()                   341        326
draw_lit_rle_sprite()               342        324
As far as I can see, these routines show no significant performance
improvement on my machine; if anything, there is a small loss. Given that in
the end we rejected a patch for the blit() routine that showed at least some
improvement on at least some machines, I think we should have properly
evaluated these routines before scrapping stable and relatively fast code.
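To put a number on "a small loss", one can compute the relative change
(new - old) / old for each row. The C sketch below does that and averages it
over several rows; the helper names rel_change() and mean_rel_change() are
hypothetical (not part of Allegro), and it assumes, as the discussion implies,
that higher scores are better.

```c
#include <assert.h>
#include <stddef.h>

/* Relative change of one benchmark score: (new - old) / old.
 * Assumes higher scores are better, so a negative result means
 * the new code is slower on that test. */
static double rel_change(double old_score, double new_score)
{
    assert(old_score > 0.0);
    return (new_score - old_score) / old_score;
}

/* Arithmetic mean of the relative changes over n paired scores. */
static double mean_rel_change(const double *old_s, const double *new_s,
                              size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += rel_change(old_s[i], new_s[i]);
    return sum / (double)n;
}
```

For example, for putpixel(), hline() and vline() in the 640x480x8 window on
the 16 bpp desktop, the changes are roughly -8.6%, -4.2% and -3.9%, a mean of
about -5.6%.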
IMO, the only routines that show a clear performance improvement are the two
new MMX routines (24->32 and 32->24). Here are the results for the 32->24
routine (after the necessary fix):
Win95 OSR2, 800x600x24 desktop:
640x480x32 window:
SOLID results:              former code   new code
                              (non-MMX)      (MMX)
putpixel()                         9297       9500
hline()                           27298      27595
vline()                           19364      19311
line()                              286        290
rectfill()                         1868       1994
circle()                            192        196
circlefill()                        518        547
ellipse()                           190        199
ellipsefill()                       537        550
arc()                               336        349
triangle()                          560        538
Other functions:
textout()                          1393       1460
vram->vram blit()                  3323       3479
aligned vram->vram blit()          3485       3618
blit() from memory                  388        409
aligned blit() from memory          387        409
vram->vram masked_blit()           3166       3279
masked_blit() from memory           388        407
draw_sprite()                       385        401
draw_rle_sprite()                   386        402
draw_compiled_sprite()              385        406
draw_trans_sprite()                 377        396
draw_trans_rle_sprite()             380        398
draw_lit_sprite()                   375        395
draw_lit_rle_sprite()               380        397
The improvement seems to be relatively robust on my machine.
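One simple way to gauge how robust such a gain is: count the rows where the new
code scores higher and look at the spread of the new/old ratios. In the 32->24
table, 23 of the 25 rows improve; vline() and triangle() are the exceptions.
A minimal C sketch of that check follows; struct bench_row and
count_improved() are hypothetical names, and higher scores are assumed to be
better.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark row: test name plus old and new scores
 * (higher is better). */
struct bench_row {
    const char *name;
    double old_score;
    double new_score;
};

/* Returns how many rows have new/old > 1, and stores the smallest
 * and largest new/old ratio in *min_ratio and *max_ratio. */
static size_t count_improved(const struct bench_row *rows, size_t n,
                             double *min_ratio, double *max_ratio)
{
    size_t i, wins = 0;

    assert(n > 0);
    assert(rows[0].old_score > 0.0);
    *min_ratio = *max_ratio = rows[0].new_score / rows[0].old_score;
    for (i = 0; i < n; i++) {
        double r;

        assert(rows[i].old_score > 0.0);
        r = rows[i].new_score / rows[i].old_score;
        if (r > 1.0)
            wins++;
        if (r < *min_ratio)
            *min_ratio = r;
        if (r > *max_ratio)
            *max_ratio = r;
    }
    return wins;
}
```

Applied to all 25 rows of that table, the ratios range from about 0.96
(triangle()) to about 1.07 (rectfill()).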
---
Eric Botcazou
ebotcazou@xxxxxxxxxx