RE: [AD] Use MMX to get fast

Blits to and from 16-bpp bitmaps are already implemented with MMX. I didn't implement them for 32-bpp bitmaps because, for some inexplicable reason, it was a net speed loss for me.

I find it odd that the two routines differ in speed at all: memory bandwidth is the bottleneck, not the CPU. On top of that, the Pentium II has write-combining hardware, which makes either routine just as fast, with no need for alignment checks or read-modify-write operations on memory.
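A rough way to check the claim that memory bandwidth, not instruction width, limits these copies is to time a 4-bytes-per-iteration loop (loosely modelling `rep movsd`) against an 8-bytes-per-iteration loop (loosely modelling the MMX `movq` loop) on a buffer much larger than the caches. This is a minimal C sketch under those assumptions; the function names are hypothetical, and an optimizing compiler may turn both loops into identical code, so inspect the generated assembly before trusting any timings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Copy n bytes, 4 per iteration (n must be a multiple of 4).
   Loosely models "rep movsd". */
static void copy_by_4(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        uint32_t v;
        memcpy(&v, src + i, 4);   /* memcpy avoids alignment/aliasing issues */
        memcpy(dst + i, &v, 4);
    }
}

/* Copy n bytes, 8 per iteration (n must be a multiple of 8).
   Loosely models the MMX movq loop. */
static void copy_by_8(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        uint64_t v;
        memcpy(&v, src + i, 8);
        memcpy(dst + i, &v, 8);
    }
}

typedef void (*copy_fn)(unsigned char *, const unsigned char *, size_t);

/* Return the CPU seconds spent on `reps` calls of fn over an n-byte buffer. */
static double time_copy(copy_fn fn, unsigned char *dst,
                        const unsigned char *src, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Run `time_copy` for both routines on, say, a 64 MB buffer; if memory really is the bottleneck, the two timings should come out close to each other.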

That said, Allegro's blitters do some additional bookkeeping on every blit to handle sub-bitmaps, video bitmaps, non-linear bitmaps, etc. I would like to special-case plain memory-to-memory copies and avoid all that overhead.
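One way such a special case could look is a check up front that sends plain linear memory bitmaps straight to a row-by-row `memcpy` and everything else to the general blitter. This is a minimal sketch only: `simple_bitmap`, `generic_blit` and `fast_blit` are hypothetical names, not Allegro's real `BITMAP` structure or blitter, and it assumes both bitmaps share a color depth, the caller has already clipped the rectangle, and source and destination regions do not overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified bitmap; NOT Allegro's real BITMAP. */
typedef struct {
    int w, h;               /* size in pixels */
    int bytes_per_pixel;    /* e.g. 2 for 16 bpp, 4 for 32 bpp */
    int plain_memory;       /* nonzero for an ordinary linear memory bitmap */
    unsigned char **line;   /* start of each row (lets sub-bitmaps share memory) */
} simple_bitmap;

/* Slow path standing in for the general blitter that copes with video,
   non-linear and other special bitmaps; here it copies pixel by pixel. */
static void generic_blit(const simple_bitmap *src, simple_bitmap *dst,
                         int sx, int sy, int dx, int dy, int w, int h)
{
    size_t bpp = (size_t)src->bytes_per_pixel;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            memcpy(dst->line[dy + y] + (size_t)(dx + x) * bpp,
                   src->line[sy + y] + (size_t)(sx + x) * bpp, bpp);
}

/* Blit with a fast path for plain memory-to-memory copies.
   Assumes equal color depth, pre-clipped coordinates, no overlap. */
static void fast_blit(const simple_bitmap *src, simple_bitmap *dst,
                      int sx, int sy, int dx, int dy, int w, int h)
{
    if (src->plain_memory && dst->plain_memory) {
        size_t bpp = (size_t)src->bytes_per_pixel;
        size_t row_bytes = (size_t)w * bpp;
        /* One memcpy per row; rows are reached through line[], so a
           sub-bitmap whose rows live inside its parent still works. */
        for (int y = 0; y < h; y++)
            memcpy(dst->line[dy + y] + (size_t)dx * bpp,
                   src->line[sy + y] + (size_t)sx * bpp, row_bytes);
        return;
    }
    generic_blit(src, dst, sx, sy, dx, dy, w, h);
}
```

Copying whole rows lets the C library pick its own wide-copy instructions, while the single flag test keeps the per-blit overhead for the common case to a few comparisons.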

-----Original Message-----
From: alleg-developers-admin@xxxxxxxxxx [mailto:alleg-developers-admin@xxxxxxxxxx] On Behalf Of RogerioUP
Sent: Monday, September 27, 2004 4:18 PM
To: alleg-developers@xxxxxxxxxx
Subject: [AD] Use MMX to get fast

I use a simple MMX function to move images from memory to video memory, and it takes about half the time Allegro needs for the same transfer. I'm now using Allegro because I've had a lot of problems accessing newer hardware directly; before that, I used my own operating system.

I don't know how Allegro performs blits, whether it uses "rep movsd" or the card's accelerated commands. I have a card with VESA 3.0, and my routine is much faster than Allegro's. I wrote a routine called "repmovsq" that transfers 8 bytes (a quadword) per loop iteration, using just:

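    mov   ecx,(num of bytes) / 8   ; number of 8-byte quadwords to copy
    test  ecx,ecx                  ; with fewer than 8 bytes ecx is 0, and
    jz    tail                     ; "dec ecx" would wrap around to 0FFFFFFFFh
loopq:
    movq  MM0,[esi]                ; load 8 bytes; esi = source index
    movq  [edi],MM0                ; store 8 bytes; edi = destination index
    add   esi,8
    add   edi,8
    dec   ecx
    jnz   loopq
tail:
    mov   ecx,(num of bytes) % 8   ; leftover bytes if the count is not a multiple of 8
    rep   movsb                    ; copy them one at a time (assumes the direction flag is clear)
    emms                           ; clear MMX state so later x87 FPU code works
;
; opcodes
; movq MM0,[esi] = 0x0F, 0x6F, 0x06
; movq [edi],MM0 = 0x0F, 0x7F, 0x07
;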
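For readers without an assembler at hand, the same "repmovsq" idea can be sketched in portable C: copy 8 bytes per iteration, skip the loop when there are no whole quadwords, then finish the remaining n % 8 bytes. This is an illustration under assumptions, not the author's routine: `quad_copy` is a hypothetical name, it uses no MMX instructions, and the compiler chooses the actual instructions for the `memcpy` calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes: 8 bytes per iteration, then a byte-wise tail. */
static void quad_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t quads = n / 8;            /* like "mov ecx,(num of bytes) / 8" */

    while (quads--) {                /* a zero count skips the loop entirely */
        uint64_t q;
        memcpy(&q, s, 8);            /* 8-byte load, like movq MM0,[esi] */
        memcpy(d, &q, 8);            /* 8-byte store, like movq [edi],MM0 */
        s += 8;
        d += 8;
    }
    for (size_t i = 0; i < n % 8; i++)   /* tail when n is not a multiple of 8 */
        d[i] = s[i];
}
```

Using `memcpy` for the 8-byte load and store keeps the code free of alignment and strict-aliasing problems; compilers normally reduce each call to a single move.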

My English is not very good because I'm Brazilian.

Best regards, Rogerio Uchoas Penchel

rogerup@xxxxxxxxxx