Re: [AD] Allegro x86 clear and blit optimizations - update

Esaelon@xxxxxxxxxx wrote:
> 
> > When I wrote the 32bpp MMX code, it turned out to be 10% slower than the
> >  regular non-MMX one. Perhaps MMX instructions have trouble with getting
> >  data from system RAM instead of L2 cache? Just a guess though.
> 
> I don't know why the MMX instructions in the clears would have to read the
> cache line (except of course PPro+ write miss cache line load)

I'm no x86 expert, but isn't the cache write-back as well? Writing to memory
would actually write to the cache, which is later copied out to RAM.
The problem is that 32bpp images are big (memory-wise), so they fill up the
cache faster. Hence the memory controller needs to write data back to
system RAM more often (which is slow).
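
To put rough numbers on that, here is a small C sketch that works out how big one frame is at each colour depth and how many times over it would fill a cache. The helper names `frame_bytes` and `cache_fills` are made up for this illustration (they are not Allegro functions), and the 640x480 mode and 128 KB cache used in the note below are assumed figures, not measurements from my machines.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one width x height frame at the given colour depth.
 * bpp must be a whole number of bytes (8, 16, 24 or 32). */
static size_t frame_bytes(size_t width, size_t height, unsigned bpp)
{
    assert(bpp != 0 && bpp % 8 == 0);
    return width * height * (bpp / 8);
}

/* How many cache-sized chunks the frame spans, rounded up.
 * cache_bytes is whatever cache size you assume for your CPU. */
static size_t cache_fills(size_t frame, size_t cache_bytes)
{
    assert(cache_bytes != 0);
    return (frame + cache_bytes - 1) / cache_bytes;
}
```

For an assumed 640x480 mode, `frame_bytes` gives 307200 bytes at 8bpp and 1228800 bytes at 32bpp. Against an assumed 128 KB cache that is 3 cache-fulls versus 10, so a 32bpp clear would push far more dirty lines back out to RAM.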

Also, inserting a read operation in the inner MMX loop halves the clear
speed on my Celeron and P3.
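
For anyone who wants to try the same experiment without touching the asm, the two loops look roughly like this in plain C. This is only a stand-in for the MMX inner loop, not the actual Allegro code: `clear32` and `clear32_with_read` are hypothetical names, and the `volatile` qualifier is there just to stop the compiler from dropping the extra read. Timing the two is left to whatever timer you have handy.

```c
#include <stddef.h>
#include <stdint.h>

/* Write-only clear: every pixel is stored, nothing is loaded. */
static void clear32(uint32_t *dst, size_t n, uint32_t color)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

/* Same clear, but each pixel is read before it is overwritten.
 * The XOR of the old values is returned so the reads have a visible use. */
static uint32_t clear32_with_read(volatile uint32_t *dst, size_t n,
                                  uint32_t color)
{
    uint32_t sink = 0;
    for (size_t i = 0; i < n; i++) {
        sink ^= dst[i];   /* the extra read in the inner loop */
        dst[i] = color;
    }
    return sink;
}
```

Both functions leave the buffer in the same state; the only difference is the load per pixel, which is the thing I saw halving the speed.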


> EXCEPT the
> initial per-line setup for the writes, which is simpler than for the 8- and
> 16-bit code.

How so?

> Admittedly, I haven't tried it yet, however the 8-bit and 16-bit
> MMX clears are (at least on my PMMX system) roughly twice as fast as the
> non-MMX clears, and for as much of a gain I would think the MMX optimizations
> would be worthwhile.

In general, yes.

--
- Robert J Ohannessian

"Microsoft code is probably O(n^20)" (My CS teacher)


