Re: [hatari-devel] optimizing for speed (was: Beats of Rage, new Falcon game)
Hi,
On Sunday 16 December 2012, George Nakos wrote:
> Sunday, December 16, 2012, 12:04:40 PM, you wrote:
> > In many compression algorithms you only need a small buffer for
> > decompression.
> 
> Don't oversimplify things. Compression is (IMHO) a design decision not
> to  be taken lightly. For example, what speed is the depacking routine
> running at? Does it depack in place or does it need an extra buffer to
> store  the  unpacked  data and then copy it in place? What impact does
> that  have on speed? Is that an acceptable wait for the player? And so
> on and so forth.
Naturally one needs to take these into account.  But you cannot just
assume things, because assumptions can often be wrong.
(For the last ~6 years at work I've been doing system performance/resource
usage analysis, so I do have some clue about this stuff.)
> When  I decided to port Downfall from the Jaguar to the Falcon I had a
> few  goals in mind, including close to zero wait time for the player -
> since it's a game you want to restart right after you lose because you
> want  to  have  "just  another  go".  Init times mattered as well, and
> because of that, all the backgrounds are loaded unpacked directly from
> disk to the screen buffers. If I opted to have compression to the 1.83
> or  8.78mb  background files (for 4mb and 14mb machines respectively),
> it  would  have  added  a  delay I deemed unacceptable. That's why the
> unpacked zip with the 14mb backgrounds is close to 180mb.
As I stated, packed data can also be used to speed things up; it doesn't
necessarily slow things down.
If you can use DMA to read data directly to the screen, decompression slows
things down.  If your mass media is much slower than your RAM & CPU, using
pre-compressed data can speed things up.  So, what are your mass media
and RAM/CPU data throughput speeds?
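
To put rough numbers on that trade-off, here is a minimal sketch in Python.
The function names and all figures are made up for illustration, they are
not measurements from any real machine, and it assumes loading and unpacking
happen one after the other instead of overlapping:

```python
def load_time_raw(size_bytes, media_bps):
    """Seconds to read unpacked data straight from the mass media."""
    return size_bytes / media_bps


def load_time_packed(size_bytes, ratio, media_bps, unpack_bps):
    """Seconds to read packed data and then unpack it.

    ratio      = packed size / unpacked size (hypothetical value)
    unpack_bps = unpacked bytes produced per second (hypothetical value)
    """
    return size_bytes * ratio / media_bps + size_bytes / unpack_bps


def break_even_unpack_bps(ratio, media_bps):
    """Unpack speed above which the packed load beats the raw load."""
    return media_bps / (1.0 - ratio)


# Hypothetical example: 1 MB of data, media at 500 kB/s,
# packed to half size, unpacker producing 2 MB/s.
raw = load_time_raw(1_000_000, 500_000)
packed = load_time_packed(1_000_000, 0.5, 500_000, 2_000_000)
print(raw, packed, break_even_unpack_bps(0.5, 500_000))
```

With those invented figures the packed load wins; with a faster media or
a slower unpacker it would lose, which is why the real throughput numbers
matter.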
 
> But the thing is, you know how many complaints I had about that?
> 
> 0 (zero).
Out of curiosity, how many users do you have? :-)
> So,  in  your  opinion,  would  adding an extra layer of decompression
> and lots more development time (sorting out the unpacking routine etc,
> let  alone  deciding one first) inside the game just because it's cool
> have any specific impact?
It should be obvious that one should use compression only if it helps
things that matter.  Why would one add it if it makes things worse?
It helps if you know the exact physical constraints of your chosen
HW platform; then you can calculate theoretical bounds for different
ways of doing things instead of needing to test everything.
(One of course needs to test things, but one can at least directly
discard algorithms / data pipelines that wouldn't help even in theory
when one looks at how many times given data is written or read from RAM.)
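
As a rough illustration of that kind of bound, here is a minimal Python
sketch that counts how many times the data crosses the RAM bus in three
hypothetical pipelines.  The pipeline names, the pass counts and the RAM
speed are my assumptions, and it ignores the extra reads an LZ-style
unpacker does when copying matches from its own output:

```python
def passes_direct():
    """Media writes unpacked data straight into the target buffer."""
    return 1.0


def passes_unpack_to_target(ratio):
    """Write packed data to RAM, read it back, write unpacked output.

    ratio = packed size / unpacked size (hypothetical value).
    """
    return 2.0 * ratio + 1.0


def passes_unpack_then_copy(ratio):
    """As above, plus reading and writing the data once more for the copy."""
    return 2.0 * ratio + 3.0


def ram_bound_seconds(size_bytes, passes, ram_bps):
    """Lower bound on time spent on RAM traffic alone."""
    return size_bytes * passes / ram_bps


# Hypothetical example: 1 MB of data, half-size packing, RAM at 4 MB/s.
for passes in (passes_direct(),
               passes_unpack_to_target(0.5),
               passes_unpack_then_copy(0.5)):
    print(passes, ram_bound_seconds(1_000_000, passes, 4_000_000))
```

A pipeline whose bound is already too slow can be dropped without
writing any code for it.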
> When we're doing stuff for the Jaguar, memory and storage space are
> really precious (2mb of RAM for single load games, or an extra 4mb if
> making a cart version).
How fast are the loads from the mass media?  Are they faster if you can
do linear loads?  How fast are memory copies?  Does the Jag have some SIMD
operations that can do copies with some transformations faster?
(If loading from mass media is really slow, but there isn't enough RAM
for keeping all assets unpacked in RAM, one possibility would be to keep
them compressed in RAM.)
> And even so, we're using the packer with the fastest  observed unpacking
> time,  Ray's lz77.
How fast is that compared to LZO (when built with the latest GCC from
Vincent at a high optimization level)?  LZO is supposed to be one of the
fastest decompression algorithms out there, and it offers pretty good
compression rates too.  On ARM and Intel, hand-optimized versions get
fairly close to memcpy speed on decompression.
	- Eero
PS. After some googling there's also LZ4:
	http://code.google.com/p/lz4/
which is supposed to be faster than LZO, but apparently its speed depends
a lot on utilizing Intel & AMD CPU caches well, whereas LZO is more
generic.
There's also Snappy:
	http://code.google.com/p/snappy/
But it's optimized for 64-bit little-endian machines.