Re: [hatari-devel] Structures in Hatari according to pahole
On 16/06/2013 11:40, Eero Tamminen wrote:
> Hi,
> I was looking at Hatari structures with pahole, because the order of the
> items in frequently used structures affects their memory usage and
> performance.
> Items in structures should be ordered so that they're on their natural
> boundaries (this is most easily done by putting the largest members first),
> AND so that items used together are close together in the structure, on
> the same cacheline if they fit in one. Cacheline size differs between
> processors, though; I think it's typically still 32 or 64 bytes.
> [...]
> -> It would be better to put those 4 Uint8 members together;
> that would save a noticeable amount of memory, and more of these items
> would fit into the same cacheline.
Hi
I'm quite skeptical this would really yield a perceptible performance
gain. Today's CPUs have L2 and L3 caches too, so even if there's a cache
miss in L1, the data will almost certainly be in L2 (given that the DSP and
CPU are a big part of the emulation, their data structures are certainly in
L2 or L3). CPUs also reorder instructions, so even if the data is not in
L1, out-of-order execution means the cost of accessing L2 or L3 won't
matter that much.
Also, assuming the important members are declared in the structure as 32
consecutive bytes, how do we know the data will really start on a 32-byte
boundary? We could have 12 bytes before the boundary and 20 bytes after it;
nothing in standard C guarantees where the data will actually be located
once the program starts (unless you allocate suitably aligned memory
yourself).
I think this is a job for the compiler; there could be a compiler option
that rearranges structure members optimally for the CPU architecture, but
handling this ourselves, by moving structure members around based on
assumptions about a possible cache design, doesn't seem useful to me (I'd
rather keep structure members grouped by their logical meaning, to keep the
code understandable and maintainable).
I agree this could help on old CPUs (like the 68020/68040), where the
data/instruction caches are so small that you need to fill them as
efficiently as possible, but even a cellphone CPU now has more cache than
the ST had total RAM :)
> Besides the potential DSP core changes, this could have the largest effect
> if Hatari actually used the HD6301 emulation (which seems to be built in,
> although I thought it was disabled since it's unused)...
The hd6301 emulation is disabled and unused (and in fact it's also
incomplete :), so it couldn't be used to run the IKBD's ROM at this point
anyway.
Nicolas