Re: [hatari-devel] Structures in Hatari according to pahole



Why is it compiled into the Hatari binary?

I started it 2 years ago. I wish I could have finished it, but I didn't have the time and I moved on to the crossbar and other stuff.
The code is not complete, but I did a lot of tests on the part I coded and it seemed to work (loops, maths, ...).

The main remaining effort would be implementing the serial or parallel bus.

It can of course be removed from the build, but it would be nice to keep it in case someone wants to finish it (as I did a few years ago with the DSP code).


On 16/06/2013 16:19, Eero Tamminen wrote:

On Sunday 16 June 2013, Nicolas Pomarède wrote:
I'm quite skeptical this would really yield a perceptible gain in
performance. Today's CPUs have L2 and L3 caches too, so even if there's a
cache miss at L1, the data will certainly be in L2 (if we consider that dsp
and cpu are a big part of the emulation, their data structures are certainly
in L2 or L3). CPUs will also do instruction reordering, so even if the data
is not in L1, once instructions are reordered the cost of accessing L2 or L3
won't matter that much.

Also, assuming the important members are defined in the structure as 32
consecutive bytes, how do we know the data will really be on a 32-byte
boundary? We could have 12 bytes before and 20 bytes after; there's
nothing in standard C that guarantees the data ends up at a specific
location once the program starts (or you have to malloc the memory
yourself if you want that).
Good point about it being statically allocated data.  For those, C
guarantees struct alignment only up to that of the most strictly aligned
structure member (e.g. on ARM, 8 for struct members, if I remember right).
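
To make the alignment point concrete, here is a minimal sketch, assuming a C11 compiler. The struct `hot_state` and the helper names are hypothetical and are not taken from Hatari. It contrasts a statically allocated struct with default alignment against one that explicitly requests a 32-byte boundary with `alignas`:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an emulator state struct (not Hatari's layout). */
struct hot_state {
    uint32_t pc;
    uint32_t sr;
    uint16_t regs[12];
};  /* 32 bytes of data on common ABIs */

/* Default: only aligned as strictly as its strictest member requires. */
static struct hot_state plain_state;

/* C11: explicitly request a 32-byte boundary for this one object. */
static alignas(32) struct hot_state aligned_state;

/* Offset of an address within its 32-byte block; 0 means aligned. */
static unsigned offset_in_block(const void *p)
{
    return (unsigned)((uintptr_t)p % 32u);
}

/* May return any multiple of the struct's own alignment below 32. */
unsigned plain_state_offset(void)
{
    return offset_in_block(&plain_state);
}

/* Returns 0 because of the alignas(32) request above. */
unsigned aligned_state_offset(void)
{
    return offset_in_block(&aligned_state);
}
```

Without the `alignas`, the 32 bytes may straddle two 32-byte blocks (the "12 bytes before and 20 bytes after" case); with it, the compiler and linker place the object on the requested boundary.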

I think this is a job for the compiler; there could be a compiler option
that allows rearranging a structure's members in the best way for the
CPU architecture,
The C standard disallows the compiler from reordering structure members.

This is for legacy reasons (starting with RPC code from the '80s): there's
a huge amount of code that assumes the members of a structure are in a
given order, so that one can e.g. memcpy/memset just part of a structure.
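
As an illustration of the kind of code that relies on declaration order, here is a small sketch. The struct `device` and the function `device_reset` are made up for this example. It clears everything from one member to the end of the struct with a single memset, which only works because members are laid out in the order they are declared:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct for illustration only. */
struct device {
    int  id;          /* preserved across reset */
    int  irq_level;   /* first member cleared by reset */
    int  counter;
    char buffer[16];
};

/* Members are laid out in declaration order, so offsets increase. */
static_assert(offsetof(struct device, id) < offsetof(struct device, irq_level),
              "id precedes irq_level");
static_assert(offsetof(struct device, irq_level) < offsetof(struct device, counter),
              "irq_level precedes counter");
static_assert(offsetof(struct device, counter) < offsetof(struct device, buffer),
              "counter precedes buffer");

/* Clear everything from irq_level to the end of the struct, keeping id. */
void device_reset(struct device *d)
{
    size_t start = offsetof(struct device, irq_level);
    memset((char *)d + start, 0, sizeof *d - start);
}
```

If a compiler were free to move `id` after `irq_level`, this reset would silently wipe it.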

but handling this ourselves by moving structure members around, based on
assumptions about a possible cache design, doesn't seem useful to me
The rule is really simple: keep members at their natural
alignment from the structure start.  One might also consider
shrinking types that are unnecessarily large, but
that might also affect performance.
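
A rough sketch of what that rule means in practice, using two hypothetical layouts (neither is from Hatari). When a small member is followed by a larger one, the compiler inserts padding so the larger member lands on its natural alignment; these are the holes pahole reports. Declaring the members from most to least strictly aligned avoids the interior hole:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct scattered {
    uint8_t  flag_a;   /* 1 byte, then padding up to value's alignment */
    uint32_t value;
    uint8_t  flag_b;   /* 1 byte, then tail padding */
};

struct grouped {
    uint32_t value;    /* most strictly aligned member first */
    uint8_t  flag_a;
    uint8_t  flag_b;   /* only tail padding remains */
};

size_t scattered_size(void)
{
    return sizeof(struct scattered);
}

size_t grouped_size(void)
{
    return sizeof(struct grouped);
}

/* Offset of value in the scattered layout, including the leading hole. */
size_t scattered_value_offset(void)
{
    return offsetof(struct scattered, value);
}
```

On ABIs where `uint32_t` is 4-byte aligned, `struct scattered` typically occupies 12 bytes and `struct grouped` 8, although both hold the same 6 bytes of data.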

(I'd rather have structure members grouped by their
logical meaning, to keep the code understandable and maintainable)
That's typically the best, as they're normally accessed together...

Anyway, before making changes, one should first have some profiling
data showing that there actually is a trouble spot.  My mail
was more FYI. :-)

I agree this could help on old CPUs (like the 68020/68040) where the
data/instruction caches are so small that you need to fill them in the
most optimal way, but even the CPU of a cellphone now has more cache than
the ST had total RAM :)

Besides the potential DSP core changes, this could have the largest effect
if Hatari actually used the HD6301 emulation (which seems to be
built in, although I thought it was disabled as it's unused)...
hd6301 is disabled and unused (and also not complete, in fact :) so it
couldn't be used to run the IKBD's ROM at this point anyway).
Why is it compiled into the Hatari binary?

There are a lot of symbols from hd6301 code in Hatari:
$ readelf -s src/hatari |grep hd6301 |wc -l

$ readelf -s src/hatari.winuae |grep hd6301 | awk '
	BEGIN{size=0} {size+=$3} END{print size/1024, "kB"}'
44.0518 kB

	- Eero
