So I finally got a chance to do some d-cache testing with the latest Hatari (pulled last night's build from
antarctica.no). Here's a snip from a profile run of code which mixes two memory sources and a destination, and has fairly predictable behaviour.
(BTW, it appears to run my code correctly - at least so far - which is also good!)
$000360e8 : move.b (a0)+,d1 0.13% (250336, 2004636, 467, 0)
$000360ea : move.w (a6,d1.l*2),(a1)+ 0.13% (250336, 3006244, 533, 109976)
$000360ee : move.w d0,d1 0.13% (250336, 292, 14, 0)
$000360f0 : add.w d2,d0 0.13% (250336, 1001440, 442, 0)
$000360f2 : move.b (a0)+,d1 0.13% (250336, 2003044, 61, 233119)
$000360f4 : move.w (a6,d1.l*2),(a1)+ 0.13% (250336, 3005460, 502, 150930)
$000360f8 : move.w d0,d1 0.13% (250336, 1848, 404, 0)
$000360fa : add.w d2,d0 0.13% (250336, 999960, 24, 0)
$000360fc : move.b (a0)+,d1 0.13% (250336, 2003296, 461, 235027)
$000360fe : move.w (a6,d1.l*2),(a1)+ 0.13% (250336, 3005804, 496, 160924)
$00036102 : move.w d0,d1 0.13% (250336, 340, 16, 0)
$00036104 : add.w d2,d0 0.13% (250336, 1001476, 403, 0)
$00036106 : move.b (a0)+,d1 0.13% (250336, 2003120, 75, 236046)
...
$00036174 : move.w (a6,d1.l*2),(a1)+ 0.13% (250336, 3005724, 751, 173853)
$00036178 : add.w d2,d0 0.13% (250336, 10180, 2486, 0)
$0003617a : move.b (a0)+,d0 0.13% (250336, 2003072, 65, 234673)
$0003617c : move.w (a6,d0.l*2),(a1)+ 0.13% (250336, 3008296, 2581, 174768)
$00036180 : adda.l d3,a3 0.13% (250336, 10248, 2500, 0)
$00036182 : adda.l d4,a4 0.13% (250336, 991520, 22, 0)
$00036184 : adda.l a5,a0 0.13% (250336, 1001564, 2529, 0)
$00036186 : adda.l d5,a1 0.13% (250336, 991496, 18, 0)
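For anyone wanting to pick such a listing apart by script rather than by eye, here is a rough Python sketch. The field order (instruction count, cycles, i-cache misses, d-cache hits) is an assumption based on how the numbers are read in this post, and the names `parse_line` and `byte_reads` are made up for illustration; none of this is part of Hatari.

```python
import re

# Assumed field order (inferred from this post, not taken from Hatari docs):
# (instruction count, cycles, i-cache misses, d-cache hits)
LINE_RE = re.compile(
    r"^\$(?P<addr>[0-9a-fA-F]+)\s*:\s*(?P<insn>.+?)\s+"
    r"(?P<pct>[\d.]+)%\s*"
    r"\((?P<count>\d+),\s*(?P<cycles>\d+),\s*(?P<imiss>\d+),\s*(?P<dhits>\d+)\)\s*$"
)

def parse_line(line):
    """Return a dict for one profile line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "insn": m.group("insn"),
        "count": int(m.group("count")),
        "cycles": int(m.group("cycles")),
        "imiss": int(m.group("imiss")),
        "dhits": int(m.group("dhits")),
    }

def byte_reads(lines, prefix="move.b (a0)+"):
    """Keep only the byte-wise source reads, dropping the in-between code."""
    rows = (parse_line(l) for l in lines)
    return [r for r in rows if r and r["insn"].startswith(prefix)]

# First few lines of the listing above, copied verbatim.
SAMPLE = """\
$000360e8 : move.b (a0)+,d1 0.13% (250336, 2004636, 467, 0)
$000360ea : move.w (a6,d1.l*2),(a1)+ 0.13% (250336, 3006244, 533, 109976)
$000360ee : move.w d0,d1 0.13% (250336, 292, 14, 0)
$000360f0 : add.w d2,d0 0.13% (250336, 1001440, 442, 0)
$000360f2 : move.b (a0)+,d1 0.13% (250336, 2003044, 61, 233119)
""".splitlines()

for row in byte_reads(SAMPLE):
    # d-cache hits per execution of this instruction
    ratio = row["dhits"] / row["count"]
    print(f"${row['addr']:08x}  {row['insn']:<18} hits/exec = {ratio:.3f}")
```

Feeding the whole saved profile through `byte_reads` instead of `SAMPLE` should give the per-read hit ratios in one go.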
Filtering out the in-between code shows a sensible sequence of longword fetches serving byte-wise reads by the CPU. The first read always incurs a miss, and the subsequent three reads incur a variable (but very high) ratio of hits (subject to in-between code, occasional interrupts etc.). This is pretty much exactly what I'd expect to see for the 68030 data cache operating in non-burst mode.
$000360e8 : move.b (a0)+,d1 0.13% (250336, 2004636, 467, 0)
$000360f2 : move.b (a0)+,d1 0.13% (250336, 2003044, 61, 233119)
$000360fc : move.b (a0)+,d1 0.13% (250336, 2003296, 461, 235027)
$00036106 : move.b (a0)+,d1 0.13% (250336, 2003120, 75, 236046)
$00036110 : move.b (a0)+,d1 0.13% (250336, 2004600, 452, 0)
$0003611a : move.b (a0)+,d1 0.13% (250336, 2003116, 76, 233019)
$00036124 : move.b (a0)+,d1 0.13% (250336, 2003264, 467, 237310)
$0003612e : move.b (a0)+,d1 0.13% (250336, 2003088, 70, 235527)
$00036138 : move.b (a0)+,d1 0.13% (250336, 2009552, 1685, 0)
$00036142 : move.b (a0)+,d1 0.13% (250336, 2003144, 85, 233023)
$0003614c : move.b (a0)+,d1 0.13% (250336, 2003604, 1650, 235691)
$00036156 : move.b (a0)+,d1 0.13% (250336, 2002944, 44, 235930)
$00036160 : move.b (a0)+,d1 0.13% (250336, 2010636, 1959, 0)
$0003616a : move.b (a0)+,d1 0.13% (250336, 2003200, 91, 235943)
$00036172 : move.b (a0)+,d1 0.13% (250336, 2003048, 65, 235859)
$0003617a : move.b (a0)+,d0 0.13% (250336, 2003072, 65, 234673)
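The one-miss-then-three-hits pattern in the filtered listing is what a very simple model of the cache predicts. Below is a toy Python sketch, not Hatari's implementation: it uses the 68030 data cache geometry (16 lines of four longwords, one valid bit per longword) and assumes that in non-burst mode a miss fills only the longword that was asked for. It ignores writes, function codes, cache-inhibit and all the other traffic competing for the same lines (such as the table reads through a6), so it gives the ideal case only. The class name `DataCache` is invented for the example.

```python
LINES = 16           # 68030 data cache: 16 lines...
LONGS_PER_LINE = 4   # ...of four longwords each (256 bytes total)

class DataCache:
    """Toy read-only model; non-burst fill, one longword per miss."""

    def __init__(self):
        self.tags = [None] * LINES
        self.valid = [[False] * LONGS_PER_LINE for _ in range(LINES)]
        self.hits = 0
        self.misses = 0

    def read_byte(self, addr):
        """Return True on a hit, False on a miss (and fill one longword)."""
        long_idx = (addr >> 2) & 3   # A3-A2 select the longword in the line
        line = (addr >> 4) & 15      # A7-A4 select the line
        tag = addr >> 8              # remaining upper bits form the tag
        if self.tags[line] == tag and self.valid[line][long_idx]:
            self.hits += 1
            return True
        if self.tags[line] != tag:
            # Different tag: the line is reused, old longwords become invalid
            self.tags[line] = tag
            self.valid[line] = [False] * LONGS_PER_LINE
        # Non-burst assumption: only the requested longword is fetched
        self.valid[line][long_idx] = True
        self.misses += 1
        return False

cache = DataCache()
pattern = [cache.read_byte(0x1000 + i) for i in range(8)]
print("".join("H" if hit else "M" for hit in pattern))  # prints MHHHMHHH
```

In the ideal case that is a hit ratio of exactly 1.0 on reads two to four of each longword. The profile shows roughly 0.93 there, which seems consistent with occasional evictions from the in-between accesses and interrupts.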
I did get some warnings like this during recording, which I haven't investigated yet....
ERROR: trying to add costs to non-existing 0x1012480 caller of 0x10114e8!
ERROR: trying to add costs to non-existing 0x101242c caller of 0x10114e8!
ERROR: trying to add costs to non-existing 0x1012480 caller of 0x10114e8!
ERROR: trying to add costs to non-existing 0x101242c caller of 0x10114e8!
ERROR: trying to add costs to non-existing 0x1012480 caller of 0x10114e8!
I also noticed that profiling code with TT RAM allocated doesn't work very well: it slows my test down to less than 1 fps, and can halt for several seconds at a time. While it's probably reasonable that it could get slower as the memory footprint increases, the slowdown seems very nonlinear, and the actual amount of code being executed remains the same. The code is executing from TT RAM at the time, so perhaps that has something to do with it.
I'll see what else I can find....
D.