Maybe the timings were only right before in CE mode by luck, while data caching was not enabled, and now that it's enabled they're no longer good?
If you compile your own Hatari, could you try forcing a "return false" in function cancache030(), around line 7607 of cpu/newcpu.c? Do you get better values then wrt DSP speed?
Ok I can give it a try. I was only able to build SDL1 versions in the past under Cygwin so hopefully that is still possible?
However I suspect d-cache changes will have no meaningful impact, based on what I can see so far.
- The code which waits on the DSP in the first test case (the game) is a host-port status spinloop. The cost of these spin instructions was never accurate vs real HW, and the new timing hasn't changed much from what I can see. Certainly not by 70%; it's within 10% of previous versions.
- The code which waits on the DSP in the second test case (DSPBENCH) is based on MFP events, i.e. the waiting time is dictated by something other than the CPU. If the CPU cycle costs have increased, it will just execute fewer CPU instructions during the test. I *think* this is why DSPBENCH reported correct results previously (IIRC to within a decimal point) even though the CPU timings were never perfect.
- The performance gain measured on the DSP side should vary a lot depending on the CPU side instructions which are running concurrently. I don't see that happening - it's pretty much fixed (maybe some variation, I'm not sure - but it seems to remain close to 70% when calculating back).
- The host port status/data registers (which are accessed in the spinloop while timing the DSP) are not data-cacheable. They are volatile-mapped HW memory. If they were cacheable, the software would lose coherency with the HW and quickly crash. I can't be sure that introducing the d-cache support is unrelated, but in real terms disabling the cache should have no effect on that test.
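
To illustrate the last point, here is a toy model in Python. It has nothing to do with Hatari's actual code: the class, its methods and the I/O address range are all made up for illustration. It only shows why a status spinloop has to bypass the data cache, and why switching the cache off should make no difference to that loop.

```python
# Toy model only: names and the address range are assumptions, not Hatari code.
IO_BASE = 0xFFA200  # assumed start of the DSP host port registers
IO_END = 0xFFA208   # assumed end (exclusive)


class ToyBus:
    def __init__(self, cache_enabled=True):
        self.mem = {}      # backing store, shared with "hardware"
        self.cache = {}    # CPU-side data cache
        self.cache_enabled = cache_enabled

    def cacheable(self, addr):
        # I/O space is never cached, whatever the cache setting is.
        return self.cache_enabled and not (IO_BASE <= addr < IO_END)

    def read(self, addr):
        if self.cacheable(addr):
            if addr not in self.cache:
                self.cache[addr] = self.mem.get(addr, 0)
            return self.cache[addr]
        return self.mem.get(addr, 0)

    def hw_write(self, addr, value):
        # The hardware side changes a value behind the CPU's back.
        self.mem[addr] = value


STATUS = 0xFFA202  # assumed host port status register
RAM = 0x001000     # ordinary cacheable RAM

bus = ToyBus(cache_enabled=True)
bus.read(STATUS)
bus.read(RAM)
bus.hw_write(STATUS, 1)
bus.hw_write(RAM, 5)

status_seen = bus.read(STATUS)  # fresh value, the spinloop can exit
ram_seen = bus.read(RAM)        # stale value, coherency is lost

nocache = ToyBus(cache_enabled=False)
nocache.read(STATUS)
nocache.hw_write(STATUS, 1)
status_seen_nocache = nocache.read(STATUS)
```

The status read behaves identically with the cache on or off, which is the reason I don't expect forcing cancache030() to return false to change the spinloop test.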
So taking these into account, I believe the change has something to do with DSP clocks relative to the MFP or the master timer - and not in relation to the CPU at all. There are too many clues beginning to point there I think. The MFP-based timing seems the most concrete of those.
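
A rough sketch of the MFP argument, again as a toy Python model and not a description of how Hatari or DSPBENCH work internally. The clock values are the nominal Falcon figures as I remember them (treat them as assumptions), and the per-instruction cycle costs are invented. The measurement window is a fixed number of MFP ticks, so changing the CPU cost per instruction changes how much CPU work fits in the window, but not how much DSP work does. Only a change in the DSP clock relative to the MFP moves the DSP result.

```python
# Toy model only: clock values are assumed nominal figures, costs are invented.
MFP_HZ = 2_457_600
CPU_HZ = 16_000_000
DSP_HZ = 32_000_000


def run_window(mfp_ticks, cpu_cycles_per_insn, dsp_hz=DSP_HZ, dsp_cycles_per_insn=2):
    """Return (cpu_insns, dsp_insns) executed during a window of mfp_ticks."""
    cpu_cycles = mfp_ticks * CPU_HZ // MFP_HZ
    dsp_cycles = mfp_ticks * dsp_hz // MFP_HZ
    return cpu_cycles // cpu_cycles_per_insn, dsp_cycles // dsp_cycles_per_insn


WINDOW = 24_576  # MFP ticks, i.e. 10 ms at the assumed MFP clock

old_cpu = run_window(WINDOW, cpu_cycles_per_insn=4)   # "old" CPU timing
new_cpu = run_window(WINDOW, cpu_cycles_per_insn=6)   # "new", slower CPU timing
fast_dsp = run_window(WINDOW, 4, dsp_hz=DSP_HZ * 17 // 10)  # DSP 70% fast vs MFP

print("old CPU timing :", old_cpu)
print("new CPU timing :", new_cpu)
print("DSP clock +70% :", fast_dsp)
```

In this model the DSP count is identical for the first two runs and only the CPU count drops, while the third run shows a fixed 70% gain on the DSP side regardless of what the CPU is doing. That is the pattern I'm seeing.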
D