[hatari-devel] Better cycle accurate mode for 68030 / Falcon
Hi,

I committed the changes I made to better account for memory access times on the Falcon.
This is not a straightforward task, because there is little documentation on how the bus is shared between the CPU and the Videl. We know it is roughly similar to how the ST works (half of the time for the CPU and half of the time for the Videl), but I have never seen any clear documentation from Atari on this, nor any documentation describing the more complex cases where the Videl uses higher colour-depth modes and can block the CPU from accessing the bus even further.
We can try to solve this by running benchmarks in different video modes to see what percentage we lose to video, and use this as a first approximation (although in fact the CPU would be slowed down only while pixels are displayed, not during borders, for example).
Anyway, for now the changes I made assume a video mode where the Videl doesn't take any extra cycles from the CPU, for example 640x200 in 4 colors (ST-compatible medium resolution).
I used nembench and nimbench (by Doug) to compare various cases. nembench runs only a few tests, but it can be useful as a first approach to "calibrate" the memory access time. All nembench tests run with the data and instruction caches on, so this doesn't cover all the cases.
nimbench by Doug is much more precise (thanks again to Doug for all the programs he wrote for CPU/FPU to improve emulation accuracy!) and covers more cases, with caches on or off.
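As a rough illustration of that first approximation, the sketch below turns two benchmark scores into a percentage lost to video, then rescales it to the part of the frame where pixels are actually displayed. The function names, the scores and the 60% active-display fraction are made-up placeholders, not measurements, and the rescaling assumes the loss is spread linearly over the displayed part of the frame.

```python
def average_loss_percent(score_reference, score_mode):
    """Throughput lost in a video mode, relative to a reference mode, in percent."""
    return 100.0 * (1.0 - score_mode / score_reference)


def loss_while_displaying(average_loss, active_fraction):
    """Rescale a whole-frame average loss to the time pixels are displayed.

    active_fraction is the share of the frame (0..1) spent displaying pixels;
    borders and blanking are assumed to cost the CPU nothing.
    """
    return average_loss / active_fraction


# Hypothetical scores (higher is faster) and a hypothetical 60% active display.
avg = average_loss_percent(score_reference=8.0, score_mode=6.8)
print(f"average loss: {avg:.1f}%")
print(f"loss while displaying: {loss_while_displaying(avg, 0.6):.1f}%")
```

A benchmark only sees the average over the whole frame, so the penalty applied per access during the displayed area has to be larger than the measured percentage.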
Here are the test results with the latest dev sources:

Nembench:
----------
Integer multiply (16bit)     ->  0.640 Mips       (~104%)
Integer divide (16bit)       ->  0.296 Mips       (~81%)
Linear (stalled) integer     ->  8.007 Mips       (~100%)
Interleaved (piped) integer  ->  8.007 Mips       (~100%)
16bit read (100% hit)        ->  7.902 MByte/sec  (~100%)
16bit write (100% hit)       ->  8.143 MByte/sec  (~135%)
32bit read (100% hit)        -> 15.797 MByte/sec  (~100%)
32bit write (100% hit)       ->  8.210 MByte/sec  (~123%)
Linear 32bit read (ST-Ram)   ->  5.251 MByte/sec  (~98%)
Linear 32bit write (ST-Ram)  ->  7.943 MByte/sec  (~123%)
Linear 32bit copy (ST-Ram)   ->  3.172 MByte/sec  (~98%)

Only "slow" 16-bit RAM was tested. The results are quite accurate except in some "write" cases; for those, the problem is not the bus access time but the opcode itself, which is not yet accurate with regard to some pipelined / parallel processing inside the 68030 (this is more visible in the Nimbench measurements below).

Nimbench:
----------
See the attached text file for detailed per-test results, as well as the PDF, where colors were added according to accuracy for better readability.
I compare 4 cases:
- real Falcon: these are the results Doug posted from his Falcon.
- Hatari 1.8: Laurent added tables in this version with the i-cached / non-cached cycle values for each opcode. This improved the results, but note that the cycles are those from the Motorola documentation for a 32-bit bus with no wait states, which is not the case on the Falcon (16-bit bus, shared with the Videl). So the results sometimes differ greatly from a real Falcon.
- Hatari 2.0: this used the latest WinUAE core, which gave good instruction and data cache behaviour, but the 68030 memory access times were not changed to reflect a 16-bit bus. The results were worse than with Hatari 1.8; some opcodes are sometimes twice as fast as they should be.
- Hatari dev: it uses a better model for the 16-bit bus shared with video (not complete yet, as higher video modes are not taken into account).

Overall, Hatari dev has much better accuracy; some individual opcodes are sometimes 20% off, but if you look at the colors in the PDF, the dev version has a lot more green than the previous Hatari versions.
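To make the 32-bit versus 16-bit difference concrete, here is a minimal sketch that counts how many bus cycles one operand access needs for a given port width. It assumes the access is simply split at port-width boundaries, as the 68030's dynamic bus sizing does; `bus_cycles` is a hypothetical helper, not Hatari or WinUAE code, and it ignores wait states and any Videl contention.

```python
def bus_cycles(address, size, port_width):
    """Number of bus cycles needed to transfer `size` bytes at `address`.

    port_width is the data bus width in bytes (4 for the 32-bit bus of the
    Motorola timing tables, 2 for a 16-bit bus). The access is split at every
    port-width boundary it crosses.
    """
    first_unit = address // port_width
    last_unit = (address + size - 1) // port_width
    return last_unit - first_unit + 1


print(bus_cycles(0x1000, 4, 4))  # aligned long word, 32-bit bus: 1
print(bus_cycles(0x1000, 4, 2))  # aligned long word, 16-bit bus: 2
print(bus_cycles(0x1001, 4, 2))  # odd-address long word, 16-bit bus: 3
```

A long-word access that costs one bus cycle in the 32-bit tables therefore costs at least two on a 16-bit bus, before any wait states are added.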
The big differences are due to how the 68030 can sequence memory accesses: it can queue an access in parallel with internal computation, so a read/write can be delayed internally and add some extra cycles later (or not, depending on how it overlaps with the next instruction). In some cache cases we can also see differences (for example NOP).
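The effect can be pictured with a toy model, given here only as an illustration: each instruction has some internal cycles and optionally a write that is queued on the bus and completes in the background. This is not how Hatari or the WinUAE core models the 68030; the single-entry write queue, the function names and the cycle counts are all assumptions chosen to show why the same write costs more or less depending on what follows it.

```python
def total_cycles(instructions):
    """Cycles for a sequence of (internal_cycles, write_bus_cycles) tuples,
    with each write overlapping the instructions that follow it."""
    cpu_time = 0      # when the CPU finishes its current internal work
    bus_free_at = 0   # when the bus finishes the last queued write
    for internal, write in instructions:
        cpu_time += internal
        if write:
            # A new write can start only once the previous one has left the
            # bus; the CPU stalls until then, then carries on in parallel.
            cpu_time = max(cpu_time, bus_free_at)
            bus_free_at = cpu_time + write
    return max(cpu_time, bus_free_at)


def serial_cycles(instructions):
    """Same sequence with no overlap at all: every cycle is simply added."""
    return sum(internal + write for internal, write in instructions)


hidden = [(2, 4), (6, 0)]    # write followed by a long register-only opcode
exposed = [(2, 4), (2, 4)]   # two writes back to back
print(total_cycles(hidden), serial_cycles(hidden))    # 8 12
print(total_cycles(exposed), serial_cycles(exposed))  # 10 12
```

In the first sequence the write is completely hidden behind the next instruction; in the second, part of it shows up as a stall on the following write.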
In the end, the Falcon has 161.32 MIPS and the dev version reaches 160.06 MIPS, with an accuracy of 97.83% (meaning Hatari is globally 2.17% slower than a real Falcon). I think that's a fairly good score ;-)
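The per-test percentages in the attached results are consistent with the ratio of real Falcon cycles to emulated cycles, and the summary percentages with a plain average of those per-test values rather than a ratio of the MIPS totals. That is inferred from the numbers, not stated explicitly, so treat the sketch below as an assumption; the function names are illustrative only.

```python
def diff_percent(falcon_cycles, emulated_cycles):
    """Emulated speed relative to the real Falcon, in percent (100 = exact)."""
    return 100.0 * falcon_cycles / emulated_cycles


def mean_percent(values):
    """Plain arithmetic mean of per-test percentages."""
    return sum(values) / len(values)


# "Hatari dev" diff column for the 24 DSP rows of the attached results.
dsp_dev_diffs = [82, 78, 81, 83, 85, 80, 111, 120, 99, 100, 74, 75,
                 128, 133, 107, 100, 76, 72, 86, 83, 76, 86, 109, 91]

# DSP Host C:08 (I cache), Hatari 1.8: 8.15 cycles on the Falcon, 12.00 emulated.
print(round(diff_percent(8.15, 12.00)))       # 68
print(round(mean_percent(dsp_dev_diffs), 2))  # 92.29
```

Because the summary is an average of ratios, tests that run too fast and tests that run too slow can cancel each other out, which is why individual opcodes can still be well off while the overall figure is close to 100%.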
Of course, for specialised code using a lot of move.w, results might vary, as some "move" forms are still 13% off (or even 29% for "move dn,(an)", which is similar to what nembench showed).
For the moment, I think it's the best we can do. I discussed with Toni some of those cases where parallel actions take place inside the 68030, but the model to emulate this is not correct yet. Maybe this can be improved later.
Nicolas

Test                         | cache | Falcon         | Hatari 1.8              | Hatari 2.0              | Hatari dev 2017/02/08
                             |       |    mips cycles |    mips cycles     diff |    mips cycles     diff |    mips cycles     diff
-----------------------------+-------+----------------+-------------------------+-------------------------+------------------------
DSP Host C:08                | I     |    1,97   8,15 |    1,34  12,00  68,00 % |    1,60  10,00  82,00 % |    1,60  10,00  82,00 %
DSP Host C:08                | I+D   |    1,98   8,61 |    1,34  12,00  72,00 % |    1,56  10,27  84,00 % |    1,46  11,00  78,00 %
DSP Host C:16                | I     |    1,99  11,28 |    1,00  16,00  71,00 % |    1,15  14,00  81,00 % |    1,15  14,00  81,00 %
DSP Host C:16                | I+D   |    1,10  13,20 |    1,00  16,00  83,00 % |    1,11  14,50  91,00 % |    1,00  16,00  83,00 %
DSP Host C:24                | I     |    1,10  24,78 |    0,44  36,83  67,00 % |    0,65  24,86 100,00 % |    0,55  29,11  85,00 %
DSP Host C:24                | I+D   |    1,10  21,86 |    0,44  36,83  59,00 % |    0,65  24,57  89,00 % |    0,59  27,40  80,00 %
DSP Host R:08                |       |    1,10   8,70 |    1,33  12,00  73,00 % |    2,15   7,46 117,00 % |    2,05   7,82 111,00 %
DSP Host R:08                | I     |    1,10   6,00 |    2,00   8,00  75,00 % |    3,21   5,00 120,00 % |    3,21   5,00 120,00 %
DSP Host R:16                |       |    1,11  11,71 |    1,00  16,00  73,00 % |    1,40  11,46 102,00 % |    1,36  11,82  99,00 %
DSP Host R:16                | I     |    1,11   9,00 |    1,33  12,00  75,00 % |    1,78   9,00 100,00 % |    1,78   9,00 100,00 %
DSP Host R:32                |       |    1,11  17,60 |    0,57  28,00  63,00 % |    0,73  22,00  80,00 % |    0,67  23,85  74,00 %
DSP Host R:32                | I     |    1,07  15,00 |    0,67  24,00  63,00 % |    0,80  20,00  75,00 % |    0,80  20,00  75,00 %
DSP Host W:08                |       |    2,11   7,61 |    2,00   8,00  95,00 % |    3,26   4,91 155,00 % |    2,70   5,95 128,00 %
DSP Host W:08                | I     |    4,01   4,00 |    2,00   8,00  50,00 % |    5,36   3,00 133,00 % |    5,36   3,00 133,00 %
DSP Host W:16                |       |    1,50  10,73 |    1,33  12,00  89,00 % |    1,80   8,92 120,00 % |    1,61  10,00 107,00 %
DSP Host W:16                | I     |    2,29   7,00 |    1,34  12,00  58,00 % |    2,29   7,00 100,00 % |    2,29   7,00 100,00 %
DSP Host W:32                |       |    0,96  16,66 |    0,67  24,00  69,00 % |    0,80  19,95  84,00 % |    0,73  21,91  76,00 %
DSP Host W:32                | I     |    1,23  13,00 |    0,67  24,00  54,00 % |    0,89  18,00  72,00 % |    0,89  18,00  72,00 %
DSP HostStat: andb (An),Dn   | I     |    1,34  11,93 |    1,00  16,00  75,00 % |    1,15  13,92  86,00 % |    1,15  13,92  86,00 %
DSP HostStat: andb Dn,(An)   | I     |    1,60  10,00 |    1,00  16,00  63,00 % |    1,34  12,00  83,00 % |    1,34  12,00  83,00 %
DSP HostStat: andib #X,(An)  | I     |    1,60  10,00 |    1,00  16,00  63,00 % |    1,23  13,08  76,00 % |    1,23  13,08  76,00 %
DSP HostStat: btst #X,(An)   | I     |    1,34  11,93 |    1,00  16,00  75,00 % |    1,15  13,92  86,00 % |    1,15  13,92  86,00 %
DSP HostStat: btst Dn,(An)   | I     |    1,34  11,93 |    1,00  16,00  75,00 % |    1,47  10,94 109,00 % |    1,47  10,94 109,00 %
DSP HostStat: cmpb (An),Dn   | I     |    1,60  10,00 |    1,34  12,00  83,00 % |    1,47  10,94  91,00 % |    1,47  10,94  91,00 %
addl Dn,Dn                   |       |    3,12   5,13 |    4,01   4,00 128,00 % |    8,03   2,00 257,00 % |    4,00   4,00 128,00 %
addl Dn,Dn                   | I     |    8,03   2,00 |    4,00   4,00  50,00 % |    8,03   2,00 100,00 % |    8,03   2,00 100,00 %
addl Dn,Dn (i-miss)          | I     |    3,11   5,15 |    4,00   4,00 129,00 % |    8,03   2,00 258,00 % |    2,67   6,00  86,00 %
moveb (An)+,Dn               |       |    1,65   9,72 |    1,34  12,00  81,00 % |    2,00   8,00 122,00 % |    1,34  12,00  81,00 %
moveb (An)+,Dn               | D     |    2,01   8,00 |    1,34  12,00  67,00 % |    2,21   7,27 110,00 % |    1,61  10,00  80,00 %
moveb (An)+,Dn               | I     |    2,26   7,11 |    2,01   8,00  89,00 % |    2,69   6,00 119,00 % |    2,01   8,00  89,00 %
moveb (An)+,Dn               | I+D   |    3,02   5,32 |    2,01   8,00  67,00 % |    2,81   5,70  93,00 % |    2,69   6,00  89,00 %
movel (An)+,(Ay) x10         |       |    0,76  21,17 |    0,67  24,05  88,00 % |    1,14  14,00 151,00 % |    0,67  24,00  88,00 %
movel (An)+,(Ay) x10         | D     |    0,72  22,18 |    0,67  24,05  92,00 % |    1,14  14,00 158,00 % |    0,67  24,00  92,00 %
movel (An)+,(Ay) x10         | I     |    0,92  17,50 |    0,80  20,00  88,00 % |    1,33  12,00 146,00 % |    0,80  20,00  88,00 %
movel (An)+,(Ay) x10         | I+D   |    0,86  18,57 |    0,80  20,00  93,00 % |    1,33  12,00 155,00 % |    0,80  20,00  93,00 %
movel (An)+,Dn               |       |    1,16  13,85 |    1,00  16,00  87,00 % |    1,61  10,00 139,00 % |    1,00  16,00  87,00 %
movel (An)+,Dn               | D     |    1,15  13,92 |    1,00  16,00  87,00 % |    1,61  10,00 139,00 % |    1,00  16,00  87,00 %
movel (An)+,Dn               | I     |    1,42  11,28 |    1,34  12,00  94,00 % |    2,01   8,00 141,00 % |    1,34  12,00  94,00 %
movel (An)+,Dn               | I+D   |    1,42  11,28 |    1,34  12,00  94,00 % |    2,01   8,00 141,00 % |    1,34  12,00  94,00 %
moveml (An)+:(Ay) x10        |       |    0,85  18,82 |    0,83  19,22  98,00 % |    1,78   9,00 209,00 % |    0,91  17,59 107,00 %
moveml (An)+:(Ay) x10        | D     |    0,81  19,80 |    0,83  19,22 103,00 % |    1,78   9,00 220,00 % |    0,91  17,59 113,00 %
moveml (An)+:(Ay) x10        | I     |    0,89  17,93 |    0,87  18,42  97,00 % |    1,83   8,78 204,00 % |    1,00  16,00 112,00 %
moveml (An)+:(Ay) x10        | I+D   |    0,85  18,88 |    0,87  18,42 102,00 % |    1,83   8,78 215,00 % |    1,00  16,00 118,00 %
movew #imm,(An)              |       |    1,30  12,32 |    1,00  16,00  77,00 % |    2,70   5,94 207,00 % |    1,38  11,64 106,00 %
movew #imm,(An)              | I     |    3,11   5,15 |    2,00   8,00  64,00 % |    2,68   6,00  86,00 % |    2,00   8,00  64,00 %
movew (An)+,Dn               |       |    1,65   9,72 |    1,34  12,00  81,00 % |    2,00   8,00 122,00 % |    1,34  12,00  81,00 %
movew (An)+,Dn               | D     |    1,64   9,79 |    1,34  12,00  82,00 % |    2,02   8,00 122,00 % |    1,34  12,00  82,00 %
movew (An)+,Dn               | I     |    2,27   7,08 |    2,01   8,00  89,00 % |    2,69   6,00 118,00 % |    2,01   8,00  89,00 %
movew (An)+,Dn               | I+D   |    2,40   6,69 |    2,01   8,00  84,00 % |    2,47   6,50 103,00 % |    2,01   8,00  84,00 %
movew (An),(An)              |       |    1,31  12,27 |    1,00  16,00  77,00 % |    1,80   8,93 137,00 % |    1,36  11,76 104,00 %
movew (An),(An)              | D     |    1,67   9,59 |    1,00  16,00  60,00 % |    2,02   7,94 121,00 % |    1,64   9,76  98,00 %
movew (An),(An)              | I     |    1,94   8,27 |    1,34  12,00  69,00 % |    2,01   8,00 103,00 % |    2,00   8,00 103,00 %
movew (An),(An)              | I+D   |    2,63   6,09 |    1,34  12,00  51,00 % |    2,01   8,00  76,00 % |    2,01   8,00  76,00 %
movew (An),Dn                |       |    1,65   9,70 |    1,33  12,00  81,00 % |    2,70   5,95 163,00 % |    1,61  10,00  97,00 %
movew (An),Dn                | D     |    2,63   6,10 |    1,33  12,00  51,00 % |    3,23   5,00 122,00 % |    2,76   5,81 105,00 %
movew (An),Dn                | I     |    2,25   7,13 |    2,00   8,00  89,00 % |    4,00   4,00 178,00 % |    2,68   6,00 119,00 %
movew (An),Dn                | I+D   |    4,00   4,00 |    2,00   8,00  50,00 % |    4,02   4,00 100,00 % |    4,00   4,00 100,00 %
movew (Dn.l),(An)            |       |    1,00  16,00 |    0,80  20,00  80,00 % |    1,34  11,94 134,00 % |    1,03  15,64 102,00 %
movew (Dn.l),(An)            | D     |    0,93  17,25 |    0,80  20,00  86,00 % |    1,61   9,94 174,00 % |    1,38  11,63 148,00 %
movew (Dn.l),(An)            | I     |    1,57  10,24 |    1,33  12,00  85,00 % |    1,60  10,00 102,00 % |    1,60  10,00 102,00 %
movew (Dn.l),(An)            | I+D   |    1,59  10,11 |    1,33  12,00  84,00 % |    1,60  10,00 101,00 % |    1,60  10,00 101,00 %
movew (Dn.l),Dn              |       |    1,38  11,59 |    1,00  16,00  72,00 % |    1,65   9,74 119,00 % |    1,03  15,52  75,00 %
movew (Dn.l),Dn              | D     |    1,49  10,74 |    1,00  16,00  67,00 % |    2,07   7,75 139,00 % |    1,38  11,63  92,00 %
movew (Dn.l),Dn              | I     |    1,57  10,22 |    2,00   8,00 128,00 % |    2,00   8,00 128,00 % |    1,60  10,00 102,00 %
movew (Dn.l),Dn              | I+D   |    2,00   8,00 |    2,00   8,00 100,00 % |    2,00   8,00 100,00 % |    2,00   8,00 100,00 %
movew 16(An),Dn              |       |    1,12  14,38 |    1,34  12,00 120,00 % |    2,79   5,75 250,00 % |    1,39  11,51 125,00 %
movew 16(An),Dn              | D     |    1,41  11,39 |    1,34  12,00  95,00 % |    4,28   3,75 304,00 % |    2,10   7,63 149,00 %
movew 16(An),Dn              | I     |    1,97   8,13 |    2,00   8,00 102,00 % |    4,00   4,00 203,00 % |    2,67   6,00 136,00 %
movew 16(An),Dn              | I+D   |    3,20   5,00 |    2,00   8,00  63,00 % |    4,00   4,00 125,00 % |    4,00   4,00 125,00 %
movew 8(An,Dn),Dn            |       |    1,14  14,12 |    1,00  16,00  88,00 % |    1,65   9,74 145,00 % |    1,03  15,52  91,00 %
movew 8(An,Dn),Dn            | D     |    1,44  11,14 |    1,00  16,00  70,00 % |    2,07   7,75 144,00 % |    1,38  11,63  96,00 %
movew 8(An,Dn),Dn            | I     |    1,57  10,24 |    2,00   8,00 128,00 % |    2,00   8,00 128,00 % |    1,60  10,00 102,00 %
movew 8(An,Dn),Dn            | I+D   |    2,29   7,00 |    2,00   8,00  88,00 % |    2,00   8,00  88,00 % |    2,00   8,00  88,00 %
movew Dn,(An)                |       |    1,84   8,73 |    2,00   8,00 109,00 % |    3,25   4,93 177,00 % |    2,03   7,89 111,00 %
movew Dn,(An)                | D     |    1,84   8,72 |    2,00   8,00 109,00 % |    3,25   4,93 177,00 % |    2,04   7,88 111,00 %
movew Dn,(An)                | I     |    3,12   5,14 |    2,00   8,00  64,00 % |    5,36   3,00 171,00 % |    4,00   4,00 129,00 %
movew Dn,(An)                | I+D   |    3,12   5,14 |    2,00   8,00  64,00 % |    5,36   3,00 171,00 % |    4,00   4,00 129,00 %
movew Dn,16(An)              |       |    1,11  14,43 |    1,34  12,00 120,00 % |    2,04   7,88 183,00 % |    1,38  11,64 124,00 %
movew Dn,16(An)              | D     |    1,12  14,31 |    1,34  12,00 119,00 % |    2,04   7,88 182,00 % |    1,38  11,63 123,00 %
movew Dn,16(An)              | I     |    3,92   4,09 |    2,00   8,00  51,00 % |    2,01   8,00  51,00 % |    1,60  10,00  41,00 %
movew Dn,16(An)              | I+D   |    3,11   5,15 |    2,00   8,00  64,00 % |    2,00   8,00  64,00 % |    1,60  10,00  52,00 %
nop                          |       |    3,11   5,15 |    4,01   4,00 129,00 % |    8,03   2,00 258,00 % |    3,99   4,00 129,00 %
nop                          | I     |    8,03   2,00 |    4,00   4,00  50,00 % |    8,03   2,00 100,00 % |    8,03   2,00 100,00 %
nop (i-miss)                 | I+D   |    3,11   5,15 |    4,00   4,00 129,00 % |    8,03   2,00 258,00 % |    2,68   6,00  86,00 %
-----------------------------+-------+----------------+-------------------------+-------------------------+------------------------
DSP                          |       |  36,758        |   26,81         70,46 % |   39,00         96,50 % |   37,61         92,29 %
CPU                          |       |  124,56        |  100,40         86,72 % |  175,61        150,18 % |  122,45        100,05 %
DSP+CPU                      |       | 161,318        |  127,21         82,07 % |  214,61        134,85 % |  160,06         97,83 %

Attachment:
cpu_ce_compare.pdf
Description: Adobe PDF document