[hatari-devel] Hatari profiling question (was: Accelerating blitting on TT by code re-arrangement (on Emutos-devel))
- To: Hatari Development <hatari-devel@xxxxxxxxxxxxxxxxxxx>
- Subject: [hatari-devel] Hatari profiling question (was: Accelerating blitting on TT by code re-arrangement (on Emutos-devel))
- From: Christian Zietz <czietz@xxxxxxx>
- Date: Wed, 10 Feb 2021 09:52:53 +0100
Eero Tamminen wrote:
> On 7.2.2021 10.54, Christian Zietz wrote:
>> That's what I used for my initial analysis in this case, too. Although,
>> tbh, I'm not fully sure how to interpret the results. I suppose some
>> instructions are counted with zero cycles because they're fully absorbed
>> by surrounding instructions? But how can an instruction that is executed
>> 1 million times be responsible for 2 million instruction cache misses?
>
> Can you provide example profiler disassembly?
> (Maybe on the Hatari mailing list?)
From my EmuTOS VDI profiling:
$00e21794 : adda.w d2,a0    2.79% (1178064,  2361790,    1419,      0)
$00e21796 : adda.w d3,a1    2.79% (1178064,  2353471,      38,      0)
$00e21798 : move.l d1,d0    2.79% (1178064,  7068967, 1178050,      0)
$00e2179a : move.w (a0),d0  2.79% (1178064, 10689954,     190, 577010)
$00e2179c : swap d0         2.79% (1178064,  7069362, 1178150,      0)
$00e2179e : move.l d0,d1    2.79% (1178064,       76,       0,      0)
$00e217a0 : rol.l d4,d0     2.79% (1178064,       76,       0,      0)
$00e217a2 : jmp (a2)        2.79% (1178064, 14139013, 2356248,      0)
Like I said, I'm not sure how to interpret the I-cache misses,
particularly in the last line. Is it because it's a JMP, so that both
the cache miss while fetching the instruction and the cache miss while
fetching the jump target count towards the number? Or is it because the
cache misses for the instruction *preceding* the JMP (ROL.L, shown with
0 I-cache misses) are counted towards the JMP instruction?
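To make the per-instruction ratios behind this question easier to see, here is a rough Python sketch (not part of Hatari) that parses the listing above and divides each counter by the execution count. It assumes the four numbers in parentheses are, in order, executions, cycles, I-cache misses and D-cache hits; that order is inferred from the discussion above and should be checked against the Hatari profiler documentation. The names `LINE_RE`, `PROFILE` and `parse_profile` are made up for this illustration.

```python
import re

# ASSUMPTION: the parenthesised fields are
# (executions, cycles, I-cache misses, D-cache hits), in that order.
LINE_RE = re.compile(
    r"\$(?P<addr>[0-9a-fA-F]+)\s*:\s*(?P<insn>.+?)\s+(?P<pct>[\d.]+)%\s*"
    r"\((?P<count>\d+),\s*(?P<cycles>\d+),\s*(?P<imiss>\d+),\s*(?P<dhit>\d+)\)"
)

# The profiler lines quoted in this message, copied verbatim.
PROFILE = """\
$00e21794 : adda.w d2,a0 2.79% (1178064, 2361790, 1419, 0)
$00e21796 : adda.w d3,a1 2.79% (1178064, 2353471, 38, 0)
$00e21798 : move.l d1,d0 2.79% (1178064, 7068967, 1178050, 0)
$00e2179a : move.w (a0),d0 2.79% (1178064, 10689954, 190, 577010)
$00e2179c : swap d0 2.79% (1178064, 7069362, 1178150, 0)
$00e2179e : move.l d0,d1 2.79% (1178064, 76, 0, 0)
$00e217a0 : rol.l d4,d0 2.79% (1178064, 76, 0, 0)
$00e217a2 : jmp (a2) 2.79% (1178064, 14139013, 2356248, 0)
"""


def parse_profile(text):
    """Return one dict per profiler line, with per-execution averages added."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that is not a profiler line
        count = int(m["count"])
        cycles = int(m["cycles"])
        imiss = int(m["imiss"])
        rows.append({
            "addr": int(m["addr"], 16),
            "insn": m["insn"],
            "count": count,
            "cycles": cycles,
            "imiss": imiss,
            "dhit": int(m["dhit"]),
            # Averages per execution; count is never zero in a profiled line.
            "cycles_per_exec": cycles / count,
            "imiss_per_exec": imiss / count,
        })
    return rows


if __name__ == "__main__":
    for row in parse_profile(PROFILE):
        print(f"${row['addr']:08x} {row['insn']:<16}"
              f" cycles/exec={row['cycles_per_exec']:6.2f}"
              f" i-miss/exec={row['imiss_per_exec']:5.2f}")
```

Under that assumption, the ratios show the pattern in question directly: MOVE.L d1,d0 and SWAP each average about one I-cache miss per execution, the two instructions after SWAP average essentially none, and the JMP averages about two.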
Regards
Christian
--
Christian Zietz - CHZ-Soft - czietz@xxxxxxx
WWW: https://www.chzsoft.de/
PGP/GnuPG-Key-ID: 0x52CB97F66DA025CA / 0x6DA025CA