However, for the i-cache, I think it's clear from the CPU core
sources that hits and misses are counted only for instructions
that trigger either a prefetch or a pipeline stall (= branch).
Do you agree on that interpretation? Because then:
* Those hit/miss counts also tell how often those events happen
* It should be fine to treat (on the profiler side) any
instruction that doesn't generate a miss as a hit.
Wouldn't it?
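
To make the second point concrete, here is a minimal C sketch of what
I mean on the profiler side. The struct and function names are
hypothetical (not taken from the actual sources), and it assumes that
each execution of an instruction causes at most one i-cache access,
which is exactly the assumption I'm asking about:

```c
#include <stdint.h>

/* Hypothetical per-instruction counters; names are illustrative only. */
typedef struct {
    uint32_t executed;  /* times the instruction was executed */
    uint32_t i_misses;  /* i-cache misses reported by the CPU core */
} insn_stats_t;

/* Assumption: at most one i-cache access per execution, so every
 * execution that did not miss is treated as a hit. If more misses
 * than executions were reported, the assumption is broken and we
 * return 0 instead of underflowing. */
static uint32_t derived_i_hits(const insn_stats_t *s)
{
    if (s->i_misses >= s->executed)
        return 0;
    return s->executed - s->i_misses;
}
```

If the core can report several misses for one execution, this
derivation would undercount hits, so it only holds if the
interpretation above is right.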
What I don't understand for the i-cache is how you can get
multiple hits or misses for a single instruction. Instructions
are all word sized & word aligned, so they cannot cross a cache
line boundary, so there should be only zero or one hit / miss,
shouldn't there?
And what about the data cache? I can understand 2 misses if
the data is e.g. a long crossing a cache line boundary, but what
about larger numbers? Or is it about how much data the miss
caused to be fetched into the cache?
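
The line-crossing arithmetic I have in mind can be sketched in C as
below. The function name is hypothetical and the line size is a
parameter, since the real cache line size is an assumption on my part
(16 bytes is used only as an example):

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines touched by an access of `size` bytes at `addr`.
 * `line_size` is passed in because the actual line size is an assumption.
 * Address wrap-around at the top of the address space is not handled. */
static uint32_t lines_touched(uint32_t addr, uint32_t size, uint32_t line_size)
{
    assert(size > 0 && line_size > 0);
    uint32_t first = addr / line_size;              /* line of first byte */
    uint32_t last = (addr + size - 1) / line_size;  /* line of last byte */
    return last - first + 1;
}
```

With a 16-byte line, lines_touched(14, 2, 16) is 1 (a word-aligned
word never crosses a line) and lines_touched(14, 4, 16) is 2 (a long
straddling the boundary). A single access of long size or smaller
cannot touch more than 2 lines, which is why counts above 2 puzzle me.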