For example, this one might see a penalty sometimes: the program address is >$100, and the X: address it is fetching from *might* also be >$200, in which case the opcode fetch and the data fetch compete for the external bus inside a single opcode.
(Note: internal P memory is half the size of internal X or Y, hence the $100/$200 boundaries mentioned above. IIRC (?) this is because P: addresses are twice as 'wide': 2 words per address, or 48 bits, so 2 fetches per opcode, which is probably also why no operation takes less than 2 osc cycles.)
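
To make the rule of thumb concrete, here is a rough sketch in Python (not DSP code) of the check I do in my head. It assumes the $100/$200 boundaries quoted above are right for your memory map, and it treats the boundary address itself as external, which is my guess. The names `P_INTERNAL_END`, `X_INTERNAL_END`, `external_accesses` and `may_stall` are made up for illustration. It only flags *possible* contention and says nothing about how many cycles are actually lost.

```python
# Boundaries as quoted in the post; unverified, adjust for your memory map.
P_INTERNAL_END = 0x100  # assumption: P: addresses from here up are external
X_INTERNAL_END = 0x200  # assumption: X:/Y: addresses from here up are external


def external_accesses(pc, x_addr=None, y_addr=None):
    """Count how many of one opcode's accesses would land on the external bus.

    pc      -- P: address the opcode is fetched from
    x_addr  -- X: address the opcode reads or writes, or None if unused
    y_addr  -- Y: address the opcode reads or writes, or None if unused
    """
    count = 0
    if pc >= P_INTERNAL_END:
        count += 1  # the opcode fetch itself goes external
    for addr in (x_addr, y_addr):
        if addr is not None and addr >= X_INTERNAL_END:
            count += 1  # this data access goes external
    return count


def may_stall(pc, x_addr=None, y_addr=None):
    """True if more than one access in the same opcode needs the external bus."""
    return external_accesses(pc, x_addr, y_addr) > 1
```

So `may_stall(0x150, x_addr=0x250)` flags the case described above (external opcode fetch plus external X: fetch), while moving either the code or the data into internal memory clears it.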
D.