IMO, it can. You know in advance how many cycles each instruction will take, so you know when it 'happens'. Knowing the microcode (or the internal CPU layout) would be perfect, but it isn't really necessary; you just need to 'measure' when the instruction tries to access the bus. It wouldn't be much slower, because there would be a lot of empty cycles, and the computation of some instructions would 'smear' over many cycles. This could also be a good base for otherwise tricky emulation of border removal, palette tricks, sync scrolling and so on.
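
To make the idea a bit more concrete, here is a rough sketch in Python of what I mean. Everything in it is an assumption for illustration: the instruction names, the cycle counts and the bus-access offsets in TIMINGS are made up, not measured from any real CPU, and schedule() is just a hypothetical helper. The point is only that once you have a table of "total cycles" and "cycle offsets at which the bus is touched", you can place every bus access on an absolute cycle timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    cycles: int         # total cycles the instruction occupies
    bus_offsets: tuple  # offsets from instruction start at which it touches the bus


# Hypothetical values, invented for this sketch; a real table would come
# from measurements or from the CPU documentation.
TIMINGS = {
    "NOP": Timing(4, (0,)),
    "MOVE_MEM": Timing(12, (0, 4, 8)),
    "MUL": Timing(40, (0,)),
}


def schedule(program, start_cycle=0):
    """Return (events, end_cycle).

    events is a list of (absolute_cycle, instruction_name) pairs, one per
    bus access, in the order the accesses occur.
    """
    events = []
    cycle = start_cycle
    for name in program:
        timing = TIMINGS[name]
        for offset in timing.bus_offsets:
            events.append((cycle + offset, name))
        # Cycles without a bus access are the 'empty' ones: nothing has to
        # be emulated there, the counter just moves forward.
        cycle += timing.cycles
    return events, cycle


if __name__ == "__main__":
    events, end = schedule(["NOP", "MOVE_MEM", "MUL"])
    for when, name in events:
        print(when, name)
    print("end cycle:", end)
```

The video side (border, palette, scroll registers) would then only need to compare its own beam position against these cycle stamps, instead of the CPU core being stepped one cycle at a time.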
AdamK