[chrony-dev] long time to re-sync with bad system clock

Hi,

I have a system with a rather bad internal clock, unstable with respect to temperature or voltage. Without load the frequency is about 80 ppm slow; at 100 % system load it is 10 ppm slow.

Anyhow, there is one GPS connected, and quite a few network time sources. When the load is stable, the clock is also stable and the poll intervals of the network sources are automatically increased. They agree with the time provided by the GPS.

When I change the system load, the GPS source notices this rather quickly and reduces NP and its span until its fit has adapted to the new frequency of the system. At this point I would like chrony to start following the new time.

However, this easily happens before any (or only some) of the network sources have made a new measurement, i.e. they still think the frequency is as before. This leads to the GPS clock being marked 'x' as a falseticker, and one of the (now wrong) network sources being used as reference instead.

Eventually, the network sources reach their poll interval and then notice that their linear fits are wildly off. On their first measurements they reduce their poll intervals, and as soon as enough of them have noticed the new reality, the GPS source is allowed to discipline the clock again.

It seems to be possible to avoid this long re-sync time by setting maxpoll 6 or so on the network sources.
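To give a feeling for why a lower maxpoll helps, here is a rough back-of-the-envelope sketch in Python. It assumes the frequency jumps by the full 70 ppm (80 ppm minus 10 ppm) right after a network source's last measurement, and that the network sources had otherwise backed off to maxpoll 10 (1024 s), which I am assuming as the default. The function name is made up for illustration, and the result is only an upper bound on how far a fit based on the old frequency could be off by the next poll.

```python
def drift_error_ms(freq_change_ppm: float, elapsed_s: float) -> float:
    """Time error in ms accumulated when the frequency is off by
    freq_change_ppm for elapsed_s seconds."""
    return freq_change_ppm * 1e-6 * elapsed_s * 1e3


# Frequency shift between idle (80 ppm slow) and full load (10 ppm slow).
FREQ_CHANGE_PPM = 80 - 10

# 10 is assumed to be the default maxpoll; 6 is the value suggested above.
for poll in (10, 6):
    interval_s = 2 ** poll  # poll values are log2 of the interval in seconds
    error_ms = drift_error_ms(FREQ_CHANGE_PPM, interval_s)
    print(f"maxpoll {poll}: {interval_s} s between polls, "
          f"up to {error_ms:.2f} ms of unmodelled drift")
```

With these numbers a source polling every 1024 s can be about 72 ms off before it gets a chance to notice, while at 64 s the bound is about 4.5 ms.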

To avoid having to do that, I'd like to propose something like:

If a new source (B) is to be elected reference source, taking over from an old one (A), then B's last measurement must be more recent than the current (half) span of A. If it is not, a new 'check' measurement by B is provoked (without B gaining reference status yet).

Either of two things can then happen: B finds that it is in fact correct, in which case it proceeds to take over the role as reference. Or (as I hope in this case) it finds that things have changed. Most likely it will (using the existing logic) reduce its poll interval to sort things out for itself. For some short time it will also be marked falseticker, perhaps leading to C being tried as reference, which then also gets provoked to check itself, in an avalanche...
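The proposed rule could be sketched roughly as follows. This is only an illustration in Python, not chrony code: the Source class, its fields and both function names are hypothetical, and span_fraction=0.5 reflects the "(half) span" wording above (set it to 1.0 to require the full span).

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    last_sample_time: float  # time of the most recent measurement, seconds
    span: float              # time covered by the samples in the current fit, seconds


def needs_check_measurement(candidate: Source, current: Source, now: float,
                            span_fraction: float = 0.5) -> bool:
    """True if the candidate's last sample is older than the chosen
    fraction of the current reference's span."""
    age = now - candidate.last_sample_time
    return age > span_fraction * current.span


def elect(candidate: Source, current: Source, now: float,
          span_fraction: float = 0.5) -> tuple[Source, bool]:
    """Return (reference to use, whether a check measurement should be
    provoked on the candidate)."""
    if needs_check_measurement(candidate, current, now, span_fraction):
        # Candidate's data is stale: keep the old reference for now and
        # ask the candidate to measure again before it may take over.
        return current, True
    return candidate, False


# GPS has shortened its span to 32 s after the load change; the network
# source last measured 600 s ago, so it has to check itself first.
gps = Source("GPS", last_sample_time=1000.0, span=32.0)
net = Source("NET", last_sample_time=400.0, span=4096.0)
reference, provoke = elect(net, gps, now=1000.0)
print(reference.name, provoke)
```

What happens after the check measurement (reduced poll interval, falseticker marking, trying C) would be left to the existing selection logic.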

Naturally, I would not choose the system with the bad clock as a production network time server... :-)

Cheers,
Håkan

