Re: [chrony-users] High skew values

This is now solved (see below).

On 2013-07-26 18:59, Bill Unruh wrote:
If you can figure out what the "average" drift is, you could use adjtimex to
adjust the system clock's rate to take that out.

No, I can't.  As you correctly point out below, this is impossible for such a high drift.
>adjtimex --tick=13000
adjtimex: Invalid argument
for this kernel:
   USER_HZ = 100 (nominally 100 ticks per second)
   9000 <= tick <= 11000
   -32768000 <= frequency <= 32768000
and indeed the system log does occasionally include:
chronyd[463]: Required tick 13194 outside allowed range (9000 .. 11000)
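For readers who want to see how the tick range maps to a drift rate, here is a rough Python sketch. It assumes the usual adjtimex meaning of "tick" (microseconds added to the system clock per kernel tick, so the nominal value at USER_HZ = 100 is 10000). The function names are made up for illustration and are not part of adjtimex or chrony.

```python
# Sketch only: relates an adjtimex "tick" value to a clock rate correction.
# Assumption: nominal tick = 1_000_000 / USER_HZ microseconds.
USER_HZ = 100
NOMINAL_TICK = 1_000_000 // USER_HZ   # 10000 us per tick
TICK_MIN, TICK_MAX = 9000, 11000      # range reported by this kernel


def tick_to_ppm(tick):
    """Rate correction, in parts per million, implied by a tick value."""
    return (tick - NOMINAL_TICK) * 1e6 / NOMINAL_TICK


def ppm_to_sec_per_min(ppm):
    """Convert a rate error in ppm to seconds gained or lost per minute."""
    return ppm * 60 / 1e6


def tick_in_range(tick):
    """True if the kernel would accept this tick value."""
    return TICK_MIN <= tick <= TICK_MAX


if __name__ == "__main__":
    for tick in (TICK_MAX, 13194):
        ppm = tick_to_ppm(tick)
        print(tick, ppm, ppm_to_sec_per_min(ppm), tick_in_range(tick))
```

Under that assumption the upper limit of 11000 corresponds to a 10% rate change, i.e. 6 seconds per minute, and the 13194 that chronyd asked for is well outside it.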

What I don't understand is this: chrony logs the following in /var/log/messages:
 chronyd[490]: System clock wrong by 15.124741 seconds, adjustment started
It does this (saying the clock is wrong by about 5-20s) even when the clock is wrong by hours.

I think that this is the "least squares" offset.

Alright, that's very different from what I thought it was.

It sends out a packet with a local time stamp. The remote server, timestamps
the packet when it is received and when it is sent out again, and your machine
timestamps it when it comes back. The measured offset is the difference
between the means of the local timestamps and the remote timestamps. chrony
then takes the last N offsets (compensated for changes it has made in the
drift rate of the clock) and does a least squares fit to find out what the
best estimate is for the drift error and offset error. It also tests to see if
the deviations from the least squares fit look roughly random. If not, it
makes N smaller and tries again until N is 3. In your system N seems to hang
around 3 a lot.
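To make the description above concrete, here is a minimal Python sketch of the two steps: the offset from one packet exchange, and a straight-line fit through recent offsets. It is not chrony's code. The sign convention (remote minus local), the plain unweighted fit, and the function names are my assumptions, and the randomness test that shrinks N is left out.

```python
# Sketch only, not chrony's implementation.

def measured_offset(t1, t2, t3, t4):
    """Offset from one exchange.

    t1: local time when the request was sent
    t2: remote time when the request was received
    t3: remote time when the reply was sent
    t4: local time when the reply came back
    Assumed sign: positive means the local clock is behind the remote one.
    """
    return (t2 + t3) / 2 - (t1 + t4) / 2


def fit_offset_and_drift(times, offsets):
    """Unweighted least squares fit of offset = intercept + slope * time.

    The slope estimates the drift error (seconds per second) and the
    intercept estimates the offset error at time zero.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    print(measured_offset(100.0, 105.02, 105.03, 100.05))
    print(fit_offset_and_drift([0, 10, 20, 30], [1.0, 1.5, 2.0, 2.5]))
```

With only 3 points, as on my machine, the fit has almost no redundancy, so a very uneven drift makes the estimate jump around from one fit to the next.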

Thank you Bill for a *very* clear explanation.  I think I finally understand what you meant earlier - this system has 2 problems: Very uneven drift and very high drift.  The uneven drift causes the "Can't synchronise: no majority" errors, and the high drift causes the "Required tick outside allowed range" errors.  So chrony cannot set an accurate adjustment nor a quick enough adjustment to compensate.

That is beyond the ability of chrony (or anything) to correct. The max drift
rate that can be compensated is 6 sec/minute. Jumping the clock is your

If I was going to live with this system as-is, then you would be right.

For anyone else reading this: An easy way to diagnose a sick machine is to use something like:
>adjtimex -c=10 -i=10
                                      --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1374906355      -0.660995
1374906367      -3.003384  -234238.9  11000         0
1374906379      -5.041906  -203852.2  11000         0   13038   3421012
1374906390      -6.129622  -108771.6  11000         0   12087   4691487
1374906402      -8.383569  -225394.7  11000         0   13253   6206387
1374906415     -11.768661  -338509.2  11000         0   14385    603062
1374906428     -15.120310  -335164.9  11000         0   14351   4253587
1374906440     -17.387367  -226705.7  11000         0   13267    373175
1374906453     -20.810277  -342291.0  11000         0   14422   5963612
1374906465     -23.090249  -227997.2  11000         0   13279   6370600
If those suggested "tick" values on the right are >11000 (i.e., drift >6s per minute), then the timer is too broken for chrony to fix.
So when Bill tells you that your machine is very sick, listen to him. :-)
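If you want to automate that check, something like the following Python sketch would do. It assumes the column layout shown above (seven whitespace-separated fields on a complete row, with the suggested tick in the sixth) and the 9000..11000 range from this kernel. The function names are hypothetical, and the sample rows are copied from the output above.

```python
# Sketch only: scan "adjtimex -c" style output for suggested ticks that
# the kernel would reject. Column layout is assumed from the output above.
SAMPLE = """\
1374906355      -0.660995
1374906367      -3.003384  -234238.9  11000         0
1374906379      -5.041906  -203852.2  11000         0   13038   3421012
1374906390      -6.129622  -108771.6  11000         0   12087   4691487
1374906402      -8.383569  -225394.7  11000         0   13253   6206387
1374906415     -11.768661  -338509.2  11000         0   14385    603062
1374906428     -15.120310  -335164.9  11000         0   14351   4253587
1374906440     -17.387367  -226705.7  11000         0   13267    373175
1374906453     -20.810277  -342291.0  11000         0   14422   5963612
1374906465     -23.090249  -227997.2  11000         0   13279   6370600
"""


def suggested_ticks(text):
    """Return the suggested tick from every complete (7-field) data row."""
    ticks = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines and incomplete rows are skipped.
        if len(fields) == 7 and fields[0].isdigit():
            ticks.append(int(fields[5]))
    return ticks


def clock_looks_broken(text, tick_min=9000, tick_max=11000):
    """True if any suggested tick falls outside the allowed range."""
    ticks = suggested_ticks(text)
    return any(t < tick_min or t > tick_max for t in ticks)


if __name__ == "__main__":
    print(suggested_ticks(SAMPLE))
    print(clock_looks_broken(SAMPLE))
```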

This is a "hardware" issue (in the case of a virtual machine, something more elaborate) that needs to be fixed - in my case, by the hosting service provider.

Also, for the record, although virtual machines do suffer from many more drift problems than physical machines, there is a difference between an "inaccurate" clock (typical of virtual machines), and a "broken" clock (not so typical).  Most virtual machines are not "broken" and chrony works just fine.

Arnon Weinberg
