Re: [chrony-users] High skew values



On Sat, 27 Jul 2013, Arnon Weinberg wrote:


This is now solved (see below).

Actually, you did not tell us how you solved it. Or was it just yelling at the
provider until they put you on a better virtual machine?


On 2013-07-26 18:59, Bill Unruh wrote:
 If you can figure out what the "average" drift is, you could use adjtimex to
 adjust the system clock's rate to take that out.

No, I can't. As you correctly point out below, this is impossible for such a high drift.
adjtimex --tick=13000
adjtimex: Invalid argument
for this kernel:
   USER_HZ = 100 (nominally 100 ticks per second)
   9000 <= tick <= 11000
   -32768000 <= frequency <= 32768000
and indeed the system log does occasionally include:
chronyd[463]: Required tick 13194 outside allowed range (9000 .. 11000)
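
To put numbers on those limits, here is a minimal Python sketch. It assumes USER_HZ = 100 as reported above, so the nominal tick is 10000 microseconds. The helper names are made up for illustration and are not part of adjtimex or chrony.

```python
USER_HZ = 100                        # from the adjtimex message above
NOMINAL_TICK = 1_000_000 // USER_HZ  # 10000 microseconds per tick

# Kernel limits quoted in the adjtimex error message above.
TICK_MIN, TICK_MAX = 9000, 11000


def rate_correction(tick):
    """Fractional rate change a tick value represents, relative to nominal."""
    return (tick - NOMINAL_TICK) / NOMINAL_TICK


def seconds_per_minute(tick):
    """The same correction expressed as seconds gained or lost per minute."""
    return rate_correction(tick) * 60


def tick_allowed(tick):
    """True if the kernel would accept this tick value."""
    return TICK_MIN <= tick <= TICK_MAX


if __name__ == "__main__":
    for tick in (11000, 13194):
        print(tick, tick_allowed(tick),
              f"{rate_correction(tick):.2%}",
              f"{seconds_per_minute(tick):.2f} s/min")
```

So the 9000..11000 range corresponds to a correction of at most 10%, and the tick of 13194 that chronyd asked for would need roughly 32%.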

> What I don't understand is this: chrony logs the following in
> /var/log/messages:
> > chronyd[490]: System clock wrong by 15.124741 seconds, adjustment started
> It does this (saying the clock is wrong by about 5-20s) even when the
> clock is wrong by hours.

 I think that this is the "least squares" offset.

Alright, that's very different from what I thought it was.

 It sends out a packet with a local timestamp. The remote server timestamps
 the packet when it is received and when it is sent out again, and your
 machine timestamps it when it comes back. The measured offset is the
 difference between the means of the local timestamps and the remote
 timestamps. chrony then takes the last N offsets (compensated for changes it
 has made in the drift rate of the clock) and does a least squares fit to find
 the best estimate of the drift error and offset error. It also tests whether
 the deviations from the least squares fit look roughly random. If not, it
 makes N smaller and tries again until N is 3. In your system N seems to hang
 around 3 a lot.
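
The procedure described above can be sketched roughly as follows. This is a simplified Python illustration and not chrony's actual code: the function names are hypothetical, the sign convention (positive offset means the remote clock is ahead) is an assumption, and the "looks random" check is a crude count of sign runs in the residuals with an arbitrary threshold, standing in for whatever test chrony really applies.

```python
def ntp_offset(t1, t2, t3, t4):
    """Offset from one exchange: mean of remote stamps minus mean of local.

    t1 = local send, t2 = remote receive, t3 = remote send, t4 = local receive.
    """
    return (t2 + t3) / 2 - (t1 + t4) / 2


def fit_line(times, offsets):
    """Ordinary least squares fit; returns (slope, intercept).

    The slope estimates the drift error, the intercept the offset at time 0.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    slope = sxy / sxx
    return slope, mean_o - slope * mean_t


def count_runs(residuals):
    """Number of runs of same-signed residuals (few runs suggest a trend)."""
    signs = [r >= 0 for r in residuals]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def fit_recent(times, offsets, min_samples=3):
    """Fit the last N samples, shrinking N while the residuals look non-random.

    The acceptance rule (runs >= N // 3) is an arbitrary illustrative choice.
    """
    n = len(times)
    while True:
        t, o = times[-n:], offsets[-n:]
        slope, intercept = fit_line(t, o)
        residuals = [oi - (slope * ti + intercept) for ti, oi in zip(t, o)]
        if n <= min_samples or count_runs(residuals) >= n // 3:
            return n, slope, intercept
        n -= 1
```

When the drift keeps changing, older samples no longer fit the same line, so a loop like this keeps discarding them and N ends up at the minimum of 3.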

Thank you Bill for a *very* clear explanation. I think I finally understand what you meant earlier: this system has 2 problems, very uneven drift and very high drift. The uneven drift causes the "Can't synchronise: no majority" errors, and the high drift causes the "Required tick outside allowed range" errors. So chrony can set neither an accurate adjustment nor a quick enough one to compensate.

 That is beyond the ability of chrony (or anything) to correct. The max drift
 rate that can be compensated is 6 sec/minute. Jumping the clock is your only
 option.

If I was going to live with this system as-is, then you would be right.

For anyone else reading this: An easy way to diagnose a sick machine is to use something like:

adjtimex -c=10 -i=10
                                     --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1374906355      -0.660995
1374906367      -3.003384  -234238.9  11000         0
1374906379      -5.041906  -203852.2  11000         0   13038   3421012
1374906390      -6.129622  -108771.6  11000         0   12087   4691487
1374906402      -8.383569  -225394.7  11000         0   13253   6206387
1374906415     -11.768661  -338509.2  11000         0   14385    603062
1374906428     -15.120310  -335164.9  11000         0   14351   4253587
1374906440     -17.387367  -226705.7  11000         0   13267    373175
1374906453     -20.810277  -342291.0  11000         0   14422   5963612
1374906465     -23.090249  -227997.2  11000         0   13279   6370600

If the suggested "tick" values on the right are >11000 (i.e., drift >6s per minute), then the timer is too broken for chrony to fix.
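
For anyone who wants to automate that check, here is a small Python sketch that reads output in the format shown above and lists the suggested tick values that fall outside the kernel's range. It assumes the seven-column row layout from this particular run and the 9000..11000 limits reported for this kernel. The function names are hypothetical, and the first rows of the output above are reused as sample input.

```python
SAMPLE = """\
                                     --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1374906355      -0.660995
1374906367      -3.003384  -234238.9  11000         0
1374906379      -5.041906  -203852.2  11000         0   13038   3421012
1374906390      -6.129622  -108771.6  11000         0   12087   4691487
"""


def suggested_ticks(output):
    """Extract the suggested tick (6th column) from rows that have one."""
    ticks = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 7:      # headers and early rows have no suggestion
            continue
        try:
            ticks.append(int(fields[5]))
        except ValueError:        # not a data row after all
            continue
    return ticks


def out_of_range(ticks, lo=9000, hi=11000):
    """Suggested ticks the kernel would reject."""
    return [t for t in ticks if not lo <= t <= hi]


if __name__ == "__main__":
    bad = out_of_range(suggested_ticks(SAMPLE))
    if bad:
        print("clock needs ticks outside the allowed range:", bad)
```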
So when Bill tells you that your machine is very sick, listen to him. :-)

This is a "hardware" issue (in the case of a virtual machine, something more elaborate) that needs to be fixed - in my case, by the hosting service provider.

Also, for the record, although virtual machines do suffer from many more drift problems than physical machines, there is a difference between an "inaccurate" clock (typical of virtual machines) and a "broken" clock (not so typical). Most virtual machines are not "broken" and chrony works just fine.



