Re: [chrony-users] High skew values



On Fri, 26 Jul 2013, Arnon Weinberg wrote:


Thanks Bill for taking the time to reply.

On 2013-07-24 17:53, Bill Unruh wrote:
 have the virtual server get its
 time from that underlying system.

This does sound like a good idea, but does chrony have a feature for doing this? Or any other software?

Not that I know of.

The host clock is available through hwclock, so I've added hwclock -us to cron for now, as that sort of does the job, but the system clock drifts 5-20s per minute so this solution "jumps" the clock a few seconds every minute, instead of slewing it nicely the way chrony does. Not sure if there is a better solution...

If you can figure out what the "average" drift is, you could use adjtimex to
adjust the system clock's rate to take that out.

The problem is that the drift rate is not constant but fluctuates wildly, which
makes this only slightly better.



 I would try to run chrony with one server

I gave this a try as well. It did make the "Can't synchronise: no majority" errors in /var/log/messages go away, but did not improve timekeeping in any way. So I'm not sure if that's a step forward...

I think that my lack of understanding of how this system works is limiting my ability to come up with useful solutions. I've read a lot of documentation over the past few days but still can't seem to understand what is actually happening.

 Why might the skew be so high?

I "think" the answer to my original question is that the virtual machine does not get a consistent slice of CPU time the way a physical machine does. So timer interrupts do not happen at regular intervals, making it impossible for chrony to measure anything at a small scale. This was kind of explained in a small paragraph at the bottom of this document: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

The suggested solution for dealing with this is to add divider=10 to the kernel parameters and reboot. I tried this and it made no difference. So now I'm not sure if my understanding is correct.

 several hours of accumulated drift over a day or so.

What's happening however, is that the system clock falls behind the real time clock or the hardware clock (ie, the host clock, which is well maintained) at a rate of 5-20s per minute (presumably depending on the relative load of the host and guest), resulting in several hours of drift per day.

That is beyond the ability of chrony (or anything) to correct. The max drift
rate that can be compensated is 6 sec/minute. Jumping the clock is your
only option.


What I don't understand is this: chrony logs the following in /var/log/messages:
 chronyd[490]: System clock wrong by 15.124741 seconds, adjustment started
It does this (saying the clock is wrong by about 5-20s) even when the clock is wrong by hours.

I think that this is the "least squares" offset. I.e., it fits the measured
offsets, finds a drift rate and an "offset", sets the system clock rate using
adjtimex to the "right" value to compensate, and then adjusts the rate further
to work off the "offset" as well. But if you are really losing 10 sec per
minute, it cannot.
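
To illustrate the fit described above, here is a minimal sketch of an ordinary least-squares line through (time, offset) samples. It is not chrony's code: the function name and the sample values are made up for illustration, and the real implementation also weights samples and compensates for rate changes it has already applied.

```python
def fit_offset_and_drift(times, offsets):
    """Fit offset = intercept + drift * time by ordinary least squares.

    Returns (offset_at_last_sample, drift), where drift is seconds of
    offset accumulated per second of elapsed time.
    """
    n = len(times)
    if n < 2 or n != len(offsets):
        raise ValueError("need at least two matching samples")
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    drift = sxy / sxx
    intercept = mean_o - drift * mean_t
    return intercept + drift * times[-1], drift


# Hypothetical samples: a clock that falls behind by 10 s every 60 s.
times = [0.0, 60.0, 120.0, 180.0]
offsets = [0.0, 10.0, 20.0, 30.0]
current_offset, drift = fit_offset_and_drift(times, offsets)
print(f"offset now: {current_offset:.3f} s, drift: {drift * 1e6:.0f} PPM")
```

With these invented samples the fitted drift is about 166667 PPM, which is above the 100000 PPM limit mentioned later in this message.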



In my reading on NTP, it seems like a simple enough system: The client asks several servers for the time, does some work to average the results, compensate for network latency and local latency (unreliable on a virtual machine), and slews the system clock accordingly. What I don't understand is, even in this ridiculously high skew situation, how can chrony let the system clock drift by hours? I can certainly understand a few seconds of inaccuracy given the inconsistency of the CPU, but hours?


It sends out a packet with a local timestamp. The remote server timestamps
the packet when it is received and when it is sent out again, and your machine
timestamps it when it comes back. The measured offset is the difference
between the means of the local timestamps and the remote timestamps. chrony
then takes the last N offsets (compensated for changes it has made in the
drift rate of the clock) and does a least squares fit to find out what the
best estimate is for the drift error and offset error. It also tests to see if
the deviations from the least squares fit look roughly random. If not, it
makes N smaller and tries again until N is 3. In your system N seems to hang
around 3 a lot.
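
The four-timestamp measurement described above can be sketched as follows. This is the standard NTP offset and delay arithmetic, not code taken from chrony; the names t1..t4 and the sample timestamps are assumptions made for the example.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: local time when the request was sent
    t2: remote time when the request was received
    t3: remote time when the reply was sent
    t4: local time when the reply was received
    """
    # Offset: mean of the remote timestamps minus mean of the local ones.
    # A positive value means the local clock is behind the remote clock.
    offset = (t2 + t3) / 2 - (t1 + t4) / 2
    # Delay: total round trip minus the time the server held the packet.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical exchange: local clock 15 s behind, 100 ms network round trip.
offset, delay = ntp_offset_and_delay(100.000, 115.050, 115.051, 100.101)
print(f"offset: {offset:.3f} s, delay: {delay:.3f} s")
```

The calculation assumes the outbound and return paths take equal time; when a guest is descheduled between sending and receiving, that assumption fails and the measured offset becomes noisy.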

chrony cannot fix drift rates that are larger than 100000 PPM (that is, 1 in
10). ntpd does not fix drifts larger than 500 PPM (1 in 2000).
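
As a quick check on the numbers in this thread, the following sketch converts "seconds lost per minute" into PPM and compares it with the two limits quoted above. The limit constants are simply the figures stated in this message, and the helper name is invented for the example.

```python
PPM = 1_000_000

# Limits as quoted in this message (not read from any program).
CHRONY_LIMIT_PPM = 100_000  # 1 in 10, i.e. 6 s per minute
NTPD_LIMIT_PPM = 500        # 1 in 2000


def seconds_per_minute_to_ppm(seconds_lost):
    """Convert a drift of `seconds_lost` seconds per minute into PPM."""
    return seconds_lost * PPM / 60


for loss in (5, 6, 20):
    ppm = seconds_per_minute_to_ppm(loss)
    verdict = "within" if ppm <= CHRONY_LIMIT_PPM else "beyond"
    print(f"{loss:>2} s/min = {ppm:>9.0f} PPM ({verdict} the quoted chrony limit)")
```

Under these figures, the reported 5-20 s per minute range runs from roughly 83000 PPM to roughly 333000 PPM, so most of it lies above the 100000 PPM limit.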

