Re: [chrony-users] Fwd: Chrony malfunctioning at beaglebones |
[ Thread Index |
Date Index
| More chrony.tuxfamily.org/chrony-users Archives
]
>On Fri, Aug 21, 2015 at 07:05:38PM +0100, Nuno Gonçalves wrote:
>> Thank you so much. I've sent you the hostnames of a beaglebone white
>> and a beaglebone black that are set with your public key.
>>
>> I took a few days while I confirmed what I could. In fact I've tried
>> several kernel versions and I see that up to 3.15.10-bone8 the clock
>> is working properly. After 3.16 and until the most recent 4.2 RC, it
>> is malfunctioning.
>>
>> If you have a minute to login at the machines and are able to narrow
>> in any way which part of the kernel clocks is at fault it might be
>> easier have it fixed at the Linux level.
>I did some tests with the adjtimex utility while chronyd was just
>reporting measured offset and not controlling the clock. It looks like
>there is a problem with large frequency offsets. There seems to be a
>maximum offset the kernel is willing to apply to the clock (about 1000
>ppm) and any attempts to slow down or speed up the clock more are
>ignored. This breaks the chrony control loop badly.
>As a workaround, I set the maxslewrate option to 100 ppm and started
>chronyd with "0.0 1.0" in the driftfile to avoid large frequency
>adjustments and it seems it was able to settle down.
>I'm not sure where the problem could be. Looking at the log for the
>linux/kernel/time directory between 3.15 and 3.16 I don't see anything
>suspicious. Could be something in the arch-specific code.
>You could try to add the nohz option to the kernel command line, that
>would suggest it's a problem with the internal timekeeping loop.
>I think this is a serious bug that needs to be fixed. If you file any
>bug reports, please let us know.
>Thanks,
>--
>Miroslav Lichvar
I wanted to confirm that the kernel timekeeping issue that appeared in:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/time/timekeeping.c?id=dc491596f6394382fbc74ad331156207d619fa0a
causes problems with chronyd that are by fixed by:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/time/timekeeping.c?id=2619d7e9c92d524cb155ec89fd72875321512e5b
In my case, I found that the changes in dc49159 can lead to chronyd trying to adjust the clock frequency by thousands of ppm with each adjustment. Loading the cpu heavily is one way to trip it into instability. Based on Nuno Gonçalves' bug reports (here and https://lkml.org/lkml/2015/9/1/488) I tried rebuilding with patch 2619d7e9c and found it resolves the issue.
Logs below are from chronyd 2.1.1 with these refclocks:
refclock SHM 0 refid GPSD offset 0.0 poll 0 delay 2e-1 precision 1e-1
refclock PPS /dev/pps0 refid KPPS offset 0.0 poll 0 delay 1e-6 precision 1e-9 lock GPSD
This is on a custom TI AM335x-based machine. Free-running PPS jitter (using pps_gpio.ko) is around 5 usec. Under linux 3.14 chronyd is able to converge to a good estimate of the master clock error (~23 ppm) quite quickly, but under kernels 3.16 to 4.2 the adjustments are erratic and I could not get good synchronization. Under 4.3.0-rc5 all is well.
Thanks to Nuno and John for finding and fixing this bug!
-Rob Calhoun
pre-fix chronyd tracking log, linux 4.2 (807249d3ada1ff28a47c4054ca4edd479421b671):
/* with an idle system things are ok */
2015-10-15 21:31:28 KPPS 1 23.504 0.190 2.202e-08 N 1 6.158e-08 -1.736e-03
2015-10-15 21:31:30 KPPS 1 23.603 0.421 1.439e-07 N 1 1.352e-07 -1.136e-03
2015-10-15 21:31:32 KPPS 1 23.599 0.632 1.220e-06 N 1 9.360e-07 -5.326e-04
2015-10-15 21:31:34 KPPS 1 23.567 1.182 1.977e-04 N 1 2.163e-05 3.821e-07
/* Now load the cpu with "grep 'foo.*bar' /usr" */
2015-10-15 21:31:44 KPPS 1 23.490 4.065 -4.404e-04 N 1 1.526e-05 -8.459e-08
2015-10-15 21:31:46 KPPS 1 11.149 51.733 -2.140e-04 N 1 1.987e-05 -3.187e-07
2015-10-15 21:31:48 KPPS 1 -43.087 108.193 2.882e-04 N 1 4.212e-05 -3.125e-07
2015-10-15 21:31:50 KPPS 1 1.904 267.907 3.426e-04 N 1 6.443e-06 -2.309e-04
2015-10-15 21:31:52 KPPS 1 311.318 586.233 2.880e-04 N 1 1.413e-05 -4.266e-04
2015-10-15 21:31:54 KPPS 1 376.514 192.213 1.170e-04 N 1 3.342e-05 -1.916e-04
2015-10-15 21:31:56 KPPS 1 427.136 82.242 -2.364e-05 N 1 4.731e-06 1.849e-07
2015-10-15 21:31:58 KPPS 1 427.136 82.242 -5.466e-04 N 1 3.370e-05 3.626e-04
2015-10-15 21:32:00 KPPS 1 400.528 227.446 -5.693e-04 N 1 2.757e-05 9.942e-04
2015-10-15 21:32:02 KPPS 1 -71.250 171.502 -1.038e-04 N 1 3.971e-05 1.375e-03
2015-10-15 21:32:04 KPPS 1 -93.655 122.819 -2.904e-04 N 1 7.255e-05 9.777e-04
/* once destabilized it goes into oscillation; sometimes it can recover, sometimes not */
2015-10-15 21:41:14 KPPS 1 -8422.943 1627.276 9.956e-04 N 1 2.376e-04 8.855e-03
2015-10-15 21:41:16 KPPS 1 -8422.943 1627.276 3.159e-03 N 1 4.497e-04 4.782e-03
2015-10-15 21:41:18 KPPS 1 -6221.196 1645.369 1.658e-03 N 1 8.883e-05 -1.430e-03
2015-10-15 21:41:20 KPPS 1 -6221.196 1645.369 3.980e-03 N 1 2.327e-04 -6.818e-03
2015-10-15 21:41:22 KPPS 1 -3144.730 1905.826 1.435e-03 N 1 3.720e-04 -1.373e-02
2015-10-15 21:41:24 KPPS 1 -3144.730 1905.826 4.559e-03 N 1 6.280e-04 -1.828e-02
2015-10-15 21:41:26 KPPS 1 11.262 1805.031 9.407e-04 N 1 2.944e-04 -2.630e-02
2015-10-15 21:41:28 KPPS 1 11.262 1805.031 4.147e-03 N 1 6.300e-04 -2.987e-02
2015-10-15 21:41:30 KPPS 1 2797.796 1756.170 8.390e-04 N 1 2.586e-04 -3.712e-02
2015-10-15 21:41:32 KPPS 1 2797.796 1756.170 3.642e-03 N 1 5.480e-04 -4.020e-02
2015-10-15 21:41:34 KPPS 1 5235.517 1659.567 7.185e-04 N 1 2.229e-04 -4.648e-02
2015-10-15 21:41:36 KPPS 1 5235.517 1659.567 3.148e-03 N 1 4.731e-04 -4.906e-02
2015-10-15 21:41:38 KPPS 1 7313.746 1596.585 6.046e-04 N 1 1.848e-04 -5.439e-02
2015-10-15 21:41:40 KPPS 1 7313.746 1596.585 2.646e-03 N 1 3.949e-04 -5.647e-02
2015-10-15 21:41:42 KPPS 1 9072.027 1400.769 4.501e-04 N 1 1.426e-04 -6.085e-02
2015-10-15 21:41:44 KPPS 1 9072.027 1400.769 2.173e-03 N 1 3.367e-04 -6.239e-02
2015-10-15 21:41:46 KPPS 1 10416.160 1533.953 3.754e-04 N 1 1.027e-04 -6.584e-02
2015-10-15 21:41:48 KPPS 1 10416.160 1533.953 1.640e-03 N 1 2.393e-04 -6.692e-02
2015-10-15 21:41:50 KPPS 1 11616.034 334.487 2.367e-04 N 1 5.659e-05 -6.935e-02
2015-10-15 21:41:52 KPPS 1 11616.034 334.487 1.152e-03 N 1 2.129e-04 -6.976e-02
post-fix chronyd tracking log, 4.3.0-rc5 (cfed1e3de4764fb5b6a02c93ec246424b3bffd21)
2015-10-16 19:17:26 KPPS 1 23.714 0.034 1.438e-07 N 1 2.181e-07 7.583e-08
2015-10-16 19:17:28 KPPS 1 23.713 0.034 -1.158e-07 N 1 2.176e-07 -7.115e-08
2015-10-16 19:17:30 KPPS 1 23.715 0.033 4.135e-08 N 1 2.129e-07 -7.445e-09
/* Now load the cpu with "grep 'foo.*bar' /usr" */
2015-10-16 19:17:32 KPPS 1 23.738 0.123 2.741e-06 N 1 8.290e-07 7.556e-10
2015-10-16 19:17:34 KPPS 1 24.135 0.552 5.582e-07 N 1 7.038e-07 5.230e-10
2015-10-16 19:17:36 KPPS 1 24.220 0.373 -5.479e-07 N 1 5.595e-07 -1.842e-07
2015-10-16 19:17:38 KPPS 1 24.187 0.371 -1.996e-07 N 1 5.293e-07 8.346e-08
2015-10-16 19:17:40 KPPS 1 24.006 0.501 -8.763e-07 N 1 6.734e-07 -1.937e-09
2015-10-16 19:17:42 KPPS 1 23.886 0.412 -2.199e-07 N 1 5.646e-07 2.005e-07
2015-10-16 19:17:44 KPPS 1 23.830 0.318 1.618e-07 N 1 4.240e-07 1.864e-07
2015-10-16 19:17:46 KPPS 1 23.739 0.228 -3.947e-08 N 1 2.882e-07 5.484e-08
2015-10-16 19:17:48 KPPS 1 23.683 0.211 -1.628e-07 N 1 2.813e-07 4.978e-09
2015-10-16 19:17:50 KPPS 1 23.644 0.192 2.151e-08 N 1 2.760e-07 -8.058e-10
2015-10-16 19:17:52 KPPS 1 23.609 0.181 -2.145e-07 N 1 2.322e-07 6.498e-08
N������y隊W!���������n���\��"������z)�.n7��Z+��f����|�������'��}���*+�����)�.n7��:蹹^f��X��f����'��}���*+