Re: [chrony-users] Fwd: Chrony malfunctioning at beaglebones

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-users Archives ]


>On Fri, Aug 21, 2015 at 07:05:38PM +0100, Nuno Gonçalves wrote:
>> Thank you so much. I've sent you the hostnames of a beaglebone white
>> and a beaglebone black that are set with your public key.
>> 
>> I took a few days while I confirmed what I could. In fact I've tried
>> several kernel versions and I see that up to 3.15.10-bone8 the clock
>> is working properly. After 3.16 and until the most recent 4.2 RC, it
>> is malfunctioning.
>> 
>> If you have a minute to login at the machines and are able to narrow
>> in any way which part of the kernel clocks is at fault it might be
>> easier have it fixed at the Linux level.

>I did some tests with the adjtimex utility while chronyd was just
>reporting measured offset and not controlling the clock. It looks like
>there is a problem with large frequency offsets. There seems to be a
>maximum offset the kernel is willing to apply to the clock (about 1000
>ppm) and any attempts to slow down or speed up the clock more are
>ignored. This breaks the chrony control loop badly.

>As a workaround, I set the maxslewrate option to 100 ppm and started
>chronyd with "0.0 1.0" in the driftfile to avoid large frequency
>adjustments and it seems it was able to settle down.

>I'm not sure where the problem could be. Looking at the log for the
>linux/kernel/time directory between 3.15 and 3.16 I don't see anything
>suspicious. Could be something in the arch-specific code.

>You could try to add the nohz option to the kernel command line, that
>would suggest it's a problem with the internal timekeeping loop.

>I think this is a serious bug that needs to be fixed. If you file any
>bug reports, please let us know.

>Thanks,

>-- 
>Miroslav Lichvar


I wanted to confirm that the kernel timekeeping issue that appeared in:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/time/timekeeping.c?id=dc491596f6394382fbc74ad331156207d619fa0a

causes problems with chronyd that are by fixed by:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/time/timekeeping.c?id=2619d7e9c92d524cb155ec89fd72875321512e5b


In my case, I found that the changes in dc49159 can lead to chronyd trying to adjust the clock frequency by thousands of ppm with each adjustment. Loading the cpu heavily is one way to trip it into instability. Based on Nuno Gonçalves' bug reports (here and https://lkml.org/lkml/2015/9/1/488) I tried rebuilding with patch 2619d7e9c and found it resolves the issue.

Logs below are from chronyd 2.1.1 with these refclocks:

refclock SHM 0          refid GPSD offset 0.0 poll 0 delay 2e-1 precision 1e-1
refclock PPS /dev/pps0  refid KPPS offset 0.0 poll 0 delay 1e-6 precision 1e-9 lock GPSD


This is on a custom TI AM335x-based machine. Free-running PPS jitter (using pps_gpio.ko) is around 5 usec. Under linux 3.14 chronyd is able to converge to a good estimate of the master clock error (~23 ppm) quite quickly, but under kernels 3.16 to 4.2 the adjustments are erratic and I could not get good synchronization. Under 4.3.0-rc5 all is well.

Thanks to Nuno and John for finding and fixing this bug!


-Rob Calhoun


pre-fix chronyd tracking log, linux 4.2 (807249d3ada1ff28a47c4054ca4edd479421b671):
/* with an idle system things are ok */
2015-10-15 21:31:28 KPPS             1     23.504      0.190  2.202e-08 N  1  6.158e-08 -1.736e-03
2015-10-15 21:31:30 KPPS             1     23.603      0.421  1.439e-07 N  1  1.352e-07 -1.136e-03
2015-10-15 21:31:32 KPPS             1     23.599      0.632  1.220e-06 N  1  9.360e-07 -5.326e-04
2015-10-15 21:31:34 KPPS             1     23.567      1.182  1.977e-04 N  1  2.163e-05  3.821e-07
/* Now load the cpu with "grep 'foo.*bar' /usr" */
2015-10-15 21:31:44 KPPS             1     23.490      4.065 -4.404e-04 N  1  1.526e-05 -8.459e-08
2015-10-15 21:31:46 KPPS             1     11.149     51.733 -2.140e-04 N  1  1.987e-05 -3.187e-07
2015-10-15 21:31:48 KPPS             1    -43.087    108.193  2.882e-04 N  1  4.212e-05 -3.125e-07
2015-10-15 21:31:50 KPPS             1      1.904    267.907  3.426e-04 N  1  6.443e-06 -2.309e-04
2015-10-15 21:31:52 KPPS             1    311.318    586.233  2.880e-04 N  1  1.413e-05 -4.266e-04
2015-10-15 21:31:54 KPPS             1    376.514    192.213  1.170e-04 N  1  3.342e-05 -1.916e-04
2015-10-15 21:31:56 KPPS             1    427.136     82.242 -2.364e-05 N  1  4.731e-06  1.849e-07
2015-10-15 21:31:58 KPPS             1    427.136     82.242 -5.466e-04 N  1  3.370e-05  3.626e-04
2015-10-15 21:32:00 KPPS             1    400.528    227.446 -5.693e-04 N  1  2.757e-05  9.942e-04
2015-10-15 21:32:02 KPPS             1    -71.250    171.502 -1.038e-04 N  1  3.971e-05  1.375e-03
2015-10-15 21:32:04 KPPS             1    -93.655    122.819 -2.904e-04 N  1  7.255e-05  9.777e-04
/* once destabilized it goes into oscillation; sometimes it can recover, sometimes not */
2015-10-15 21:41:14 KPPS             1  -8422.943   1627.276  9.956e-04 N  1  2.376e-04  8.855e-03
2015-10-15 21:41:16 KPPS             1  -8422.943   1627.276  3.159e-03 N  1  4.497e-04  4.782e-03
2015-10-15 21:41:18 KPPS             1  -6221.196   1645.369  1.658e-03 N  1  8.883e-05 -1.430e-03
2015-10-15 21:41:20 KPPS             1  -6221.196   1645.369  3.980e-03 N  1  2.327e-04 -6.818e-03
2015-10-15 21:41:22 KPPS             1  -3144.730   1905.826  1.435e-03 N  1  3.720e-04 -1.373e-02
2015-10-15 21:41:24 KPPS             1  -3144.730   1905.826  4.559e-03 N  1  6.280e-04 -1.828e-02
2015-10-15 21:41:26 KPPS             1     11.262   1805.031  9.407e-04 N  1  2.944e-04 -2.630e-02
2015-10-15 21:41:28 KPPS             1     11.262   1805.031  4.147e-03 N  1  6.300e-04 -2.987e-02
2015-10-15 21:41:30 KPPS             1   2797.796   1756.170  8.390e-04 N  1  2.586e-04 -3.712e-02
2015-10-15 21:41:32 KPPS             1   2797.796   1756.170  3.642e-03 N  1  5.480e-04 -4.020e-02
2015-10-15 21:41:34 KPPS             1   5235.517   1659.567  7.185e-04 N  1  2.229e-04 -4.648e-02
2015-10-15 21:41:36 KPPS             1   5235.517   1659.567  3.148e-03 N  1  4.731e-04 -4.906e-02
2015-10-15 21:41:38 KPPS             1   7313.746   1596.585  6.046e-04 N  1  1.848e-04 -5.439e-02
2015-10-15 21:41:40 KPPS             1   7313.746   1596.585  2.646e-03 N  1  3.949e-04 -5.647e-02
2015-10-15 21:41:42 KPPS             1   9072.027   1400.769  4.501e-04 N  1  1.426e-04 -6.085e-02
2015-10-15 21:41:44 KPPS             1   9072.027   1400.769  2.173e-03 N  1  3.367e-04 -6.239e-02
2015-10-15 21:41:46 KPPS             1  10416.160   1533.953  3.754e-04 N  1  1.027e-04 -6.584e-02
2015-10-15 21:41:48 KPPS             1  10416.160   1533.953  1.640e-03 N  1  2.393e-04 -6.692e-02
2015-10-15 21:41:50 KPPS             1  11616.034    334.487  2.367e-04 N  1  5.659e-05 -6.935e-02
2015-10-15 21:41:52 KPPS             1  11616.034    334.487  1.152e-03 N  1  2.129e-04 -6.976e-02



post-fix chronyd tracking log, 4.3.0-rc5 (cfed1e3de4764fb5b6a02c93ec246424b3bffd21)
2015-10-16 19:17:26 KPPS             1     23.714      0.034  1.438e-07 N  1  2.181e-07  7.583e-08
2015-10-16 19:17:28 KPPS             1     23.713      0.034 -1.158e-07 N  1  2.176e-07 -7.115e-08
2015-10-16 19:17:30 KPPS             1     23.715      0.033  4.135e-08 N  1  2.129e-07 -7.445e-09
/* Now load the cpu with "grep 'foo.*bar' /usr" */
2015-10-16 19:17:32 KPPS             1     23.738      0.123  2.741e-06 N  1  8.290e-07  7.556e-10
2015-10-16 19:17:34 KPPS             1     24.135      0.552  5.582e-07 N  1  7.038e-07  5.230e-10
2015-10-16 19:17:36 KPPS             1     24.220      0.373 -5.479e-07 N  1  5.595e-07 -1.842e-07
2015-10-16 19:17:38 KPPS             1     24.187      0.371 -1.996e-07 N  1  5.293e-07  8.346e-08
2015-10-16 19:17:40 KPPS             1     24.006      0.501 -8.763e-07 N  1  6.734e-07 -1.937e-09
2015-10-16 19:17:42 KPPS             1     23.886      0.412 -2.199e-07 N  1  5.646e-07  2.005e-07
2015-10-16 19:17:44 KPPS             1     23.830      0.318  1.618e-07 N  1  4.240e-07  1.864e-07
2015-10-16 19:17:46 KPPS             1     23.739      0.228 -3.947e-08 N  1  2.882e-07  5.484e-08
2015-10-16 19:17:48 KPPS             1     23.683      0.211 -1.628e-07 N  1  2.813e-07  4.978e-09
2015-10-16 19:17:50 KPPS             1     23.644      0.192  2.151e-08 N  1  2.760e-07 -8.058e-10
2015-10-16 19:17:52 KPPS             1     23.609      0.181 -2.145e-07 N  1  2.322e-07  6.498e-08




N������y隊W!���������n���\��"������z)�.n7��Z+��f����|�������'��}���*+�����)�.n7��:蹹^f��X��f����'��}���*+


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/