Re: [chrony-users] Possible bug in PPS support


Miroslav Lichvar wrote:
On Mon, Oct 23, 2017 at 10:54:52AM +0200, Rob Janssen wrote:
However, recently at one site the PPS signal was lost, but chrony stayed "locked" to it:

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS                           0   4     0   13h   -279ns[ -401ns] +/-   79ns
^- xxxxxx.xxxx.xxx               1  10   377   250  +3462us[+3462us] +/-   10ms

As can be seen, it has been lost for 13 hours but it still has the * sign in the 2nd column.
We are remotely monitoring these systems using chronyc tracking and it still indicated stratum 1 referenced to PPS.

I would have expected it to drop back to using those network time servers after some time of not getting pulses
(i.e. once "Reach" is 0) and the stratum to increase to 2.  If it operated that way, we would have
received an alert.

Furthermore, the clock had drifted by 3.5ms by the time the above status was noticed, while when synchronized
to network time it usually is within 1 to 1.5ms.  So it really is not considering those network time sources anymore.
It would have switched eventually when the estimated error of the
refclock was larger than the error of the NTP source (10
milliseconds).
That does not seem reasonable... should it not refer to the estimated error of the source itself rather
than to the network source?


Have you saved the tracking or sourcestats output? From the skew we
could estimate how long it would take.

Ok here is the tracking.log, the last few lines before it failed:

2017-10-21 22:18:30 PPS              1    -12.275      0.048 -6.697e-07 N  1  4.525e-07  1.504e-07
2017-10-21 22:18:46 PPS              1    -12.279      0.030 -1.661e-07 N  1  3.788e-07  3.638e-11
2017-10-21 22:19:02 PPS              1    -12.284      0.029 -7.386e-07 N  1  4.446e-07  1.177e-07
2017-10-21 22:19:18 PPS              1    -12.286      0.020 -6.956e-08 N  1  3.629e-07  4.908e-11
2017-10-21 22:19:34 PPS              1    -12.290      0.022 -7.190e-07 N  1  4.091e-07  6.094e-08
2017-10-21 22:19:50 PPS              1    -12.292      0.018 -1.540e-07 N  1  3.709e-07  4.822e-11
2017-10-21 22:20:06 PPS              1    -12.295      0.017 -4.841e-07 N  1  4.030e-07  1.114e-07
2017-10-21 22:20:22 PPS              1    -12.297      0.014 -1.363e-07 N  1  3.626e-07  8.935e-09

After this, nothing was logged until I restarted chronyd 13 hours later and it synced to the network sources.


Is it to be considered a bug, or is this just a design feature?
It's a feature, but there is apparently a bug which may make the
switch take much longer than it should.

However, we use this form of time synchronization because we need the clock to be within about 20us
of real time.  When the PPS sync is lost and only network sync is achieved, that is not really attainable.
So we need some indication whenever there is no PPS sync.
Would it not be reasonable to indicate loss of PPS sync when the Reach value becomes zero?
Ok, it could be that freewheeling keeps a more accurate time than syncing to another source, but
at least the error condition should be monitored.
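
A minimal sketch of the kind of external check described above, assuming the monitoring side can obtain the text printed by "chronyc sources" in the layout shown earlier in this message. The function names (parse_sources, unreachable_selected) and the SAMPLE text are illustrative only and are not part of chrony. It flags any source that is still marked as selected (*) while its Reach register is 0:

```python
import re

# Matches data rows of the "chronyc sources" table in the layout quoted in this
# thread: mode char, state char, name, stratum, poll, reach (octal), LastRx.
SOURCE_LINE = re.compile(
    r"^([\^=#])([*+\-?x~])\s+(\S+)\s+(\d+)\s+(-?\d+)\s+([0-7]+)\s+(\S+)"
)

# Sample text copied from the report above (hypothetical input for the sketch).
SAMPLE = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS                           0   4     0   13h   -279ns[ -401ns] +/-   79ns
^- xxxxxx.xxxx.xxx               1  10   377   250  +3462us[+3462us] +/-   10ms
"""


def parse_sources(text):
    """Return one dict per source row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = SOURCE_LINE.match(line)
        if not m:
            continue
        mode, state, name, stratum, poll, reach, last_rx = m.groups()
        rows.append({
            "mode": mode,
            "state": state,
            "name": name,
            "stratum": int(stratum),
            "poll": int(poll),
            "reach": int(reach, 8),  # Reach is printed as an octal register
            "last_rx": last_rx,
        })
    return rows


def unreachable_selected(text):
    """Names of sources that are selected ('*') although Reach is 0."""
    return [r["name"] for r in parse_sources(text)
            if r["state"] == "*" and r["reach"] == 0]


if __name__ == "__main__":
    for name in unreachable_selected(SAMPLE):
        print(f"ALERT: {name} is selected but has Reach 0")
```

In a real deployment the text would come from running "chronyc sources" on the monitored host instead of the SAMPLE constant; how the alert is delivered is left open here.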


How could we work around that in this case?
Decreasing the maximum number of samples of the NTP source with the
maxsamples option should reduce the maximum span (as reported in
sourcestats) and also the time it takes to switch away from
unreachable sources.

Increasing the maxclockerror would do that too if it was included in
the source selection. Even with the default value it would take only a
few hours to switch in your case.
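
As a rough back-of-the-envelope check of the time scales discussed above, the sketch below assumes the estimated error of the unreachable refclock grows linearly from its last reported value (79ns) until it reaches the error of the NTP source (10ms). The growth rate is taken either as the last logged skew alone (0.014 ppm, from the tracking.log excerpt) or as that skew plus the default maxclockerror of 1 ppm. This is a simplification for illustration, not chronyd's actual selection algorithm, and the helper name is made up:

```python
def seconds_until_error(target_error_s, initial_error_s, growth_ppm):
    """Time for an error estimate to grow linearly from initial to target.

    growth_ppm is the assumed growth rate in parts per million
    (microseconds of error per second of elapsed time).
    """
    return (target_error_s - initial_error_s) / (growth_ppm * 1e-6)


# Values taken from the outputs quoted in this thread.
pps_error_s = 79e-9          # "+/- 79ns" on the PPS line
ntp_error_s = 10e-3          # "+/- 10ms" on the NTP source line
skew_ppm = 0.014             # skew column of the last tracking.log line
max_clock_error_ppm = 1.0    # chrony's default maxclockerror

# Assumption: only the skew drives the growth of the estimated error.
skew_only_hours = seconds_until_error(
    ntp_error_s, pps_error_s, skew_ppm) / 3600

# Assumption: maxclockerror is added to the growth rate as well.
with_max_clock_error_hours = seconds_until_error(
    ntp_error_s, pps_error_s, skew_ppm + max_clock_error_ppm) / 3600

if __name__ == "__main__":
    print(f"skew only:          {skew_only_hours:.0f} hours")
    print(f"with maxclockerror: {with_max_clock_error_hours:.1f} hours")
```

Under these assumptions the skew alone would take on the order of 200 hours (more than a week) to reach 10ms, while including the default maxclockerror brings it down to somewhat under 3 hours, which is in line with the "few hours" mentioned above.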


Ok but rather than "only a few hours" I would like to see "only a few minutes".
The Span indicated by sourcestats is 79 for the PPS source now, and 103m for
the network sources.
Would that mean it drops the PPS after 79 seconds?  That would be fine.

Rob
