Re: [chrony-users] Possible bug in PPS support


On Mon, Oct 23, 2017 at 06:06:17PM +0200, Rob Janssen wrote:
> Miroslav Lichvar wrote:
> > On Mon, Oct 23, 2017 at 10:54:52AM +0200, Rob Janssen wrote:
> > > Furthermore, the clock had drifted by 3.5ms by the time the above status was noticed, while when synchronized
> > > to network time it usually is within 1 to 1.5ms.  So it really is not considering those network time sources anymore.
> > It would have switched eventually when the estimated error of the
> > refclock was larger than the error of the NTP source (10
> > milliseconds).
> That does not seem reasonable... should it not refer to the estimated error of the source itself rather
> than to the network source?

I'm not sure what you mean here.

> > Have you saved the tracking or sourcestats output? From the skew we
> > could estimate how long it would take.
> 
> Ok here is the tracking.log, the last few lines before it failed:
> 
> 2017-10-21 22:18:30 PPS              1    -12.275      0.048 -6.697e-07 N  1  4.525e-07  1.504e-07
> 2017-10-21 22:18:46 PPS              1    -12.279      0.030 -1.661e-07 N  1  3.788e-07  3.638e-11
> 2017-10-21 22:19:02 PPS              1    -12.284      0.029 -7.386e-07 N  1  4.446e-07  1.177e-07
> 2017-10-21 22:19:18 PPS              1    -12.286      0.020 -6.956e-08 N  1  3.629e-07  4.908e-11
> 2017-10-21 22:19:34 PPS              1    -12.290      0.022 -7.190e-07 N  1  4.091e-07  6.094e-08
> 2017-10-21 22:19:50 PPS              1    -12.292      0.018 -1.540e-07 N  1  3.709e-07  4.822e-11
> 2017-10-21 22:20:06 PPS              1    -12.295      0.017 -4.841e-07 N  1  4.030e-07  1.114e-07
> 2017-10-21 22:20:22 PPS              1    -12.297      0.014 -1.363e-07 N  1  3.626e-07  8.935e-09
> 
> After this, nothing was logged until I restarted chronyd 13 hours later and it synced to the network sources.

The last skew was 14 ppb, so it would take about 8 days to accumulate
10 milliseconds worth of dispersion. The other check comparing the age
of samples between sources would kick in sooner (64 * 1024 seconds =
~18 hours).
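
For anyone who wants to redo the arithmetic, here is a rough sketch in
Python. The 0.014 ppm skew is taken from the last tracking.log line
quoted above, and the 10 ms figure is the NTP source error mentioned
earlier in the thread. The variable names are only illustrative; this
is not how chronyd computes it internally.

```python
# Back-of-the-envelope estimate, not chronyd's actual selection code.
SKEW_PPM = 0.014          # skew column of the last tracking.log line (14 ppb)
NTP_ERROR_S = 10e-3       # estimated error of the NTP source, in seconds

# Dispersion grows by roughly `skew` seconds per second of freewheeling.
skew = SKEW_PPM * 1e-6
seconds_to_limit = NTP_ERROR_S / skew
days_to_limit = seconds_to_limit / 86400

# The sample-age check mentioned above: 64 * 1024 seconds.
age_limit_s = 64 * 1024
age_limit_hours = age_limit_s / 3600

print(f"dispersion reaches 10 ms after about {days_to_limit:.1f} days")
print(f"sample-age check triggers after about {age_limit_hours:.1f} hours")
```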

> > > Is it to be considered a bug, or is this just a design feature?
> > It's a feature, but there is apparently a bug which may make the
> > switch take much longer than it should.
> 
> However, we use this form of time synchronization because we need the clock to be within about 20us
> of real time.  When the PPS sync is lost and only network sync is achieved, that is not really attainable.
> So we need some indication whenever there is no PPS sync.
> Would it not be reasonable to indicate loss of PPS sync when the Reach value becomes zero?
> Ok, it could be that freewheeling keeps a more accurate time than syncing to another source, but
> at least the error condition should be monitored.

It's not an error condition in chrony, which was designed for
intermittent connections. Refclocks are handled in the same way as
NTP sources.

I think the best approach for checking the accuracy of the clock is to
monitor the root delay + root dispersion. That's the estimated maximum
error of the clock. If you really want to make sure an update of the
clock was made in the last X seconds, you can check the reference time.
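
A minimal sketch of such a check in Python, assuming the usual
"Name : value" layout printed by "chronyc tracking". The sample text,
the helper names and the 60-second age limit are made up for
illustration; the 20us limit is the requirement mentioned earlier in
the thread.

```python
from datetime import datetime, timezone

# Made-up sample in the assumed "Name : value" layout of `chronyc tracking`.
SAMPLE = """\
Reference ID    : 50505300 (PPS)
Stratum         : 1
Ref time (UTC)  : Sat Oct 21 22:20:22 2017
Root delay      : 0.000000001 seconds
Root dispersion : 0.000012000 seconds
"""

def parse_tracking(text):
    """Split each line at the first colon into a {name: value} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields

def seconds(value):
    """Turn a string like '0.000012000 seconds' into a float."""
    return float(value.split()[0])

def estimated_max_error(fields):
    """Root delay + root dispersion, as suggested above."""
    return seconds(fields["Root delay"]) + seconds(fields["Root dispersion"])

def reference_age(fields, now):
    """Seconds elapsed between the reference time and `now` (both UTC)."""
    ref = datetime.strptime(fields["Ref time (UTC)"], "%a %b %d %H:%M:%S %Y")
    return (now - ref.replace(tzinfo=timezone.utc)).total_seconds()

def clock_ok(fields, now, max_error=20e-6, max_age=60.0):
    """True if the error estimate and the last update are both within limits.

    max_age is a hypothetical choice, not a chrony default.
    """
    return (estimated_max_error(fields) <= max_error
            and reference_age(fields, now) <= max_age)

fields = parse_tracking(SAMPLE)
now = datetime(2017, 10, 21, 22, 21, 22, tzinfo=timezone.utc)
print(estimated_max_error(fields), reference_age(fields, now), clock_ok(fields, now))
```

In practice the text would come from running "chronyc tracking" and
"now" from datetime.now(timezone.utc).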

> Ok but rather than "only a few hours" I would like to see "only a few minutes".
> The Span indicated by sourcestats is 79 for the PPS source now, and 103m for
> the network sources.
> Would that mean it drops the PPS after 79 seconds?  That would be fine.

No, it would be 103 minutes, provided the span didn't change in that time.

-- 
Miroslav Lichvar
