Re: [chrony-users] Possible bug in PPS support
- To: chrony-users@xxxxxxxxxxxxxxxxxxxx
- Subject: Re: [chrony-users] Possible bug in PPS support
- From: Miroslav Lichvar <mlichvar@xxxxxxxxxx>
- Date: Mon, 23 Oct 2017 11:39:03 +0200
On Mon, Oct 23, 2017 at 10:54:52AM +0200, Rob Janssen wrote:
> However, recently at one site the PPS signal was lost, but chrony keeps "locked" to it:
>
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> #* PPS                           0   4     0   13h   -279ns[ -401ns] +/-   79ns
> ^- xxxxxx.xxxx.xxx               1  10   377   250  +3462us[+3462us] +/-   10ms
>
> As can be seen, it has been lost for 13 hours but it still has the * sign in the 2nd column.
> We are remotely monitoring these systems using chronyc tracking and it still indicated stratum 1 referenced to PPS.
>
> I would have expected it to drop back to using those network time servers after some time of not getting pulses
> (i.e. once "Reach" is 0) and the stratum to increase to 2. When it would operate that way, we would have
> received an alert.
>
> Furthermore, the clock had drifted by 3.5ms by the time the above status was noticed, while when synchronized
> to network time it usually is within 1 to 1.5ms. So it really is not considering those network time sources anymore.
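For the monitoring gap described in the report, one option is to alert on the Reach column directly rather than on the stratum. What follows is a minimal Python sketch, assuming the sources output keeps the column layout shown in the quoted report (Reach is an octal register, so 0 means no recent samples). The function name stale_selected_sources is made up for illustration and is not part of chrony.

```python
def stale_selected_sources(sources_output):
    """Return names of sources marked '*' (selected) whose Reach is 0.

    Assumes the human-readable layout quoted in this thread:
    MS, name, stratum, poll, reach (octal), LastRx, ...
    """
    stale = []
    for line in sources_output.splitlines():
        fields = line.split()
        # Data rows start with a two-character mode/state marker.
        if len(fields) < 6 or len(fields[0]) != 2:
            continue
        mode, state = fields[0][0], fields[0][1]
        if mode not in "^=#":
            continue  # header row or anything that is not a source line
        reach = int(fields[4], 8)  # Reach is printed in octal
        if state == "*" and reach == 0:
            stale.append(fields[1])
    return stale


# Sample taken from the report above.
SAMPLE = """\
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS 0 4 0 13h -279ns[ -401ns] +/- 79ns
^- xxxxxx.xxxx.xxx 1 10 377 250 +3462us[+3462us] +/- 10ms
"""

if __name__ == "__main__":
    print(stale_selected_sources(SAMPLE))
```

A check like this would have flagged the PPS source as soon as Reach dropped to 0, independently of when the selection actually switches.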
It would have switched eventually, once the estimated error of the
refclock had grown larger than the error of the NTP source (10
milliseconds).
Have you saved the tracking or sourcestats output? From the skew we
could estimate how long it would take.
> Is it to be considered a bug, or is this just a design feature?
It's a feature, but there is apparently a bug which may make the
switch take much longer than it should.
> How could we work around that in this case?
Decreasing the maximum number of samples of the NTP source with the
maxsamples option should reduce the maximum span (as reported in
sourcestats) and also the time it takes to switch away from
unreachable sources.
Increasing the maxclockerror would do that too if it was included in
the source selection. Even with the default value it would take only a
few hours to switch in your case.
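As a rough check of the "few hours" figure, here is a back-of-the-envelope sketch in Python. It assumes a simplified model in which the refclock's estimated error grows linearly at a fixed rate in ppm (the default maxclockerror of 1 ppm, or the reported skew) until it exceeds the NTP source's error. The function name and the linear model are assumptions for illustration, not chrony's actual selection code.

```python
def seconds_until_switch(refclock_error_s, ntp_error_s, growth_ppm):
    """Time for the refclock error to grow past the NTP source error.

    Simplified linear model (an assumption, not chrony's algorithm):
    the error increases by growth_ppm microseconds per second.
    """
    if growth_ppm <= 0:
        raise ValueError("growth_ppm must be positive")
    gap = max(ntp_error_s - refclock_error_s, 0.0)
    return gap / (growth_ppm * 1e-6)


# Values from the sources output in this thread: 79 ns vs. 10 ms,
# with an error growth of 1 ppm (the default maxclockerror).
hours = seconds_until_switch(79e-9, 10e-3, 1.0) / 3600.0

if __name__ == "__main__":
    print(f"{hours:.2f} hours")
```

Under these assumptions the result is a little under three hours. Passing the skew from tracking or sourcestats as growth_ppm gives the corresponding estimate for a skew-driven error growth.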
I thought it was included when I responded a couple of days ago to a
similar question on this list. I just checked and it's not included.
I'll look into that.
--
Miroslav Lichvar