[chrony-users] Possible bug in PPS support

I noticed a small problem with PPS synchronization...

We have a couple of sites where there is locally distributed PPS from a GPSDO without access to its monitoring.
(another group is monitoring the GPSDO)

So, we use a chrony config with a PPS source and a couple of network time sources to get absolute time.
Config is like this:

refclock        PPS /dev/pps0 refid PPS
server          xx.xxx.72.10    iburst
server          xx.xxx.72.130   iburst
server          xx.xxx.72.131   iburst

(ldattach 18 /dev/ttyS0 is used to provide /dev/pps0)

This works OK.  After startup chrony initially synchronizes to the network time and after a minute or so it locks
in to the PPS pulses.  The sources output is like this:

210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS                           0   4   377    24   +218ns[ +278ns] +/-  124ns
^- xxxxxx.xxxx.xxx               1  10   377   877   -147us[ -122us] +/-   11ms
^- xxxxxx.xxxx.xxx               1  10   377    14  +1480us[+1480us] +/-   10ms
^- xxx.xxxxxx.xxxx.xxx           1  10   377   345  +1446us[+1447us] +/-   10ms

However, at one site the PPS signal was recently lost, yet chrony stayed "locked" to it:

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS                           0   4     0   13h   -279ns[ -401ns] +/-   79ns
^- xxxxxx.xxxx.xxx               1  10   377   250  +3462us[+3462us] +/-   10ms

As can be seen, no pulse has been received for 13 hours, but the source still carries the * (selected) marker
as the second character of the MS column.
We are remotely monitoring these systems using chronyc tracking, and it still reported stratum 1 referenced to PPS.

I would have expected it to fall back to the network time servers after some time without pulses
(i.e. once "Reach" is 0) and the stratum to increase to 2.  Had it operated that way, we would have
received an alert.

Furthermore, the clock had drifted by about 3.5ms by the time the above status was noticed (see the +3462us offset
of the network source), whereas when synchronized to network time it is usually within 1 to 1.5ms.
So it really is no longer considering those network time sources.

The above situation occurred with chrony 2.1.
However, I have reproduced it on an installation updated to version 3.2, although with an "outage" time of only
about 15 minutes.  It had Reach 0 but still indicated lock to PPS after 869 seconds.

Is it to be considered a bug, or is this just a design feature?
How could we work around that in this case?
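
As a stopgap on the monitoring side, something like the following might flag the condition until the behaviour
itself is clarified.  It is only a sketch: it assumes the "chronyc sources" column layout shown above (mode/state
token, name, stratum, poll, octal Reach), and the function name stale_selected_sources is made up for illustration.

```python
def stale_selected_sources(sources_output):
    """Return names of sources marked '*' (selected) whose Reach register is 0."""
    stale = []
    for line in sources_output.splitlines():
        fields = line.split()
        # Source rows start with a two-character mode/state token such as "#*" or "^-";
        # this skips the "210 Number of sources" line, the header and the "====" rule.
        if len(fields) < 5 or len(fields[0]) != 2 or fields[0][0] not in "^=#":
            continue
        state, name, reach = fields[0][1], fields[1], fields[4]
        # Reach is printed in octal; 0 means none of the last 8 polls gave a valid sample.
        if state == "*" and int(reach, 8) == 0:
            stale.append(name)
    return stale


# Sample text copied from the failing site's output above.
SAMPLE = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS                           0   4     0   13h   -279ns[ -401ns] +/-   79ns
^- xxxxxx.xxxx.xxx               1  10   377   250  +3462us[+3462us] +/-   10ms
"""

print(stale_selected_sources(SAMPLE))  # prints ['PPS']
```

In practice the text would come from running "chronyc sources" on each host, and a non-empty result would
raise the alert that the stratum check missed.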

Rob
