[chrony-users] Chronyd has good sources but rejects time "may be in error" and "Can't synchronise: no majority"
I have a pesky Raspberry Pi with a GPS doing GPS+PPS for stratum 1
timekeeping using chronyd. It's set up identically to a second Pi doing
the same thing at a second location.
Here's a tidbit from my problem Pi's chrony.conf:
---------------------------
## GPS NMEA reference:
refclock SHM 0 offset 0.470 refid NMEA
## GPS PPS reference:
refclock SHM 1 refid GPS lock NMEA
## 10.0.0.10 (Raspberry Pi GPS+PPS)
## NOTE: This device is IDENTICAL to problem Pi but doesn't
## seem to exhibit the same problem as often.
peer 10.0.0.10
allow 10.0.0.10
## 10.50.0.10 (FreeBSD Garmin 18x GPS+PPS)
peer 10.50.0.10
allow 10.50.0.10
---------------------------
THE PROBLEM: Periodically chronyd seems to stop listening to the GPS's
date/time and PPS information, and even stops listening to peer NTP servers.
When I learn of this problem, I log in to the problem Pi and look at
things thus:
----------
# chronyc tracking && chronyc -n sources
Reference ID : 00000000 ()
Stratum : 0
Ref time (UTC) : Thu Jan 01 00:00:00 1970
System time : 0.000000019 seconds fast of NTP time
Last offset : -0.000000070 seconds
RMS offset : 0.000000508 seconds
Frequency : 18.205 ppm fast
Residual freq : +0.000 ppm
Skew : 0.000 ppm
Root delay : 1.000000000 seconds
Root dispersion : 1.000000000 seconds
Update interval : 16.0 seconds
Leap status : Not synchronised
210 Number of sources = 4
  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    16    +76ms[  +76ms] +/- 6673us
#x GPS                           0   4   377    17    -23us[  -23us] +/-  444ns
=x 10.0.0.10                     1   9   377   638    -33us[  -33us] +/- 1016us
=x 10.50.0.10                    1  10   377  1107   +407us[ +407us] +/- 1441us
----------
Notice that EVERY time source is showing 'x' (time may be in error).
What? I log in to 10.0.0.10, the identical Pi at a second location, and
it is fine--and shows source 10.50.0.10 as a good time source too.
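For anyone who wants to catch this state without logging in and eyeballing the table, here is a rough Python sketch that reads the text printed by "chronyc -n sources" and reports whether every source carries the 'x' flag. It relies only on the two leading mode/state characters explained in the legend above; the function names are made up for illustration and are not part of chrony.

```python
import re

# A source row starts with a mode character (^ = #) followed by a state
# character (* + - ? x ~), then whitespace and the source name.
SOURCE_ROW = re.compile(r"^([\^=#])([*+\-?x~])\s+(\S+)")


def parse_source_states(report):
    """Return (name, mode, state) for each source row in the report text."""
    rows = []
    for line in report.splitlines():
        match = SOURCE_ROW.match(line)
        if match:
            mode, state, name = match.groups()
            rows.append((name, mode, state))
    return rows


def all_sources_flagged(report, flag="x"):
    """True when at least one source is listed and all carry the given flag."""
    rows = parse_source_states(report)
    return bool(rows) and all(state == flag for _, _, state in rows)
```

The report text could come from something like subprocess.run(["chronyc", "-n", "sources"], capture_output=True, text=True).stdout. Legend lines, the "====" rule, and wrapped continuation lines are skipped because they do not start with a valid mode/state pair.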
So I wonder, is the GPS not locked? I use a script to query gpsd (via
TCP to 127.0.0.1 port 2947 where gpsd is listening) to find out the
status of the GPS device:
----------
{
"active": 1,
"time": "2020-05-26T15:54:12.537Z",
"sats": 12,
"sats_used": 11,
"tpv_status": null,
"tpv_mode": 3,
"tpv_time": "2020-05-26T15:54:11.000Z",
"lat": XXX.XXXXXX,
"lon": YYY.YYYYYY,
"alt": ZZZZ.Z,
"ept": 0.005,
"epx": 8.227,
"epy": 7.806,
"epv": 27.83
}
----------
Looks like the GPS is active, has time and date, has a full 3-D lock
(tpv_mode=3), and is using 11 of the 12 satellites it can see. The GPS
estimates its time error at 5 ms (ept=0.005). So far as I can tell,
the GPS is functioning as expected, and should be delivering time and
a PPS signal to chronyd just fine.
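For reference, a query like this can be sketched in a few lines of Python. This is not the actual get_gps_status script, only a minimal illustration that assumes gpsd's JSON interface on 127.0.0.1 port 2947: it sends a WATCH request, reads a handful of newline-delimited reports, and pulls the fix mode, time, ept, and satellite counts out of the TPV and SKY reports. The function names and the output keys are chosen here to resemble the output above.

```python
import json
import socket


def read_gpsd_reports(host="127.0.0.1", port=2947, limit=20, timeout=5.0):
    """Collect up to `limit` JSON reports from a running gpsd."""
    reports = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Ask gpsd to start streaming reports in JSON form.
        sock.sendall(b'?WATCH={"enable":true,"json":true};\n')
        with sock.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                line = line.strip()
                if line:
                    reports.append(json.loads(line))
                if len(reports) >= limit:
                    break
    return reports


def summarize(reports):
    """Reduce a list of gpsd reports to the latest fix and satellite counts."""
    summary = {"tpv_mode": None, "tpv_time": None, "ept": None,
               "sats": None, "sats_used": None}
    for report in reports:
        kind = report.get("class")
        if kind == "TPV":
            summary["tpv_mode"] = report.get("mode")
            summary["tpv_time"] = report.get("time")
            summary["ept"] = report.get("ept")
        elif kind == "SKY" and "satellites" in report:
            sats = report["satellites"]
            summary["sats"] = len(sats)
            summary["sats_used"] = sum(1 for sat in sats if sat.get("used"))
    return summary
```

Calling summarize(read_gpsd_reports()) on the Pi would give a dictionary comparable to the output shown above. Keeping the parsing separate from the socket code makes it possible to check the summary logic against saved reports without a live GPS.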
So what's going on with chronyd? Why is it rejecting all time sources?
What can I enable to find out more?
I've quickly glanced at various logs
(measurements/tracking/refclocks/statistics) and don't immediately
notice any numbers outside the usual range seen before and after the
problem manifested.
The only indicator was one log entry in syslog:
----------
May 26 15:31:16 myhost chronyd[435]: Can't synchronise: no majority
----------
I gave up and simply rebooted the system a bit over 25 minutes later,
and chronyd resumed with these syslog entries:
----------
May 26 15:57:00 myhost systemd[1]: Starting chrony, an NTP client/server...
May 26 15:57:00 myhost chronyd[426]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 -DEBUG)
May 26 15:57:00 myhost chronyd[426]: Could not open IPv6 command socket : Address family not supported by protocol
May 26 15:57:00 myhost chronyd[426]: Frequency 18.205 +/- 0.021 ppm read from /var/lib/chrony/chrony.drift
May 26 15:57:00 myhost systemd[1]: Started chrony, an NTP client/server.
May 26 15:57:47 myhost chronyd[426]: Selected source GPS
May 26 15:57:47 myhost chronyd[426]: System clock wrong by 13.062845 seconds, adjustment started
May 26 15:58:00 myhost chronyd[426]: System clock was stepped by 13.062845 seconds
May 26 15:58:48 myhost chronyd[426]: Can't synchronise: no majority
May 26 16:01:11 myhost chronyd[426]: Selected source GPS
----------
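To see how often and for how long this happens, the syslog entries above can be paired up mechanically. The following Python sketch is only an illustration under a few assumptions: it expects the classic "Mon DD HH:MM:SS host chronyd[pid]: message" syslog layout shown above, English month abbreviations, and a year supplied by the caller (syslog lines carry none). The function name is hypothetical.

```python
import re
from datetime import datetime

# Classic syslog line written by chronyd: timestamp, host, tag, message.
LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+chronyd\[\d+\]:\s+(.*)$"
)


def unsynchronised_intervals(lines, year=2020):
    """Pair each "Can't synchronise" entry with the next "Selected source".

    Returns a list of (lost_at, regained_at, message) tuples.
    """
    intervals = []
    lost_at = None
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        stamp, message = match.groups()
        # The year is an assumption passed in by the caller.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if message.startswith("Can't synchronise") and lost_at is None:
            lost_at = when
        elif message.startswith("Selected source") and lost_at is not None:
            intervals.append((lost_at, when, message))
            lost_at = None
    return intervals
```

Fed the entries above, this pairs the 15:58:48 "no majority" entry with the 16:01:11 "Selected source GPS" entry. It does not treat a chronyd restart specially, so an outage that ends with a reboot is closed by the first source selection after startup.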
After the reboot, once chronyd had synchronized again (roughly 45
minutes after it originally lost sync), I took another look at things to
see if I could notice any odd differences in comparison to what I saw
while chronyd on the Pi was exhibiting the problem:
----------
# chronyc tracking && chronyc -n sources -v
Reference ID : 47505300 (GPS)
Stratum : 1
Ref time (UTC) : Tue May 26 16:16:17 2020
System time : 0.000000045 seconds slow of NTP time
Last offset : +0.000000082 seconds
RMS offset : 0.000000578 seconds
Frequency : 18.105 ppm fast
Residual freq : -0.004 ppm
Skew : 0.060 ppm
Root delay : 0.000000001 seconds
Root dispersion : 0.000022815 seconds
Update interval : 16.0 seconds
Leap status : Normal
210 Number of sources = 4
  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    20    +65ms[  +65ms] +/- 6057us
#* GPS                           0   4   377    20   +278ns[ +440ns] +/-  444ns
=- 10.0.0.10                     1   6   377    41    -19us[  -18us] +/- 1014us
=- 10.50.0.10                    1   6   377    35   -165us[ -165us] +/- 1327us
----------
It looks like chronyd is happy with the pulse-per-second (PPS) output
from the GPS (the "GPS" source above), though not happy with the NMEA
GPS time/date information. It appears to be accepting data from the
peers, even though they are marked "not combined."
----------
# ./get_gps_status
{
"active": 1,
"time": "2020-05-26T16:17:27.680Z",
"sats": 11,
"sats_used": 8,
"tpv_status": null,
"tpv_mode": 3,
"tpv_time": "2020-05-26T16:17:27.000Z",
"lat": XXX.XXXXXX,
"lon": YYY.YYYYYY,
"alt": ZZZZ.Z,
"ept": 0.005,
"epx": 9.598,
"epy": 16.407,
"epv": 20.24
}
----------
Things look very similar, only there are 8 of 11 satellites used instead
of 11 of 12. The time error, ept=0.005, is the same. And tpv_mode=3
just as before, indicating a full 3-D satellite position lock.
This is quite puzzling. I cannot see a reason chronyd should ever have
begun rejecting the time data it was receiving.
Thanks for any/all insights!
Aaron Gifford