[chrony-users] Chronyd has good sources but rejects time "may be in error" and "Can't synchronise: no majority"


I have a pesky Raspberry Pi with GPS doing GPS+PPS for stratum 1 time keeping using chronyd. It's set up identically to a second Pi doing the same thing at a second location.

Here's a tidbit from my problem Pi's chrony.conf:
---------------------------
## GPS NMEA reference:
refclock SHM 0 offset 0.470 refid NMEA

## GPS PPS reference:
refclock SHM 1 refid GPS lock NMEA

## 10.0.0.10 (Raspberry Pi GPS+PPS)
## NOTE: This device is IDENTICAL to problem Pi but doesn't
##       seem to exhibit the same problem as often.
peer 10.0.0.10
allow 10.0.0.10

## 10.50.0.10 (FreeBSD Garmin 18x GPS+PPS)
peer 10.50.0.10
allow 10.50.0.10
---------------------------


THE PROBLEM: Periodically chronyd seems to stop listening to the GPS's date/time and PPS information, and even stops listening to peer NTP servers.

When I learn of this problem, I log in to the problem Pi and look at things thus:
----------
# chronyc tracking && chronyc -n sources -v
Reference ID    : 00000000 ()
Stratum         : 0
Ref time (UTC)  : Thu Jan 01 00:00:00 1970
System time     : 0.000000019 seconds fast of NTP time
Last offset     : -0.000000070 seconds
RMS offset      : 0.000000508 seconds
Frequency       : 18.205 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 1.000000000 seconds
Root dispersion : 1.000000000 seconds
Update interval : 16.0 seconds
Leap status     : Not synchronised
210 Number of sources = 4

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    16    +76ms[  +76ms] +/- 6673us
#x GPS                           0   4   377    17    -23us[  -23us] +/- 444ns
=x 10.0.0.10                     1   9   377   638    -33us[  -33us] +/- 1016us
=x 10.50.0.10                    1  10   377  1107   +407us[ +407us] +/- 1441us
----------

Notice that EVERY time source is showing 'x' (time may be in error). What? I log in to 10.0.0.10, the identical Pi at a second location, and it is fine. It also shows source 10.50.0.10 as a good time source.
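
For anyone who wants to catch this state automatically rather than by eye, here is a minimal sketch that reads the state column from `chronyc sources` output and reports whether every source is flagged 'x'. It assumes the human-readable row layout shown above (mode character, state character, a space, then the source name). The names `source_states` and `all_rejected` are made up for this illustration and are not part of chrony.

```python
import re

# One data row of `chronyc sources`: mode char, state char, a space, the name.
# Legend, header and separator lines do not match this pattern.
ROW = re.compile(r"^([\^=#])([*+\-?x~]) (\S+)")


def source_states(output):
    """Map each source name to its one-character state flag."""
    states = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            states[match.group(3)] = match.group(2)
    return states


def all_rejected(states):
    """True when there is at least one source and every one is flagged 'x'."""
    return bool(states) and all(flag == "x" for flag in states.values())


# Rows copied from the output above.
SAMPLE = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    16    +76ms[  +76ms] +/- 6673us
#x GPS                           0   4   377    17    -23us[  -23us] +/- 444ns
=x 10.0.0.10                     1   9   377   638    -33us[  -33us] +/- 1016us
=x 10.50.0.10                    1  10   377  1107   +407us[ +407us] +/- 1441us
"""
```

Feeding it the text captured from `chronyc -n sources` (for example from a cron job) would allow an alert to fire as soon as `all_rejected` returns true.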

So I wonder, is the GPS not locked? I use a script to query gpsd (via TCP to 127.0.0.1 port 2947 where gpsd is listening) to find out the status of the GPS device:
----------
    {
    "active": 1,
    "time": "2020-05-26T15:54:12.537Z",
    "sats": 12,
    "sats_used": 11,
    "tpv_status": null,
    "tpv_mode": 3,
    "tpv_time": "2020-05-26T15:54:11.000Z",
    "lat": XXX.XXXXXX,
    "lon": YYY.YYYYYY,
    "alt": ZZZZ.Z,
    "ept": 0.005,
    "epx": 8.227,
    "epy": 7.806,
    "epv": 27.83
  }
----------

Looks like the GPS is active, has time and date, has a full 3-D lock (tpv_mode=3), and is using 11 of the 12 satellites it can see. The GPS estimates its time error at 5 ms (ept=0.005). As far as I can tell, the GPS is functioning as expected and should be delivering time and a PPS signal to chronyd just fine.
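
A minimal sketch of a gpsd status query along these lines follows. It is not the actual script used here. It assumes gpsd's JSON watch protocol on TCP port 2947, where a `?WATCH` command turns on a stream of newline-delimited reports, `TPV` reports carry `mode`, `time` and `ept`, and `SKY` reports carry a `satellites` list with a `used` flag per satellite. The function names and the reduced set of output keys are illustrative only.

```python
import json
import socket


def summarize(reports):
    """Reduce decoded gpsd reports to the latest fix fields and satellite counts."""
    status = {}
    for report in reports:
        if report.get("class") == "TPV":
            status["tpv_mode"] = report.get("mode")
            status["tpv_time"] = report.get("time")
            status["ept"] = report.get("ept")
        elif report.get("class") == "SKY" and "satellites" in report:
            sats = report["satellites"]
            status["sats"] = len(sats)
            status["sats_used"] = sum(1 for sat in sats if sat.get("used"))
    return status


def read_reports(host="127.0.0.1", port=2947, max_lines=200, timeout=10.0):
    """Collect reports from gpsd until one TPV and one SKY with satellites arrive."""
    reports = []
    seen = set()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Ask gpsd to start streaming JSON reports on this connection.
        sock.sendall(b'?WATCH={"enable":true,"json":true}\n')
        with sock.makefile("r", encoding="utf-8") as stream:
            # Give up after max_lines so a receiver without a fix cannot hang us.
            for _, line in zip(range(max_lines), stream):
                report = json.loads(line)
                reports.append(report)
                if report.get("class") == "TPV":
                    seen.add("TPV")
                elif report.get("class") == "SKY" and "satellites" in report:
                    seen.add("SKY")
                if seen == {"TPV", "SKY"}:
                    break
    return reports
```

On a host running gpsd, `print(json.dumps(summarize(read_reports()), indent=2))` would print a summary similar in spirit to the one above. Keeping the parsing in `summarize` separate from the socket handling means it can be checked against canned reports without a GPS attached.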

So what's going on with chronyd? Why is it rejecting all time sources? What can I enable to find out more?

I've quickly glanced at various logs (measurements/tracking/refclocks/statistics) and don't immediately notice any numbers outside the usual range seen before and after the problem manifested.

The only indicator was one log entry in syslog:
----------
May 26 15:31:16 myhost chronyd[435]: Can't synchronise: no majority
----------
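
For context, 'x' marks a source whose time is inconsistent with a majority of the other sources, so "no majority" suggests that no group of more than half of the sources agreed with each other. Here is a rough sketch of that kind of interval-agreement test. It is a simplified illustration and not chronyd's actual implementation: each source is reduced to a hypothetical (offset, error bound) pair in seconds, and the error bound chronyd really uses for selection may differ from the "+/-" column that chronyc displays.

```python
def best_overlap(intervals):
    """Return the largest number of intervals that share a common point."""
    events = []
    for low, high in intervals:
        events.append((low, 0))   # interval opens; sorts before a close at the same point
        events.append((high, 1))  # interval closes
    depth = best = 0
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def has_majority(sources):
    """sources: list of (offset, error_bound) pairs in seconds."""
    intervals = [(offset - error, offset + error) for offset, error in sources]
    # A majority means strictly more than half of the sources overlap.
    return best_overlap(intervals) > len(intervals) / 2
```

With four sources, at least three of the intervals have to overlap for this test to pass. Two that agree closely are not enough.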

I gave up and simply rebooted the system a bit over 25 minutes later, and chronyd resumed with these syslog entries:
----------
May 26 15:57:00 myhost systemd[1]: Starting chrony, an NTP client/server...
May 26 15:57:00 myhost chronyd[426]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 -DEBUG)
May 26 15:57:00 myhost chronyd[426]: Could not open IPv6 command socket : Address family not supported by protocol
May 26 15:57:00 myhost chronyd[426]: Frequency 18.205 +/- 0.021 ppm read from /var/lib/chrony/chrony.drift
May 26 15:57:00 myhost systemd[1]: Started chrony, an NTP client/server.
May 26 15:57:47 myhost chronyd[426]: Selected source GPS
May 26 15:57:47 myhost chronyd[426]: System clock wrong by 13.062845 seconds, adjustment started
May 26 15:58:00 myhost chronyd[426]: System clock was stepped by 13.062845 seconds
May 26 15:58:48 myhost chronyd[426]: Can't synchronise: no majority
May 26 16:01:11 myhost chronyd[426]: Selected source GPS
----------


After the reboot, once chronyd was synchronized (roughly 45 minutes after it originally lost sync), I took another look to see whether anything differed noticeably from what I saw while chronyd on the Pi was exhibiting the problem:

----------
# chronyc tracking && chronyc -n sources -v
Reference ID    : 47505300 (GPS)
Stratum         : 1
Ref time (UTC)  : Tue May 26 16:16:17 2020
System time     : 0.000000045 seconds slow of NTP time
Last offset     : +0.000000082 seconds
RMS offset      : 0.000000578 seconds
Frequency       : 18.105 ppm fast
Residual freq   : -0.004 ppm
Skew            : 0.060 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000022815 seconds
Update interval : 16.0 seconds
Leap status     : Normal
210 Number of sources = 4

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    20    +65ms[  +65ms] +/- 6057us
#* GPS                           0   4   377    20   +278ns[ +440ns] +/- 444ns
=- 10.0.0.10                     1   6   377    41    -19us[  -18us] +/- 1014us
=- 10.50.0.10                    1   6   377    35   -165us[ -165us] +/- 1327us
----------

It looks like chronyd is happy with the pulse-per-second (PPS) output from the GPS (the "GPS" source above), though not happy with the NMEA GPS time/date information, which is still marked 'x'. It appears to be accepting data from the peers, even though that data is "not combined."

----------
# ./get_gps_status
    {
    "active": 1,
    "time": "2020-05-26T16:17:27.680Z",
    "sats": 11,
    "sats_used": 8,
    "tpv_status": null,
    "tpv_mode": 3,
    "tpv_time": "2020-05-26T16:17:27.000Z",
    "lat": XXX.XXXXXX,
    "lon": YYY.YYYYYY,
    "alt": ZZZZ.Z,
    "ept": 0.005,
    "epx": 9.598,
    "epy": 16.407,
    "epv": 20.24
  }
----------

Things look very similar, only there are 8 of 11 satellites used instead of 11 of 12. The time error is ept=0.005, which is the same as before, and tpv_mode=3 just as before, indicating a full 3-D satellite position lock.


This is quite puzzling. I cannot see a reason chronyd should ever have begun rejecting the time data it was receiving.


Thanks for any/all insights!


Aaron Gifford
