[chrony-users] Chronyd has good sources but rejects time "may be in error" and "Can't synchronise: no majority"

To: chrony-users@xxxxxxxxxxxxxxxxxxxx
Subject: [chrony-users] Chronyd has good sources but rejects time "may be in error" and "Can't synchronise: no majority"
From: "Aaron D. Gifford" <synctimeguy1@xxxxxxxxxxx>
Date: Tue, 26 May 2020 11:21:47 -0600

I have a pesky Raspberry Pi with GPS doing GPS+PPS for stratum 1 timekeeping using chronyd. It's set up identically to a second Pi doing the same thing at a second location.


Here's a tidbit from my problem Pi's chrony.conf:
---------------------------
## GPS NMEA reference:
refclock SHM 0 offset 0.470 refid NMEA

## GPS PPS reference:
refclock SHM 1 refid GPS lock NMEA

## 10.0.0.10 (Raspberry Pi GPS+PPS)
## NOTE: This device is IDENTICAL to problem Pi but doesn't
##       seem to exhibit the same problem as often.
peer 10.0.0.10
allow 10.0.0.10

## 10.50.0.10 (FreeBSD Garmin 18x GPS+PPS)
peer 10.50.0.10
allow 10.50.0.10
---------------------------

THE PROBLEM: Periodically chronyd seems to stop listening to the GPS's date/time and PPS information, and even stops listening to peer NTP servers.

When I learn of this problem, I log in to the problem Pi and look at things thus:

----------
# chronyc tracking && chronyc -n sources
Reference ID    : 00000000 ()
Stratum         : 0
Ref time (UTC)  : Thu Jan 01 00:00:00 1970
System time     : 0.000000019 seconds fast of NTP time
Last offset     : -0.000000070 seconds
RMS offset      : 0.000000508 seconds
Frequency       : 18.205 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 1.000000000 seconds
Root dispersion : 1.000000000 seconds
Update interval : 16.0 seconds
Leap status     : Not synchronised
210 Number of sources = 4

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    16    +76ms[  +76ms] +/- 6673us
#x GPS                           0   4   377    17    -23us[  -23us] +/-  444ns
=x 10.0.0.10                     1   9   377   638    -33us[  -33us] +/- 1016us
=x 10.50.0.10                    1  10   377  1107   +407us[ +407us] +/- 1441us
----------

Notice that EVERY time source is showing 'x' (time may be in error). What? I log in to 10.0.0.10, the identical Pi at a second location, and it is fine, and it shows source 10.50.0.10 as a good time source too.
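
As a rough cross-check of the listing above, the sketch below treats each source as the interval offset +/- error, using only the two columns chronyc prints, and looks for the largest group of intervals that share a common point. This is a simplified overlap test written for illustration, not chronyd's actual selection algorithm, which takes more inputs than these columns into account. The names SOURCES_US and largest_agreeing_group are made up for this example.

```python
# Simplified overlap check on the values shown by `chronyc sources`.
# Assumption: each source is the interval offset +/- error; chronyd's
# real source selection uses more information than these two columns.

SOURCES_US = {            # name: (offset, error), both in microseconds
    "NMEA":       (76000.0, 6673.0),
    "GPS":        (-23.0, 0.444),
    "10.0.0.10":  (-33.0, 1016.0),
    "10.50.0.10": (407.0, 1441.0),
}

def largest_agreeing_group(sources):
    """Return the largest set of sources whose intervals share a point."""
    events = []
    for name, (offset, error) in sources.items():
        events.append((offset - error, 0, name))   # interval opens
        events.append((offset + error, 1, name))   # interval closes
    events.sort()  # at equal coordinates, opens sort before closes
    active, best = set(), set()
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > len(best):
                best = set(active)
        else:
            active.discard(name)
    return best

group = largest_agreeing_group(SOURCES_US)
has_majority = len(group) > len(SOURCES_US) / 2
print(sorted(group), has_majority)
```

By these displayed numbers alone, the GPS (PPS) source and both peers overlap and only NMEA stands apart, which is why the all-'x' listing looks so surprising.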

So I wonder, is the GPS not locked? I use a script to query gpsd (via TCP to 127.0.0.1 port 2947 where gpsd is listening) to find out the status of the GPS device:

----------
    {
    "active": 1,
    "time": "2020-05-26T15:54:12.537Z",
    "sats": 12,
    "sats_used": 11,
    "tpv_status": null,
    "tpv_mode": 3,
    "tpv_time": "2020-05-26T15:54:11.000Z",
    "lat": XXX.XXXXXX,
    "lon": YYY.YYYYYY,
    "alt": ZZZZ.Z,
    "ept": 0.005,
    "epx": 8.227,
    "epy": 7.806,
    "epv": 27.83
  }
----------

Looks like the GPS is active, has time and date, has a full 3-D lock (tpv_mode=3), and can see and use information from 11 satellites. And the GPS estimates time error at 5 ms (ept=0.005). So far as I can tell, the GPS is functioning as expected, and should be delivering time and a PPS signal to chronyd just fine.
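
The status script itself is not shown in this message. A minimal sketch of such a query, assuming gpsd's JSON watch protocol on 127.0.0.1 port 2947, might look like the following. The function names and the choice of output keys (modelled loosely on the fields printed above) are assumptions for illustration, not the original script.

```python
import json
import socket

def summarize(reports):
    """Fold gpsd TPV and SKY reports into one small status dict."""
    status = {}
    for report in reports:
        if report.get("class") == "TPV":
            status["tpv_mode"] = report.get("mode")   # 3 means a 3-D fix
            status["tpv_time"] = report.get("time")
            status["ept"] = report.get("ept")         # estimated time error, s
        elif report.get("class") == "SKY":
            sats = report.get("satellites", [])
            status["sats"] = len(sats)
            status["sats_used"] = sum(1 for s in sats if s.get("used"))
    return status

def query_gpsd(host="127.0.0.1", port=2947, timeout=5.0):
    """Watch gpsd until one TPV and one SKY report with satellites arrive."""
    collected = {}
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b'?WATCH={"enable":true,"json":true}\n')
        with sock.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                if not line.strip():
                    continue
                report = json.loads(line)
                cls = report.get("class")
                if cls == "TPV" or (cls == "SKY" and "satellites" in report):
                    collected[cls] = report
                if len(collected) == 2:
                    break
    return summarize(collected.values())
```

On the Pi, something like `print(json.dumps(query_gpsd(), indent=2))` would print the summary; the read raises a timeout error if gpsd does not deliver both report types within the timeout.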

So what's going on with chronyd? Why is it rejecting all time sources? What can I enable to find out more?

I've quickly glanced at various logs (measurements/tracking/refclocks/statistics) and don't immediately notice any numbers outside the usual range seen before and after the problem manifested.


The only indicator was one log entry in syslog:
----------
May 26 15:31:16 myhost chronyd[435]: Can't synchronise: no majority
----------

I gave up and simply rebooted the system a bit over 25 minutes later, and chronyd resumed with these syslog entries:

----------
May 26 15:57:00 myhost systemd[1]: Starting chrony, an NTP client/server...
May 26 15:57:00 myhost chronyd[426]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 -DEBUG)
May 26 15:57:00 myhost chronyd[426]: Could not open IPv6 command socket: Address family not supported by protocol
May 26 15:57:00 myhost chronyd[426]: Frequency 18.205 +/- 0.021 ppm read from /var/lib/chrony/chrony.drift
May 26 15:57:00 myhost systemd[1]: Started chrony, an NTP client/server.
May 26 15:57:47 myhost chronyd[426]: Selected source GPS
May 26 15:57:47 myhost chronyd[426]: System clock wrong by 13.062845 seconds, adjustment started
May 26 15:58:00 myhost chronyd[426]: System clock was stepped by 13.062845 seconds
May 26 15:58:48 myhost chronyd[426]: Can't synchronise: no majority
May 26 16:01:11 myhost chronyd[426]: Selected source GPS
----------
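
To line up the sequence of events, the gaps between the chronyd messages can be computed from the syslog timestamps. The sketch below does this for a subset of the lines quoted in this message, starting with the original "no majority" entry. The year is taken from the message date because syslog omits it; the regular expression and the helper name chronyd_events are assumptions for this example, and month parsing with %b assumes an English locale.

```python
import re
from datetime import datetime

# Matches "Mon DD HH:MM:SS host chronyd[pid]: message"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ chronyd\[\d+\]: (.*)$")

def chronyd_events(lines, year=2020):
    """Return (timestamp, message) pairs for chronyd syslog lines."""
    events = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            stamp = " ".join(match.group(1).split())  # normalise spacing
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            events.append((when, match.group(2)))
    return events

LOG = [
    "May 26 15:31:16 myhost chronyd[435]: Can't synchronise: no majority",
    "May 26 15:57:00 myhost chronyd[426]: Frequency 18.205 +/- 0.021 ppm read from /var/lib/chrony/chrony.drift",
    "May 26 15:57:47 myhost chronyd[426]: Selected source GPS",
    "May 26 15:58:48 myhost chronyd[426]: Can't synchronise: no majority",
    "May 26 16:01:11 myhost chronyd[426]: Selected source GPS",
]

events = chronyd_events(LOG)
# Print the time elapsed since the previous chronyd message.
for (previous, _), (current, message) in zip(events, events[1:]):
    print(f"+{current - previous}  {message}")
```

The first gap is the "bit over 25 minutes" between the original failure and the restart after the reboot.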

And after the reboot and once chronyd was synchronized (roughly 45 minutes after it originally lost sync) I took another look at things to see if I could notice any odd differences in comparison to what I saw while chronyd on the Pi was exhibiting the problem:


----------
# chronyc tracking && chronyc -n sources -v
Reference ID    : 47505300 (GPS)
Stratum         : 1
Ref time (UTC)  : Tue May 26 16:16:17 2020
System time     : 0.000000045 seconds slow of NTP time
Last offset     : +0.000000082 seconds
RMS offset      : 0.000000578 seconds
Frequency       : 18.105 ppm fast
Residual freq   : -0.004 ppm
Skew            : 0.060 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000022815 seconds
Update interval : 16.0 seconds
Leap status     : Normal
210 Number of sources = 4

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#x NMEA                          0   4   377    20    +65ms[  +65ms] +/- 6057us
#* GPS                           0   4   377    20   +278ns[ +440ns] +/-  444ns
=- 10.0.0.10                     1   6   377    41    -19us[  -18us] +/- 1014us
=- 10.50.0.10                    1   6   377    35   -165us[ -165us] +/- 1327us
----------

It looks like chronyd is happy with the pulse-per-second (PPS) output from the GPS (the "GPS" source above), though not happy with the NMEA GPS time/date information. It appears to be accepting data from peers even if the data is "not combined."


----------
# ./get_gps_status
    {
    "active": 1,
    "time": "2020-05-26T16:17:27.680Z",
    "sats": 11,
    "sats_used": 8,
    "tpv_status": null,
    "tpv_mode": 3,
    "tpv_time": "2020-05-26T16:17:27.000Z",
    "lat": XXX.XXXXXX,
    "lon": YYY.YYYYYY,
    "alt": ZZZZ.Z,
    "ept": 0.005,
    "epx": 9.598,
    "epy": 16.407,
    "epv": 20.24
  }
----------

Things look very similar, only there are 8 of 11 satellites used instead of 11 of 12. Time error ept=0.005, which is the same. And tpv_mode=3 just as before, indicating a full 3-D satellite position lock.

This is quite puzzling. I cannot see a reason chronyd should have ever begun rejecting the time data it was receiving.



Thanks for any/all insights!


Aaron Gifford


Follow-Ups:
- Re: [chrony-users] Chronyd has good sources but rejects time "may be in error" and "Can't synchronise: no majority"
  - From: Miroslav Lichvar
