Re: [chrony-users] Fatal error : adjtimex() failed



On 21/08/2012 16:31, Bill Unruh wrote:
On Mon, 20 Aug 2012, Tomalak Geret'kal wrote:

On 20/08/2012 22:44, Bill Unruh wrote:
Hmm. How are you feeding the shm? The PPS source cannot give you the seconds. It is only accurate to the nsec, but completely oblivious to seconds, so you have to do something to feed it the seconds. That could be the gps itself, or some other source.

The SHM is fed by a known-good process that works with ntpd and also here

Is it a secret which program you use?
No, it's not a secret, but it's in-house so you won't have heard of it. The code is pretty much extracted straight from gpsd, though - there's nothing unusual in it. I can show source if required, though I'd rather not...

with chrony when I can get it to start up. As you can see from the syslog, the SHM source was selected successfully.



> [sw200319 /root]# chronyc sources
> 210 Number of sources = 2
> MS Name/IP address           Stratum Poll LastRx Last sample
> ============================================================================
> #? PPS0                          0    4   43m   -1607ms[ +400ms] +/-  155ms
> #* GPS                           0    4    16     -14ms[  -14ms] +/-   60ms

That indicates that the PPS is almost 2 seconds out from the gps. A few 10s or even 100s of ms I could understand, but this indicates that the pps source is getting the wrong seconds information.

Also a fluctuation of 400ms or even 155 ms is pretty huge.
But as you point out yourself, PPS is oblivious to time-of-day as it provides only *timing*. My understanding is that this value in "chronyc sources" is actually just an artefact of the PPS not having been used to discipline usage of the SHM source for a full 43 minutes, so it's showing the result of jitter in the NMEA input?

All sources MUST have a seconds source as well. I.e., PPS needs to be fed the seconds by some other source. For you it was the GPS source, I believe. That is why that 1.6 second offset is so weird. Also, that line says that the last time it got a PPS signal was 43 minutes ago. It should be, say, 15 sec ago instead. Your PPS source is not working at all.
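To illustrate the point about PPS needing its seconds from elsewhere, here is a minimal sketch of how a pulse edge could be paired with a coarse time source. This is not chrony's implementation. The function name `pps_offset` and both parameters are hypothetical, and "offset" is assumed here to mean true time minus local clock time.

```python
def pps_offset(local_edge_time, coarse_offset):
    """Estimate the local clock offset (true - local) from one PPS edge.

    local_edge_time -- local clock reading, in seconds, when the pulse arrived
    coarse_offset   -- rough offset (true - local) from another source,
                       e.g. NMEA time of day; it only needs to be good
                       to better than half a second
    """
    # The pulse marks the top of *some* second; the coarse source
    # decides which one by rounding to the nearest whole second.
    true_edge = round(local_edge_time + coarse_offset)
    return true_edge - local_edge_time


# Coarse source roughly right: the PPS refines it to the exact fraction.
print(pps_offset(1000.25, -0.2))   # -0.25

# Coarse source off by a whole second: the PPS inherits that error.
print(pps_offset(1000.25, -1.2))   # -1.25
```

Under these assumptions, the pulse contributes only the sub-second part. Any error of more than half a second in the coarse source shows up in the PPS result as a whole-second error.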
My PPS is a known-good 50%-on-50%-off source.

I don't experience this issue at all when "noselect" is used on the NMEA/"GPS" source. That is, when I can launch chronyd past my adjtimex()/shmget() issues, the PPS has so far lasted up to 16 hours (longer tests pending) - far longer than it managed without the "noselect".
Perhaps the PPS is simply not polled any more in such a case?

I'm not really worried about this case any more - "noselect" on the GPS source is doing its job as far as I can tell and my PPS/GPS offsets remain sane. Again, longer tests pending.

It's really just the adjtimex()/shmget() oddity I'm confused about now. It really does seem to occur largely randomly and then vanish when I replace the binary with a new build which differs only by more verbose syslog output; to me, this screams UB in my build, but yikes. My investigation continues...!

This is honestly still a million times better than working with ntpd. Kudos.

Tom
