On Tue, 21 Aug 2012, Tomalak Geret'kal wrote:

On 21/08/2012 16:31, Bill Unruh wrote:
 On Mon, 20 Aug 2012, Tomalak Geret'kal wrote:

>  On 20/08/2012 22:44, Bill Unruh wrote:
> >  Hmm. How are you feeding the shm? The PPS source cannot give you the
> >  seconds. It is only accurate to the nsec, but completely oblivious to
> >  seconds, so you have to do something to feed it the seconds. That could
> >  be the gps itself, or some other source.
>
> The SHM is fed by a known-good process that works with ntpd and also here

 Is it a secret which program you use?
No, it's not a secret, but it's in-house so you won't have heard of it.
The code is pretty much extracted straight from gpsd, though - there's nothing unusual in it. I can show source if required, though I'd rather not...

OK. Are you sure it is actually treating the shm properly?
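
For reference, "treating the shm properly" mostly comes down to the writer/reader handshake on the segment's count and valid fields. Below is a minimal Python simulation of that handshake for SHM mode 1, with a plain object standing in for the shared segment. The class and function names are hypothetical, the fields are a reduced subset of the shmTime layout used by ntpd and gpsd, and this is not the in-house feeder's code.

```python
from dataclasses import dataclass


@dataclass
class ShmTime:
    """Stand-in for the shared segment (reduced, illustrative field set)."""
    mode: int = 1
    count: int = 0
    clock_sec: int = 0
    clock_usec: int = 0
    receive_sec: int = 0
    receive_usec: int = 0
    valid: int = 0


def write_sample(seg, clock, receive):
    """Feeder side: invalidate, bump count, store, bump count, validate."""
    seg.valid = 0
    seg.count += 1
    seg.clock_sec, seg.clock_usec = clock
    seg.receive_sec, seg.receive_usec = receive
    seg.count += 1
    seg.valid = 1


def read_sample(seg):
    """Consumer side: accept a sample only if it is valid and was not
    modified while being copied, then mark the segment as consumed."""
    if not seg.valid:
        return None
    count_before = seg.count
    sample = ((seg.clock_sec, seg.clock_usec),
              (seg.receive_sec, seg.receive_usec))
    # In this single-threaded simulation the count cannot change mid-copy;
    # with real shared memory this check catches a concurrent write.
    if seg.mode == 1 and seg.count != count_before:
        return None
    seg.valid = 0
    return sample
```

A feeder that stores the timestamps without the two count increments, or never sets valid back to 1, would leave the consumer with stale or rejected samples even though the segment itself attaches fine.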

> with chrony when I can get it to start up. As you can see from the
> syslog, the SHM source was selected successfully.
>
> > > [sw200319 /root]# chronyc sources
> > > 210 Number of sources = 2
> > > MS Name/IP address           Stratum Poll LastRx Last sample
> > > ============================================================================
> > > #? PPS0                            0    4   43m  -1607ms[ +400ms] +/-  155ms
> > > #* GPS                             0    4    16    -14ms[  -14ms] +/-   60ms
> >
> > That indicates that the PPS is almost 2 seconds out from the gps. A few
> > 10s or even 100s of ms I could understand, but this indicates that the
> > pps source is getting the wrong seconds information.
> >
> > Also a fluctuation of 400ms or even 155 ms is pretty huge.
>
> But as you point out yourself, PPS is oblivious to time-of-day as it
> provides only *timing*. My understanding is that this value in "chronyc
> sources" is actually just an artefact of the PPS not having been used to
> discipline usage of the SHM source for a full 43 minutes, so it's
> showing the result of jitter in the NMEA input?
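
To make the "seconds" point concrete: a PPS edge marks a second boundary but carries no label, so the full timestamp has to be built by rounding a coarser time estimate (for example, one derived from NMEA) to the nearest whole second. The sketch below is a hedged illustration in Python with hypothetical names and made-up numbers. It is not chrony's actual code, and it does not claim to explain the -1607ms figure above.

```python
import math


def label_pps_edge(coarse_time_at_edge: float) -> int:
    """Return the whole second assigned to a PPS edge.

    coarse_time_at_edge is a coarse estimate of the true time at the moment
    the pulse arrived. The pulse itself is taken to sit exactly on a second
    boundary, so the nearest integer is used as its label.
    """
    return math.floor(coarse_time_at_edge + 0.5)


# Hypothetical true time of the edge, in whole seconds.
true_edge = 1345564800

# Errors of the coarse source relative to true time (illustrative values).
for coarse_error in (0.014, 0.400, -0.607, 1.607):
    label = label_pps_edge(true_edge + coarse_error)
    print(f"coarse error {coarse_error:+.3f}s -> "
          f"label off by {label - true_edge:+d}s")
```

As long as the coarse source stays within half a second of true time, the label is right and the PPS supplies the sub-second precision. Once the coarse error exceeds half a second, the edge gets the wrong label and the result is off by a whole number of seconds.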

 All sources MUST have a seconds source as well. I.e., PPS needs to be fed
 seconds by some other source. For you it was the GPS source, I believe.
 That is why that 1.6 second offset is so weird. Also, that line says that
 the last time it got a PPS signal was 43 minutes ago. It should be, say,
 15 sec ago instead. Your PPS source is not working at
My PPS is a known-good 50%-on-50%-off source.

I don't experience this issue at all when "noselect" is used on the NMEA/"GPS" source. That is, when I can launch chronyd past my adjtimex()/shmget() issues, the PPS has so far lasted up to 16 hours (longer tests pending) - far longer than it managed without the "noselect".
Perhaps the PPS is simply not polled any more in such a case?

Are you getting better than 43 min between readings of the PPS?

I'm not really worried about this case any more - "noselect" on the GPS source is doing its job as far as I can tell and my PPS/GPS offsets remain sane. Again, longer tests pending.

It's really just the adjtimex()/shmget() oddity I'm confused about now. It really does seem to occur largely randomly and then vanish when I replace the binary with a new build which differs only by more verbose syslog output; to me, this screams UB in my build, but yikes. My investigation continues...!

This is honestly still a million times better than working with ntpd. Kudos.


