On Tue, Aug 21, 2012 at 06:05:46PM +0100, Tomalak Geret'kal wrote:
I don't experience this issue at all when "noselect" is used on the
NMEA/"GPS" source. That is, when I can launch chronyd past my
adjtimex()/shmget() issues, the PPS has so far lasted up to 16 hours
(longer tests pending) - far longer than it managed without the
"noselect".
Perhaps the PPS is simply not polled any more in such a case?
Were the refclock and tracking logging enabled when that happened?
I'm not really worried about this case any more - "noselect" on the
GPS source is doing its job as far as I can tell and my PPS/GPS
offsets remain sane. Again, longer tests pending.
It's really just the adjtimex()/shmget() oddity I'm confused about
now. It really does seem to occur largely randomly and then vanish
when I replace the binary with a new build which differs only by
more verbose syslog output; to me, this screams UB in my build, but
yikes. My investigation continues...!
That sounds like a race condition. Does it work under strace?
Does dropping the kernel caches (page cache, dentries and inodes)
trigger it again?
    echo 3 > /proc/sys/vm/drop_caches
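
A rough sketch of how the two checks above could be scripted, in case it saves some typing. The chronyd path, the trace file location and the RUN switch are assumptions made for illustration, not anything taken from your setup. By default the script only prints the commands; set RUN=1 and run it as root to execute them. On some architectures shmget is not a separate system call (it goes through ipc), so the -e filter may need adjusting there.

```shell
#!/bin/sh
# Sketch only. CHRONYD, TRACE and RUN are hypothetical names chosen here.
CHRONYD=${CHRONYD:-/usr/sbin/chronyd}   # assumed install path
TRACE=${TRACE:-/tmp/chronyd.strace}     # assumed trace output file

# Print each command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Write out dirty pages first, then drop page cache, dentries and inodes.
# Writing to drop_caches requires root.
drop_caches() {
    run sync
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
}

# Start chronyd in the foreground (-d) under strace, following child
# processes (-f) and logging only the two calls in question.
trace_chronyd() {
    run strace -f -o "$TRACE" -e trace=adjtimex,shmget "$CHRONYD" -d
}

drop_caches
trace_chronyd
```

If the failure disappears under strace but comes back after the caches are dropped, that would point towards a timing dependency rather than the build itself.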