RE: [chrony-users] kernel PPS troubleshooting

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-users Archives ]


On  Tuesday, December 03, 2013 9:18 AM Miroslav Lichvar wrote: 
> On Mon, Dec 02, 2013 at 02:59:05PM -0500, Battocchi, Scott L. wrote:

>> Since we will not have access to a network time source and will be relying on GPSD/NMEA to get us in the correct ballpark on system startup, is there another configuration option we can try to minimize the snapping back to GPS so quickly?
> You can mark the NMEA source as noselect and still lock the PPS source to it. The PPS samples will be ignored when NMEA is off by more than
> 0.2 seconds, but according to your graphs that shouldn't happen very often.

I will try that in the next couple of days

>> The three attached plots are:
>> 4hr_offsets:  Hours 0-4, offsets straight from statistics.log
>> 4hr_offsets_PPSadjusted:  Hours 0-4, adjusted offsets assuming PPS was always 0 and using the most recent PPS value to adjust the actual offset in statistics.log
>> Syncsource_PPSadjusted:   Hours 2-4, same data as PPSadjusted but with background highlighted according to active sync source from tracking.log
> Nice graphs!

Thanks, figuring out the best way to convey all of the associated log messages has consumed more brainpower than I'd like to admit...

>> I have the full console output as well with debugging enabled and am trying to figure out how best to parse and analyze it.  One thing I notices in comparison to my previous run is that all of the ignored PPS samples are coming from line 465 in refclock.c:
>> refclock.c:465:(RCL_AddPulse)[28-14:20:00] refclock pulse ignored 
>> second=0.999999657 sync=0 dist=1.500000000
> Hm, that's weird. Do they all have the same sync and dist value? Could you please attach corresponding parts of the tracking, refclock and statistics logs around the time when the PPS source is dropped?

So a quick grep through the trace log shows that there were
99671 pulses ignored with sync=0, all of which had dist=1.5 (no other dist. was reported with sync=0)
96 pulses ignored with sync=1, which happened in 6 groupings each starting with dis=7.x or 8.x and ending after 16 cycles (one second per cycle) ending at 22.x or 23.x
All 6 of these groupings came in the first 2.5 minutes after starting chrony.

I've attached the tracking, measurements, refclocks, and sources logs trimmed to start at the 2.35 hour mark (to coincide with the graph colored by sync source in my previous mail).  I also moved the rolling header line for each log to the start of these trimmed ones and removed any subsequent headers from the remainder of the file.  They each run about 16 minutes and through multiple sync source selections.  I did not include any logs from the first two  minutes where sync=1 and dist actually changed since that seemed to be a startup artifact and not related to the rest of the long run issues.
I also have a trimmed console output at 810kB that I can send along if interested.

>> and not line 440 like they were on the previous run:
>> refclock.c:440:(RCL_AddPulse)[26-18:03:56] refclock pulse ignored 
>> offdiff=-0.313099609 refdisp=0.041061551 disp=0.022734546
> They are ignored in a different place because the lock option wasn't used this time.

Ahh, that's makes sense.

Thanks again for all the help,
Scott Battocchi

Attachment: tracking_2.35hrs_in.log
Description: tracking_2.35hrs_in.log

Attachment: measurements_2.35hrs_in.log
Description: measurements_2.35hrs_in.log

Attachment: refclocks_2.35hrs_in.log
Description: refclocks_2.35hrs_in.log

Attachment: statistics_2.35hrs_in.log
Description: statistics_2.35hrs_in.log



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/