Re: [chrony-dev] PPS reference clock rejected because of high dispersion



On Fri, May 16, 2014 at 12:44:56PM +0200, Hattink, Tjalling [FINT] wrote:
> > Can you please enable the refclocks log so we can see the individual SHM
> > and PPS samples?
> 
> I have been able to reproduce this by inserting outliers through the SHM
> interface. The samples are so delayed that the offset is off by several
> seconds. This is caused by our software stalling because of a shortage of
> CPU time. Here is a snippet of the refclocks log in that situation:

OK. Good to know where the large dispersion is coming from.

> I've done further testing and investigations, and was able to cook up a
> patch that prevents this situation. In short, the patch will reject PPS
> pulses when the last sample from the locked ref clock is an outlier:

> -    offset += shift;
> +    if (ref_dispersion >= 0.5 / rate)
> +      return 0;
>  
> -    if (fabs(ref_offset - offset) + ref_dispersion + dispersion >= 0.2 / rate)
> +    if (fabs(ref_offset - offset) >= 0.5 / rate)
>        return 0;

> The original alignment code is removed. Instead, I first check whether
> the dispersion of the ref clock is smaller than half the pulse period
> (0.5 / rate); otherwise the PPS can no longer be reliably aligned to the
> refclock.

I think the alignment code is necessary to allow offsets larger than 1
second. If the code is removed, the PPS offset could be off by a whole
number of seconds and chronyd will not be able to correct a large
initial offset on start. Also, I'm not sure we want to allow locking
to a source if the two dispersions together are larger than 0.5; the
offset could again be off by a number of seconds.

If you think the 0.2 second limit is too restrictive, we could
increase it a bit, but not by much, to avoid incorrect alignment.

> In the old situation the sample would be aligned using a shift, but that
> actually caused the PPS sample to become an outlier as well, which
> increased the dispersion of the PPS a lot. And in the old check where
> ref_dispersion and dispersion are used (refclock.c:421), the increased
> dispersion alone would cause all subsequent samples to be rejected.

I think that's all right, we don't want to use the PPS sample unless
we can be sure the sources are so stable that the PPS second will be
aligned correctly.

If your SHM source is not very stable, you might want to remove the
lock and noselect options, increase the poll option for the SHM
source, let chronyd synchronize to SHM first and lock PPS to the
system clock instead of the SHM source. The lock option was intended
to be used only with stable sources.
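For illustration only, a chrony.conf fragment of that setup might look like the following. The device path /dev/pps0, the refids and the poll value of 6 are assumptions; check the refclock directive in the documentation of your chrony version before using it.

```
# Before (assumed): PPS locked to a noselect SHM source
#refclock SHM 0 refid GPS noselect
#refclock PPS /dev/pps0 lock GPS

# After: chronyd synchronizes to SHM first (longer poll averages more
# samples), and PPS pulses are aligned to the system clock instead
refclock SHM 0 refid GPS poll 6
refclock PPS /dev/pps0 refid PPS
```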

> And as the filter is never updated, the dispersion never decreases. So
> it is a deadlock, which matches my bug report.

That's the bug we need to fix, but I liked your original suggestion
better: reset the filter and the variance statistic when the check
fails (maybe not every time, but after some number of failures).

Thanks,

-- 
Miroslav Lichvar
