RE: [chrony-users] interpolation of offsets in chrony - a robust approach



I think it might well be interesting. The problem is chrony's adaptive process
for deciding whether a linear fit is a reasonable hypothesis by looking at the
number of consecutive points on one side or the other of the fit. It is not at
all clear how one would implement that in the Theil-Sen procedure. I suppose
one could divide the points into two bunches and see if the slopes in the two
were consistent to within the estimated errors. But of course estimating the
errors is tough with not many points in the Theil-Sen procedure. Note that the
procedure would also be expensive in storage -- you have to save n^2 slopes
rather than just n points. And you have to correct n^2 slopes when you change
the drift rate of the clocks instead of just the offsets of n points.
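For concreteness, a rough C sketch of what such a two-bunch check might look
like, assuming Theil-Sen slopes and error estimates have already been computed
for the older and newer halves of the window (all names here are illustrative,
not chrony code):

    #include <math.h>

    /* Accept the linear hypothesis only if the slopes fitted over the
       two halves of the window agree within their combined estimated
       error.  Producing good error estimates from few points is the
       hard part noted above; here they are simply inputs. */
    static int slopes_consistent(double slope_old, double slope_new,
                                 double err_old, double err_new)
    {
        double err = sqrt(err_old * err_old + err_new * err_new);
        return fabs(slope_old - slope_new) <= 2.0 * err; /* ~2 sigma */
    }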




William G. Unruh __| Canadian Institute for|____ Tel: +1(604)822-3273
Physics&Astronomy _|___ Advanced Research _|____ Fax: +1(604)822-5324
UBC, Vancouver,BC _|_ Program in Cosmology |____ unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1 ____|____ and Gravity ______|_ www.theory.physics.ubc.ca/

On Wed, 17 Feb 2021, Charlie Laub wrote:

Well, if there is some interest in seeing how the Theil-Sen estimator compares to how chrony processes the data now, I would be happy to try and do some coding. I think it would be interesting to do a "real world" comparison, actually.


-----Original Message-----
From: Bill Unruh <unruh@xxxxxxxxxxxxxx>
Sent: Wednesday, February 17, 2021 1:08 PM
To: chrony-users@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [chrony-users] interpolation of offsets in chrony - a robust approach

Yes, the key difference between chrony and ntpd is that chrony does a linear regression on the last n samples to estimate the current frequency and offset.
It figures out how many samples to keep by looking at the number of consecutive samples which are above or below the regression line. If there are too many, that suggests that the curve is not being well fitted by a linear regression, and the number of samples used is decreased until the consecutive test is passed.
The number is then increased by one at a time until the test begins to fail again. I believe minsamples and maxsamples set the maximum number of samples that are used and the minimum number that are retained. The default at least used to be 3 for minsamples (so a linear curve with at least some estimate of the errors can be fit). The max used to be 64, but these are now configurable if you know what you are doing.
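A rough C sketch of that consecutive-residuals test (thresholds and names are
illustrative, not chrony's actual regression code):

    /* Longest run of consecutive residuals strictly above or below the
       fitted line.  A long run suggests the samples are not well
       described by a single straight line over the whole window. */
    static int max_run_one_side(const double *residual, int n)
    {
        int run = 0, best = 0, prev_sign = 0;

        for (int i = 0; i < n; i++) {
            int sign = (residual[i] > 0.0) - (residual[i] < 0.0);
            run = (sign != 0 && sign == prev_sign) ? run + 1 : 1;
            if (run > best)
                best = run;
            prev_sign = sign;
        }
        return best;
    }

The caller would shrink the window until the test passes, along the lines of:
while (n > minsamples && max_run_one_side(res, n) > limit) n--;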

These attributes of chrony allow it to make a much better estimate of the current offset and drift than ntpd does.

Note that every time the rate of the clock is changed, all of the samples are also changed to reflect that change in rate. Or if the clock offset is jumped, all the retained samples are changed to reflect that jump. Otherwise the fitting would get all messed up.
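In code terms, something like the following has to happen on every adjustment
(a sketch; names and sign conventions are illustrative):

    /* Keep the retained samples consistent with a step of 'step'
       seconds applied to the clock, or with a frequency change of
       'dfreq' (s/s) applied at time 't_now'. */
    static void apply_step(double *offset, int n, double step)
    {
        for (int i = 0; i < n; i++)
            offset[i] -= step;                   /* same jump for all */
    }

    static void apply_freq_change(const double *t, double *offset, int n,
                                  double dfreq, double t_now)
    {
        for (int i = 0; i < n; i++)
            offset[i] -= dfreq * (t[i] - t_now); /* re-reference history */
    }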


If the noise is dominated by, for example, a Poisson noise process, your estimator might be of advantage (given the cost that you state), but in the NTP case it is a mixture of Poisson and Gaussian. In most situations the Gaussian probably dominates. In some cases it does not, and there a different analysis technique might be better. But I think you really would have to run simulation experiments, both with simulated data where some noise statistics are chosen, and with real data, to see how much difference it makes. One also has to be worried about potential instabilities in the analysis one performs.
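Such a simulation need not be elaborate. A C sketch of generating test data
with a mixture of Gaussian noise and one-sided (Poisson-like queueing delay)
spikes around a true linear trend, to feed to both estimators (all parameters
illustrative):

    #include <math.h>
    #include <stdlib.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    static double uniform01(void)              /* uniform on (0,1) */
    {
        return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    }

    static double gaussian(double sigma)       /* Box-Muller */
    {
        return sigma * sqrt(-2.0 * log(uniform01())) *
               cos(2.0 * M_PI * uniform01());
    }

    static double delay_spike(double scale)    /* one-sided, exponential */
    {
        return -scale * log(uniform01());
    }

    /* Offsets around a true frequency error, sampled every 16 s. */
    static void simulate(double *t, double *x, int n, double true_freq,
                         double gauss_sigma, double spike_scale)
    {
        for (int i = 0; i < n; i++) {
            t[i] = 16.0 * i;
            x[i] = true_freq * t[i] + gaussian(gauss_sigma)
                 + delay_spike(spike_scale);
        }
    }

Running a least-squares fit and a Theil-Sen fit on the same generated samples
and comparing both against true_freq would be the experiment.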
One way of handling outliers is to simply throw them away. E.g. if a data point is 5 sigma away from the best-fit curve, one could simply eliminate it and try again. This is what is done when the data is accumulated and only the median is passed on to chrony. Typically only the 60 or 70% of the data points that lie closest together are used. This is to get rid of what David Mills called popcorn noise.
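A C sketch of that 5 sigma rejection (illustrative only):

    #include <math.h>

    /* Drop samples whose residual from the current fit exceeds
       5 sigma; returns the number kept.  The caller refits on the
       survivors and can iterate. */
    static int reject_outliers(const double *residual, double *t,
                               double *x, int n, double sigma)
    {
        int kept = 0;

        for (int i = 0; i < n; i++) {
            if (fabs(residual[i]) <= 5.0 * sigma) {
                t[kept] = t[i];
                x[kept] = x[i];
                kept++;
            }
        }
        return kept;
    }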

The problem is that there really are no great models for the noise, and besides, almost every implementation is faced with different noise sources.
Also, the more complex one makes the analysis, the higher the probability that subtle (or not so subtle) bugs creep in, obviating all of the work.





William G. Unruh __| Canadian Institute for|____ Tel: +1(604)822-3273
Physics&Astronomy _|___ Advanced Research _|____ Fax: +1(604)822-5324
UBC, Vancouver,BC _|_ Program in Cosmology |____ unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1 ____|____ and Gravity ______|_ www.theory.physics.ubc.ca/

On Wed, 17 Feb 2021, Charlie Laub wrote:


While I was reading the docs I came across these parameters:

maxsamples [samples]

    The maxsamples directive sets the default maximum number of samples that chronyd should keep for each source. This setting can be overridden for individual sources in the server and refclock directives. The default value is 0, which disables the configurable limit. The useful range is 4 to 64.

    As a special case, setting maxsamples to 1 disables frequency tracking in order to make the sources immediately selectable with only one sample. This can be useful when chronyd is started with the -q or -Q option.



minsamples [samples]

    The minsamples directive sets the default minimum number of samples that chronyd should keep for each source. This setting can be overridden for individual sources in the server and refclock directives. The default value is 6. The useful range is 4 to 64.

    Forcing chronyd to keep more samples than it would normally keep reduces noise in the estimated frequency and offset, but slows down the response to changes in the frequency and offset of the clock. The offsets in the tracking and sourcestats reports (and the tracking.log and statistics.log files) may be smaller than the actual offsets.
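For reference, these are set in chrony.conf either globally or per source;
the server name and values below are just made-up examples:

    # chrony.conf excerpt -- global defaults
    maxsamples 32
    minsamples 8

    # ...or overridden for an individual source
    server ntp.example.com iburst minsamples 16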



Maybe I am way off here, but the descriptions suggest that these retained
samples are fitted with a linear (or other) model, and the fitted result is then used by chrony. Is that correct?



The offset data is obviously noisy. In addition, I have observed on my
own machines occasional outliers that are on the order of 10x larger than usual, so the data contains outliers as well as noise.



A linear regression is not the best way to process this kind of data. Instead, a robust analysis method is best. There is a simple and effective one for obtaining the “best fit”
slope of a dataset called the Theil-Sen estimator. There is a great
Wikipedia entry for it if you are not familiar with the technique (not
sure if links are allowed so I did not include it). In a nutshell, the
slope is computed for all pairs of points in the dataset and the median value is selected as the estimate of the slope. It is straightforward to use this to obtain a good estimate of the true offset for any time within the time interval of the dataset, and to make a prediction into the future. Because it can reject outliers and fits noisy data well, it seems like it would be a perfect candidate for a more robust offset estimator in chrony.
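For anyone unfamiliar, a minimal C sketch of the estimator (illustrative
only; the intercept can be estimated the same way, as the median of
x[i] - slope*t[i]):

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Theil-Sen: the median of the n*(n-1)/2 pairwise slopes. */
    static double theil_sen_slope(const double *t, const double *x, int n)
    {
        int m = n * (n - 1) / 2, k = 0;
        double *s = malloc(m * sizeof *s), med;

        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                s[k++] = (x[j] - x[i]) / (t[j] - t[i]);
        qsort(s, m, sizeof *s, cmp_double);
        med = (m % 2) ? s[m / 2] : 0.5 * (s[m / 2 - 1] + s[m / 2]);
        free(s);
        return med;
    }

    int main(void)
    {
        /* ~1e-4 s/s trend with one large outlier at t = 4 */
        double t[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
        double x[] = { 0.0e-4, 1.0e-4, 2.1e-4, 2.9e-4, 40.0e-4,
                       5.0e-4, 6.1e-4, 7.0e-4 };

        printf("slope = %g s/s\n", theil_sen_slope(t, x, 8));
        return 0;
    }

Despite the outlier being 10x the neighbouring offsets, the reported slope
stays close to 1e-4 s/s, while an ordinary least-squares fit on the same data
is pulled noticeably away from it.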



Normally this is termed an O(N^2) problem, because the
slope must be calculated for all pairs in the dataset. But to
implement this in chrony it seems to me you only need to compute N
new pairwise slopes as each new offset is obtained. This is because the previous pairwise slope values will not change; it is only the pairwise slopes between the single new offset value and the existing, retained values that need to be calculated. So the overhead would not be large, especially since the number of data points is less than e.g. 64.
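A sketch of that incremental bookkeeping (illustrative; note that, as pointed
out earlier in the thread, every stored slope would still need correcting
whenever the clock's rate is adjusted):

    #define MAXPTS 64

    static double t_buf[MAXPTS], x_buf[MAXPTS];
    static double slopes[MAXPTS * (MAXPTS - 1) / 2];
    static int npts, nslopes;

    /* Each new sample adds only npts new pairwise slopes; the
       previously stored ones are unchanged.  The updated Theil-Sen
       estimate is the median of slopes[0..nslopes-1]. */
    static void add_sample(double t_new, double x_new)
    {
        if (npts >= MAXPTS)
            return;    /* window full; real code would drop the oldest
                          sample and its npts-1 stored slopes first */
        for (int i = 0; i < npts; i++)
            slopes[nslopes++] = (x_new - x_buf[i]) / (t_new - t_buf[i]);
        t_buf[npts] = t_new;
        x_buf[npts] = x_new;
        npts++;
    }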



Would it be worth looking into implementing this estimation method in chrony for predicting the current and future offsets?





-Charlie






