Re: [chrony-dev] [GIT] chrony/chrony.git branch, master, updated. 1.25-pre1-18-g20a4340



On Thu, 14 Apr 2011, Miroslav Lichvar wrote:

On Thu, Apr 14, 2011 at 09:24:28AM -0700, Bill Unruh wrote:
On Thu, 14 Apr 2011, Miroslav Lichvar wrote:
weights2 are same for both sources, but weights1 are very different.

Agreed, but then I at least would have a lot less confidence in 2 rather than
1 because of the longer delay.

Yes, but that's handled in the source selection algorithm which works
with root_dispersions and skews.

So I presume that you agree with the division by sd, at least in
principle.

I was flying today, but spent some time thinking about it. The problem is
that the current code assumes, for the estimated variance of the
measurements, that the variance of each item y[i] is proportional to w[i].
Unfortunately this is not true, and when it is not, you get exactly the
instability that you see. That procedure badly underestimates the variance of
the y[i], and that increases all the w[i] except for the one that is
min_distance, which then gives an even smaller estimate for the variance.
Once the variance becomes small, the next one goes as the square of the
previous one, which very rapidly leads to an awful death.

I have been calculating the predicted sigma under the much more conservative
assumption that the variance of y[i] is the same for all i, but the w[i] are
used as your own personal faith in that particular value. This leads to a
procedure which is stable, but means that the calculation both of the variance
s0 and the variance of the slope and intercept become slightly more complex.

I will try to write this up and send it to you (have a colloquium to give
tomorrow, so it might take a day or two). But the fact that the current
procedure leads to an instability and in particular can badly underestimate
the value of the variances means that I think for now it would be a good idea
to go back to the original and to look at this problem more deeply. I do agree
that using sigma is better, but the procedure at present simply does not work,
and I cannot see a simple fix. Anything simple seems to leave the system
unstable.



As I said, the problem is with the calculation of the stddev and the use of
weights. The calculation treats a weight Wi as if it indicates that that item
was measured Wi times, with the same value each time. The more times you
measure a random variable, the smaller the estimated standard deviation
becomes. But that is clearly wrong. If I give each item a weight of 1000, the
variance drops by a factor of 1000, but just because I gave them those weights
does not mean that their variance has decreased. I.e., another way around it
would be to use normalised weights-- force the weights to sum up to 1.

I think only the relative differences in the weights matter; scaling
them won't have any effect on the resulting variance, as it is
divided by the mean of the weights at the end of the regress function.

Agreed.


This would make no difference to the calculated slope and intercept-- those
are homogeneous of degree 0 in the weights-- the weights naturally scale out,
but the variance is not. It scales inversely with the weights which is why you
are getting your runaway. For a small sd, the weights go up, which makes sd
still smaller, which shoots the weights up even more, etc. That is simply
wrong. If anything the sd should go up if you have large weights, but
certainly not down.
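
The scaling argument above can be checked numerically. The following is only a
minimal sketch, not chrony's regress function: it assumes w[i] enters as a
divisor (larger w[i] means less trust), and the name weighted_fit and the
sample data are made up for illustration. It leaves out the division by the
mean weight mentioned earlier in the thread, so that the raw scaling is
visible.

```python
def weighted_fit(x, y, w):
    """Weighted straight-line fit; w[i] acts like a variance (assumption)."""
    W = sum(1.0 / wi for wi in w)
    xbar = sum(xi / wi for xi, wi in zip(x, w)) / W
    ybar = sum(yi / wi for yi, wi in zip(y, w)) / W
    sxx = sum((xi - xbar) ** 2 / wi for xi, wi in zip(x, w))
    sxy = sum((xi - xbar) * (yi - ybar) / wi for xi, yi, wi in zip(x, y, w))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Weighted residual sum of squares, with no normalisation by mean weight.
    ss = sum((yi - intercept - slope * xi) ** 2 / wi
             for xi, yi, wi in zip(x, y, w))
    return slope, intercept, ss / (len(x) - 2)


# Hypothetical sample data, not taken from any chrony measurement.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.3, 2.8, 4.2]
w = [1.0, 2.0, 1.0, 4.0, 1.0]

slope, intercept, s2 = weighted_fit(x, y, w)
# Multiply every weight by the same factor.
slope_k, intercept_k, s2_k = weighted_fit(x, y, [1000.0 * wi for wi in w])

print(slope, slope_k)          # equal up to rounding: degree 0 in the weights
print(intercept, intercept_k)  # equal up to rounding
print(s2 / s2_k)               # about 1000: the variance is not scale-free
```

Under these assumptions the slope and intercept are unchanged by the common
factor, while the unnormalised variance estimate shrinks by that factor.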

The latest commit fixes the runaway by using unweighted variance, i.e.
after the slope and intercept are determined, an extra variance is
calculated as if the weights were all 1.0. It should always be equal to
or larger than the weighted variance.

In my tests it works well.
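
A rough sketch of that idea, for illustration only: it is not the code from
the commit. It assumes that w[i] acts as a divisor and that every w[i] is at
least 1.0 (the condition under which the unweighted value cannot be smaller).
The function name and the data are hypothetical.

```python
def fit_with_both_variances(x, y, w):
    """Weighted line fit, then residual variance with and without weights."""
    n = len(x)
    W = sum(1.0 / wi for wi in w)
    xbar = sum(xi / wi for xi, wi in zip(x, w)) / W
    ybar = sum(yi / wi for yi, wi in zip(y, w)) / W
    sxx = sum((xi - xbar) ** 2 / wi for xi, wi in zip(x, w))
    sxy = sum((xi - xbar) * (yi - ybar) / wi for xi, yi, wi in zip(x, y, w))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    weighted_s2 = sum(r * r / wi for r, wi in zip(resid, w)) / (n - 2)
    # Extra variance computed as if all weights were 1.0.
    unweighted_s2 = sum(r * r for r in resid) / (n - 2)
    return slope, intercept, weighted_s2, unweighted_s2


# Hypothetical sample data; all weights are >= 1.0 by assumption.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.3, 2.8, 4.2]
w = [1.0, 2.0, 1.0, 4.0, 1.0]

_, _, ws2, us2 = fit_with_both_variances(x, y, w)
_, _, ws2_big, us2_big = fit_with_both_variances(x, y, [1000.0 * wi for wi in w])

print(us2 >= ws2)    # True whenever every w[i] >= 1.0
print(us2, us2_big)  # unweighted value is unaffected by inflating the weights
print(ws2, ws2_big)  # weighted value collapses as the weights grow
```

Because the unweighted variance depends on the weights only through the
fitted line, inflating all the weights cannot drive it toward zero, which is
the feedback loop described above.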

Also, the small sd gets reported and used in the hi/lo calculations and the
skew calculation, and many other things, and it is simply wrong to use that
small sd.

Skew and hi/lo are from the sd of the slope, which I believe wasn't
affected by this problem, only variance and sd of the intercept.

Since the variance of the slope is directly related to the variance of the
values, it is also affected.
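
For reference, the dependence can be seen in the textbook formula for an
ordinary (unweighted) least-squares line, where the variance of the slope is
s2 / Sxx. The sketch below uses that textbook formula, not chrony's weighted
code, and the numbers are invented purely to show the scaling.

```python
import math


def slope_sd(x, s2):
    """Standard deviation of the fitted slope for residual variance s2."""
    xbar = sum(x) / len(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return math.sqrt(s2 / sxx)


# Hypothetical sample times and variance estimates.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
sd_honest = slope_sd(x, 1.0e-6)
sd_underestimated = slope_sd(x, 1.0e-8)  # variance too small by a factor 100

ratio = sd_honest / sd_underestimated
print(ratio)  # about 10: the slope sd is too small by the square root of 100
```

So if the variance of the values is underestimated by some factor, the sd of
the slope is underestimated by the square root of that factor.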





