Re: [chrony-dev] new feature request: add "fast" and "slow" to "clock wrong" and "clock stepped" log messages

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-dev Archives ]



Bill, I really do not understand what point you are trying to present here, with respect to the system log message.  I was only suggesting that the log message should be more clear as to, let us call it, "leading" and "lagging", "forward" and "backward".

You are requesting something further. You are also requesting a change in the
value that is reported, not just making the sign of the error clearer. I am in
perfect agreement with the request for making the meaning of the sign clearer.
I am (perhaps) not in agreement with the value you want reported.


On 11/09/2017 09:05 PM, Bill Unruh wrote:
The three seconds is being corrected, meaning that the clock has been slowed
down so that say in 10 seconds it would be out by 0 seconds. It is a known
error and it is an error that, if everything is left as it is, will be 0 in 10
sec.


Precisely, so why would you report that?

Indeed.  I do not see any reason to report that.

But you are claiming you should report that 3 sec.

Uhm - 4seconds, using Miroslav's example.  I am saying that the log message should report that the system clock offset is currently 4seconds.  I do not understand where you interpret a "but" out of that.  Miroslav has said that, presently, chronyd will report instead, what I understand to be a kind of "differential slew error", which, in this example, would be "1second", not "4seconds".  And, I was suggesting that the log message should *not* report "1second".


There are two components of that 4 seconds. One is the three seconds which is
already being corrected by the alteration of the slew. The other is the 1
second which is a new offset. The three seconds is already being corrected,
and to report it now again may be confusing.



What is it that you are thinking that the log message should report?

From what Lichvar says, it seems that it now reports the 1 sec which is an
additional uncorrected error. The measurments log already reports the raw current offset and the corrected
offset.



Below, you say that chronyd, by slewing the clock, will correct an offset in "10*DT", "where DT is that measured offset".  We see that 10*4seconds = 40seconds, not the 10seconds in your example.

Think of this another way.  Observing the display time, adding or subtracting 1second to the display will not give the correct time when the clock is off by 4seconds.

But that is not what the log reports. Note that as I understand it, chrony
itself reports to other machines the corrected time -- the system time minus
the 3 seconds,( but I am not sure of this). Ie, there is the system time, and the
system time corrected for the ongoing slew,



"Slewing" is a different thing from "fix it immediately".  chronyd *will* step the clock when configured with "makestep", which is probably what you mean by "fix it immediately".

No, it will only make a step if the step size is larger than some value.

Of course.  I should not have to remind you that the "makestep" option takes a "threshold" argument?  What is your point?

chrony measures the offset of the clock by doing a least squared fit on the
past N measurements offsets, to determine what the best value of the current
offset is. Once it has that information it alters the rate of the clock so
that in something like DT*2000 seconds (where DT is that measured offset)  the
offset will be 0. (ntpd makes that 2000 a minimum. chrony will slew so that the
error disappears in 10*DT if DT is very large and the DT is less than the value
to make a step.)

That's useful to know.

Previous meaning "previously begun". Chrony, and ntpd do not correct times by
stepping the clock. They correct it by slightly increasing or decreasing the
rate by which the clock is running, until they have corrected the error.

There always exists an "instantaneous offset", of the system clock relative to the NTP server clock, which is an easy thing to describe, under any circumstances.

No, it is not because all the measurements have uncertainties-- due to travel
time fluctuations, due to measurement accuracy, due to interrupt handling
fluctruations, etc. Ie, no you cannot know what the "instantaneous offset" is.

You state the obvious.  What is your point?

I have no idea what is obvious to you.


If I wanted to be pedantic - which I often do - I might have said "the instantaneous offset estimate", instead of simply "the instantaneous offset".  But, we are all adults here, yes?

Yes, there is a variable frequency oscillator. Or rather the time in seconds
between the ticks of the cpu clock can be changed.

More of an electronics hardware guy would take issue with that blurring of the distinction between analog circuitry and digital circuitry. ;)

It is all digital. The cpu clock ticks ticks at a certain, unknown rate. The
system counts those ticks and converts them to time. That coversion process
must be calbrated and that calibration can be changed. It is primarily that
change in calibration that chrony or ntp controls--as I understand it.


Presumably, a measurement of the current instantaneous absolute offset of the system clock relative to the NTP server clock.  I believe that that is - or would be - the intuitive interpretation of most users.  The idea that this "measurement" reported in the the log message would represent something much more abstract - and obtuse - than that is - unfortunate.

No. In chrony it is the least squares linear fit of the last N measurements of
the offset that is used to estimate the current offset. Because the
instantaneously measured offset is beset by uncertainties that can be
"averaged out" by doing a least squares fit. N is chosen actively to make sure
that over that set of N measurements the offsets are well approximated by a
straight line with gaussian noise. Unfortunately in order to get a good estimate, one HAS to  use "abstract - and
obtuse" in your words, procedures. chrony's way of doing it is a factor of 3
to a 20 times better than ntpd's method according to measurements that both I
and Lichvar have made.

That's nice.  Still, you seem to misunderstand what I am referring to as "obtuse".  Let me try again.  Remember that the topic here is the log message, and nothing about the internal workings of chornyd - which we all admire - and nothing about the theory of the NTP - which is the subject of several standards.

And also, let me remind that the topic here is the *system log* message, and nothing to do with the "/var/log/chrony/*log" log messages, which you have referenced previously.  Are we on the same page now?

But ignoring those messages means that you are in danger of unnecessary
duplication of logs. Why should one care, as long as one knows what it is,
what the system log reports? IF you want to use the reports from chrony, and
if the chrony/*log messages give you what you need, why do you care what is in
the system log messages?



I repeat the explanation for chronyd's current version of the log message: "the initial offset was 5 seconds and the system clock was already corrected by 2 seconds when another measurement is made, which says the offset of the system clock is now actually 4 seconds instead of the expected 3, a message will be logged that the clock is wrong by 1 second."  *That message* is what I have called "obtuse".

There are two clocks-- the chrony clock and the system clock, which may not be
the same if there is a correction slew going on.

Perhaps you disagree with Miroslav about what is being reported?

Nope. I have not spent the time recently going through the source code to
figure out exactly what is being reported.


Now, this is information that I do *not* want to know.  And, I have not heard an argument that this is information that *anyone* wants to know.  So, I have suggested that this log message not be printed.  Instead, I have suggested that a simpler log message be printed, a message which describes the current instantaneous offset, of the system clock relative to the NTP server clock *estimate*.

Do you think also that that is a good idea?  Or, do you think that that is a bad idea?


It all depends on the truth of your contention that this is not "information
that *anyone* wants to know". Certainly if noone wants to know that
information, then reporting it is silly. Unfortunately I do not have the
expertise to know what whether or not your contention is true. I grant that
you do not want to know it.

So, The clock is out by 4 seconds. three of those seconds are being actively
fixed. One is an additional offset that needs now to be fixed. Is it better to
know about that additional offset or about the total? I can certainly see
arguments for either, especially as, if you really want to know, there is a
whole bunch of information in the chrony/*log files which you can look at if
you want more detailed information.

Bill, so far, you have failed to say what it is that you think *is* being reported in the system log, and have not said what you think chronyd *should* be reporting in the system log.

That would not hurt, but the issue I am raising is about the meaning of a simple log message, not about the theory of the network time protocol.  This does not need to be complicated.  I believe that it should not be *made* complicated.

But what is reported in the logs is precisely tied in with how chrony
estimates the uncertainties.

Ha!  Oh, I'm sure it's "tied in" there somewhere.  But, as I said, "obtuse", not interested.

Or maybe you are again talking about the "/var/log/chrony/*log" log messages?  That would be off topic.

Note that if you want the raw measurements of the offsets look in
/var/log/chrony/measurements, and /var/log/chrony/refclock logs. There you
will be given the raw offsets of each measurement of the local clock against
the estimate of the time given by the remote server. If that is what you want
to look at, look there. In the tracking log, and in chronyc tracking  and otherwise, the best estimate
of the current offset from the time of the servers  is reported instead.

As I have tried to explain, that is *not* what I want to see.  I only want to see something ever so slightly more informative than the term "wrong", as used now in the system log.

I will agree that "wrong" is insufficiently precise and is worth changing to
"ahead" or "behind" or something that tells in which direction it is wrong.
That is not where my argument lies. As to what I want, I really do not care.
In my logs the offset is of the order of microseconds or nanoseconds, and that
is far more accurate than I need. Once it gets up to seconds, something is
seriously wrong, and it needs fixing, and I can than look at the chrony/*log
files to get a much more detailed view of the situation.


I do not get the impression that you disagree.  But you do seem to be going off on tangents - informative, but not entirely relevant.


Perhaps because it is still not clear to me what it is that you regard as
relevant and certainly because I donot know what your level of knowledge is of
how chrony works.

That report in the syslogs is basically an emergency. The system clock is way
out and chrony has started correcting it. Once it has told you it has started
correcting it, why do you need more information. And if another huge offset
comes in that it needs to tell you about, it tells you what that new
additional correction is. You do not want to know every 4 seconds or every
hour that the clock is still out and that the correction is proceeding, do
you? But if something additional comes in that now needs additional work to
correct it, that is what you want to know surely.
So on my system, these are the three reports over the past two years.
Apr 14 14:45:17 info chronyd[2011]: System clock wrong by 1.013722 seconds,
adjustment started
Dec 15 15:31:14 info chronyd[15415]: System clock wrong by 1007.000208
seconds, adjustment started
Dec 15 15:30:55 info chronyd[15415]: System clock wrong by 1280.000250
seconds, adjustment started

Did I want to receive notifications all during the slew about how wrong the
offset was now? Not particularly. Had there been additional huge offsets, I
probably would want to know about them.



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/