Re: [chrony-users] Query regarding "chronyd failing spuriously"


Thanks Miroslav and Bill for the replies.
Those really were useful insights into the design :)

However, I would like to scratch below the surface a bit more :P
I would be grateful if you could let me know what the "expected results" are in the following use case (let's set aside any bugs, etc.).


0)
We just consider one single machine in our setup.

1)
I power on the machine, and internet is available.

2)
I start the "chronyd" service.
Time gets synced up on the machine (by offsetting), and the "slew rate" gets stored in the "chronyd database".

3)
I then manually disrupt the time on the machine.
Let's say I offset the time by 3 years.



Now, the following are the options.
What is the "expected behaviour" in each of them?

4a)
I "down" the internet, the "chronyd" service is never restarted, and the internet never comes up again.

4b)
I "down" the internet, the "chronyd" service is never restarted, and the internet comes up at some random time in the future.

4c)
I "down" the internet, then restart "chronyd", but internet never comes up again.

4d)
I "down" the internet, then restart "chronyd", and internet comes up again at some random time in the future.

4e)
The internet is alive, but the "chronyd" service is never restarted.

4f)
The internet is alive, and I restart the "chronyd" service.


Knowing the theoretically expected results in the above cases will help me a great deal in narrowing down my debugging, following the procedures you have suggested.


Thanks again, for the quick replies, and persevering with me :)



On Thu, Aug 15, 2013 at 11:23 PM, Bill Unruh <unruh@xxxxxxxxxxxxxx> wrote:
On Thu, 15 Aug 2013, Ajay Garg wrote:

Thanks Bill for the speedy reply.


1)
I am not using a virtual machine.
All tests that I have been doing are on individual, dedicated machines.

2)
The machine where the "spurious" instances occur, is a Fedora-18 on ARM,
using chrony-1.28.


One question ::
==========

What is the time interval after which the time is supposed to be synced
"automatically"?
(Note that I have not made any changes in "/etc/chrony.conf" whatsoever.)

There is no such interval. chrony measures the time from the server and does
a least-squares fit to work out the rate difference and the time difference
between the two machines. It then adjusts the local rate to get rid of that
time difference (e.g. it runs the local machine fast if it is slow with
respect to the server); how long that takes depends on how far off the local
machine is. Eventually it brings the rate of the local machine to match the
remote rate and the local time to match the remote time.
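
As a rough illustration of the fit described above, here is a minimal Python sketch of an ordinary least-squares line through (time, offset) samples. It is not chrony's actual code: the function name `fit_offset_and_rate`, the 64-second sample spacing and the sample values are all made up for the example, and chrony's real regression is more elaborate than this.

```python
def fit_offset_and_rate(times, offsets):
    """Fit offset = a + rate * time by ordinary least squares.

    times   -- local timestamps of the measurements, in seconds
    offsets -- measured offset from the server at each timestamp, in seconds
    Returns (rate, offset_now): rate is the slope (seconds of offset gained
    per second; multiply by 1e6 for ppm) and offset_now is the fitted offset
    at the most recent sample.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    rate = sxy / sxx
    offset_now = mean_o + rate * (times[-1] - mean_t)
    return rate, offset_now


# Hypothetical data: a clock that starts 0.5 s off and drifts by 20 ppm.
times = [0.0, 64.0, 128.0, 192.0, 256.0]
offsets = [0.5 + 20e-6 * t for t in times]
rate, offset_now = fit_offset_and_rate(times, offsets)
print(f"rate error: {rate * 1e6:.1f} ppm, offset now: {offset_now:.6f} s")
```

The slope is the rate difference to be corrected and the fitted value at the latest sample is the time difference still to be removed.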

In the chrony.conf of the bad machine, could you please switch on the
measurement logging, i.e. make sure that the lines
logdir /var/log/chrony
log statistics measurements tracking

are in chrony.conf (and that a "server 192.168.0.2" line is in there as well, where that is the internal IP of S1).
You can also put in initstepslew 10 192.168.0.2
which tells chrony to correct the initial offset by slewing rather than
stepping only if the offset is less than 10 seconds. If it is more, chrony
will step the local clock so that its time is the same as that server's.

This is so that if your system has a time that is way off on bootup, or when
chrony starts, then it will step the local time to get rid of that offset and
not try to slew the time, which could take years if the offset is large enough
(e.g. on an RPi, the initial time is Jan 1 1970 and it would take a minimum of
400 years to get rid of that offset by slewing). Note that, unlike ntpd, chrony
can slew the clock much faster than 500PPM, up to 100000PPM, but that still
only gets rid of one second of offset in 10 sec.
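
To make the arithmetic above concrete, here is a small Python sketch. It only restates the figures quoted in this message (500PPM, 100000PPM, a 10-second threshold); the helper names `slew_duration` and `initial_correction` are made up for illustration and are not part of chrony, and the 43.6-year offset is an approximation of the 1970-to-2013 gap.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def slew_duration(offset_seconds, slew_ppm):
    """Seconds of elapsed time needed to remove an offset at a constant slew rate."""
    if slew_ppm <= 0:
        raise ValueError("slew rate must be positive")
    return abs(offset_seconds) * 1e6 / slew_ppm


def initial_correction(offset_seconds, step_threshold):
    """Slew if the offset is below the threshold, otherwise step (as described above)."""
    return "slew" if abs(offset_seconds) < step_threshold else "step"


# One second of offset at the two slew limits quoted in the message.
print(slew_duration(1.0, 100000))  # seconds at 100000 PPM
print(slew_duration(1.0, 500))     # seconds at 500 PPM

# Roughly 43.6 years of offset (clock at Jan 1 1970, true date mid-2013).
offset = 43.6 * SECONDS_PER_YEAR
print(slew_duration(offset, 100000) / SECONDS_PER_YEAR, "years of slewing")
print(initial_correction(offset, 10.0))
print(initial_correction(3.0, 10.0))
```

At 100000 PPM the clock gains or loses one tenth of a second per second, so any offset takes ten times its own length to slew away, which is where the figure of more than 400 years comes from.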

Anyway, then post the contents of /var/log/chrony/measurements.log here so we
can see what is happening. Also post your full chrony.conf.





Based upon your answer, I "could probably" be in a state to do more
rigorous testing.


Thanks again for all the help !



On Thu, Aug 15, 2013 at 9:36 PM, Bill Unruh <unruh@xxxxxxxxxxxxxx> wrote:

On Thu, 15 Aug 2013, Ajay Garg wrote:

 Hi all.

I have been able to successfully install, configure and run the "chronyd"
service :)


However, I sporadically notice the following things ::

a)
Even though "chronyd" is running in the background (confirmed by "service
chronyd status"), the time does not sync up.
This generally happens when the "time drift" is very large.


I assume this is on one of your clients (C1...Cn). Is it a virtual machine,
i.e. is it Linux running on top of another operating system? That almost
certainly will not work, since the time slices given to the guest OS vary
wildly; as a result the system time varies wildly and there is no consistent
drift. You should use the underlying OS to set its clock, and the virtual
machine should read its time from there. But many virtual OSs do not seem to
do that. They try to read their own system times, which are junk.

If it is not a virtual machine, then there is something seriously wrong with
the hardware.




b)
In other cases, (when the "time drift" is large), the "chronyd" service
dies at some point, randomly (confirmed by "service chronyd status").


Which version of chrony are you using by the way?






To work around the above (and ensure that the time does not remain out of
sync for more than 59 minutes), I have set up an hourly cron job that does
"service chronyd restart".
Doing a service restart syncs the time, no matter what.



My query is ::
=========

Installing the hourly cron job seems to be a hack to make the time-syncing
work seamlessly.
Is there a better way to ensure that spurious cases a) and b) never happen?


a) or b) should never happen. Note that I vaguely recall that there was a bug
in an older version of chrony which could cause it to run away and crash.
Make sure you are using 1.28 or 1.29.





Will be grateful for any replies, that help make the procedure smoother
(and not rely on a hack) :)




Thanks and Regards,
Ajay


--
William G. Unruh   |  Canadian Institute for|     Tel: +1(604)822-3273
Physics&Astronomy  |     Advanced Research  |     Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology |     unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1     |      and Gravity       |  www.theory.physics.ubc.ca/

--
To unsubscribe email chrony-users-request@chrony.tuxfamily.org with "unsubscribe" in the subject.
For help email chrony-users-request@chrony.tuxfamily.org with "help" in the subject.
Trouble?  Email listmaster@chrony.tuxfamily.org.










--
Regards,
Ajay

