Re: [chrony-users] chrony configuration help (please)



Update inline below.

On 04/08/2015 20:55, Mauro Condarelli wrote:
Hi,
Thanks for the answer.
Comments inline below.


On 04/08/2015 17:50, Bill Unruh wrote:

On Tue, 4 Aug 2015, Miroslav Lichvar wrote:

On Mon, Aug 03, 2015 at 10:56:51PM +0200, Mauro Condarelli wrote:
....


2. In the above event, after several minutes, chrony announces it is going to step the clock by several million seconds (as expected); shortly thereafter the system dies... somewhat, i.e. the serial console and ssh become completely unresponsive, but "ping" still gets an answer. Nothing is logged.

Hm, that's odd. Can you reproduce the problem by stepping the clock
manually with the date command, e.g. date -s '+ 1000000 sec' ?


In this context, might there be an option for chrony to take the initial time
at startup from the date and time of some file, e.g. --startfile <nameoffile>?
If that directive is present, chrony would use the mtime (plus, say, 60 sec) of that file as the
startup time if the RTC time does not exist, or if the RTC time is earlier
than that mtime. That would solve the problem of an insanely early RTC time or
no RTC at all (the rPi, for example) and give a time which is at least not totally
crazy. The problem with iburst is that the clock is initially already set to a
potentially insane time, so the system has a problematic time for a while.
Agreed.
Can you give me specific instructions?
As I said, this is the first time I have attempted to use chrony at all.
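
Bill's proposed fallback can be sketched in a few lines of Python, assuming an interpreter is included in the Buildroot image. This is not an existing chrony option: the stamp path and the names `initial_time()`, `apply_at_boot()` and `refresh_stamp()` are hypothetical. `initial_time()` encodes the rule exactly as stated: if there is no RTC time, or it is earlier than the stamp file's mtime, start from mtime + 60 s; otherwise keep the RTC time.

```python
import os
import time

STAMP = "/var/lib/chrony/startstamp"  # hypothetical stamp file location


def initial_time(rtc_time, stamp_mtime, margin=60.0):
    """Pick the start-up time following the proposed rule.

    rtc_time is None when there is no RTC at all."""
    if rtc_time is None or rtc_time < stamp_mtime:
        return stamp_mtime + margin
    return rtc_time


def apply_at_boot(path=STAMP):
    """Run as root before chronyd; steps the clock forward if it looks insane."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None  # no stamp yet: leave the clock alone
    # Assumes the kernel (or hwclock) has already loaded the RTC into the
    # system clock, so CLOCK_REALTIME currently holds the RTC time.
    now = time.clock_gettime(time.CLOCK_REALTIME)
    target = initial_time(now, mtime)
    if target != now:
        time.clock_settime(time.CLOCK_REALTIME, target)
    return target


def refresh_stamp(path=STAMP):
    """Create the stamp file if needed and set its mtime to the current time."""
    with open(path, "a"):
        pass
    os.utime(path, None)
```

Because power can be cut without a clean shutdown, `refresh_stamp()` would need to run periodically, not only at shutdown. Also, the log shows chronyd itself setting the system time from the RTC at startup ("Set system time, error in RTC = ..."), so a step made before chronyd starts may simply be overridden; treat this as an illustration of the proposed logic rather than a drop-in fix.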

Regarding the problem itself: ping is handled pretty deep down in the kernel, so much of the
kernel could be dead, and user programs dead, and ping would still respond. But
the kernel should not die just because the clock is advanced by a few decades. Sounds like a kernel bug. What OS/distribution is this, again?
I do not have a "distribution".
This is an embedded system based on ACME AriaG25.
I recompiled everything from scratch (including arm-gcc) using the Buildroot framework.

/ # uname -a
Linux ok-cash 3.16.1 #1 Thu Jul 16 14:02:29 CEST 2015 armv5tejl GNU/Linux

The system generally seems to be well behaved, but unfortunately I *cannot* rule out either software build bugs or outright hardware bugs.

By brutally cutting power (mains *and* battery backup, so no graceful chrony termination *and* an RTC reset), I get into the error condition:

...
Jan  1 01:00:09 ok-cash user.info kernel: [    3.093750] EXT4-fs (mmcblk0p6): mounted filesystem with ordered data mode. Opts: (null)
Jan  1 01:00:09 ok-cash daemon.info kernel: [    3.492187] udevd[476]: starting version 3.0
Jan  1 01:00:09 ok-cash user.notice kernel: [    3.515625] random: udevd urandom read with 85 bits of entropy available
Jan  1 01:00:11 ok-cash user.notice kernel: [    4.968750] random: nonblocking pool is initialized
Jan  1 01:00:11 ok-cash daemon.info chronyd[494]: chronyd version 1.31 starting
Jan  1 01:00:11 ok-cash daemon.err chronyd[494]: Could not open IPv6 NTP socket : Address family not supported by protocol
Jan  1 01:00:11 ok-cash daemon.err chronyd[494]: Could not open IPv6 command socket : Address family not supported by protocol
Jan  1 01:00:11 ok-cash daemon.info chronyd[494]: Set system time, error in RTC = 31906737.155476
Dec 27 18:01:14 ok-cash daemon.info chronyd[494]: Frequency -1.498 +/- 0.147 ppm read from /var/lib/chrony/drift
Dec 27 18:01:24 ok-cash daemon.info chronyd[494]: System trim from RTC = 0.691058
Dec 27 18:01:24 ok-cash daemon.info init: starting pid 506, tty '': '/bin/ash '
Dec 27 18:01:26 ok-cash user.info kernel: [   17.867187] macb f802c000.ethernet eth0: link up (100/Full)
Dec 27 18:01:29 ok-cash daemon.info avahi-daemon[555]: Found user 'avahi' (UID 1003) and group 'avahi' (GID 1000).
...
Dec 27 18:03:02 ok-cash auth.info sshd[605]: Accepted password for mcon from 192.168.7.114 port 16760 ssh2
Dec 27 18:03:05 ok-cash authpriv.notice sudo:     mcon : TTY=pts/1 ; PWD=/home/mcon ; USER=root ; COMMAND=/bin/su -
Dec 27 18:03:05 ok-cash auth.notice su: + /dev/pts/1 mcon:root
...
/ # date
Tue Dec 27 18:01:58 CET 2005
...
Dec 27 18:05:06 ok-cash daemon.info chronyd[494]: Selected source 193.204.114.233
Aug  4 20:47:36 ok-cash daemon.warn chronyd[494]: System clock was stepped by 303010949.791602 seconds

Here the system is virtually dead.
NOT really true.
Everything froze for a while (so ssh connections were dropped), but the system recovered on its own after several minutes.
I actually discovered this because, before switching off my development machine, I tried an ssh reconnect, and it worked.
Here is an excerpt of /var/log/messages:

....
Dec 27 18:03:02 ok-cash auth.info sshd[605]: Accepted password for mcon from 192.168.7.114 port 16760 ssh2
Dec 27 18:03:05 ok-cash authpriv.notice sudo:     mcon : TTY=pts/1 ; PWD=/home/mcon ; USER=root ; COMMAND=/bin/su -
Dec 27 18:03:05 ok-cash auth.notice su: + /dev/pts/1 mcon:root
Dec 27 18:05:06 ok-cash daemon.info chronyd[494]: Selected source 193.204.114.233
Aug  4 20:47:36 ok-cash daemon.warn chronyd[494]: System clock was stepped by 303010949.791602 seconds

<-----  here is where ssh & serial console froze ----->

Aug  4 20:48:10 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -271104240.791 seconds
Aug  4 20:54:28 ok-cash daemon.warn chronyd[494]: Forward time jump detected!
Aug  4 20:54:28 ok-cash daemon.info chronyd[494]: Can't synchronise: no reachable sources
Aug  4 20:56:41 ok-cash daemon.info chronyd[494]: Selected source 193.204.114.233
Aug  4 20:56:49 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -55.032 seconds
Aug  4 20:57:31 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.492 seconds
Aug  4 20:58:13 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.957 seconds
Aug  4 20:58:56 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -5.418 seconds
Aug  4 20:59:38 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.880 seconds
Aug  4 21:00:21 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -5.343 seconds
Aug  4 21:01:03 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.816 seconds
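
As a sanity check on the "System clock was stepped by 303010949.791602 seconds" line, the arithmetic can be redone with Python's datetime module. This assumes the box used CET (UTC+1) for the bogus December date and CEST (UTC+2) in August, and that the step was applied within about a second of the 18:05:06 "Selected source" line.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets: CET in the bogus December date, CEST in August.
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")

# Last log timestamp before the step ("Selected source ...").
before = datetime(2005, 12, 27, 18, 5, 6, tzinfo=CET)

# Step reported by chronyd, split into whole seconds and microseconds
# to avoid any float rounding.
step = timedelta(seconds=303010949, microseconds=791602)

after = (before + step).astimezone(CEST)
print(after.isoformat())  # prints 2015-08-04T20:47:35.791602+02:00
```

That lands within a second of the 20:47:36 timestamp on the step line, so the step size itself looks sane; the extra hour between 18:05 + 1:42 and 20:47 is just the CET to CEST offset.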

I will do some tests tomorrow to see whether, and after how long, the serial console comes back to life.
Anyway, a hiatus of (at least) almost one minute does not seem very healthy, even if the system recovers.
Does anyone have an idea what could be happening?
I can post the whole startup sequence, if deemed useful.
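
For those tests, a small heartbeat loop could measure the hiatus directly instead of relying on when the console feels responsive again. The sketch below (Python again, assuming it is available on the target; `find_stalls()` and `monitor()` are hypothetical names) samples `time.monotonic()`, which Linux does not step when the wall clock is set, alongside `time.time()`. A large monotonic gap means the process was not scheduled (the freeze), while a difference between the two deltas shows the wall-clock step itself.

```python
import time


def find_stalls(samples, interval, slack=0.5):
    """Scan (monotonic, wall) sample pairs taken every `interval` seconds.

    Returns (index, stall, step) for each gap where the process ran
    noticeably later than `interval` (stall) or the wall clock moved
    differently from the monotonic clock (step)."""
    events = []
    for i in range(1, len(samples)):
        d_mono = samples[i][0] - samples[i - 1][0]
        d_wall = samples[i][1] - samples[i - 1][1]
        stall = d_mono - interval
        step = d_wall - d_mono
        if stall > slack or abs(step) > slack:
            events.append((i, stall, step))
    return events


def monitor(interval=1.0):
    """Loop forever, printing one line per detected stall or clock step."""
    prev = (time.monotonic(), time.time())
    while True:
        time.sleep(interval)
        cur = (time.monotonic(), time.time())
        for _, stall, step in find_stalls([prev, cur], interval):
            print(f"{time.ctime(cur[1])}: late by {stall:.1f} s, "
                  f"wall clock moved {step:+.1f} s vs monotonic", flush=True)
        prev = cur
```

Calling `monitor()` from the serial console, or with its output redirected to local storage (not over ssh, which freezes too), while repeating Miroslav's `date -s '+ 1000000 sec'` test should report both the step and how long the process was starved.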

Pretty please HEEEELP!!!

TiA
Mauro


