Re: [chrony-users] chrony configuration help (please)

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-users Archives ]


Maybe there is something which puts the system to sleep if it is inactive to a
long time. since a million seconds is certainly a long time, that thing could
decide you have been inactive and puts everything to sleep. Then when things
start to happen again, it wakes up

William G. Unruh   |  Canadian Institute for|     Tel: +1(604)822-3273
Physics&Astronomy  |     Advanced Research  |     Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology |     unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1     |      and Gravity       |  www.theory.physics.ubc.ca/

On Wed, 5 Aug 2015, Mauro Condarelli wrote:

Update inline below.

Il 04/08/2015 20:55, Mauro Condarelli ha scritto:
Hi,
Thanks for the answer.
Comments inline below.


Il 04/08/2015 17:50, Bill Unruh ha scritto:

On Tue, 4 Aug 2015, Miroslav Lichvar wrote:

On Mon, Aug 03, 2015 at 10:56:51PM +0200, Mauro Condarelli wrote:
....


2. In the above event, after several minutes, chrony announces it is going to step by several million seconds (as expected); shortly thereafter system dies... somewhat; i.e.: serial console and ssh are completely unresponsive, but "ping" gets an answer. Nothing is logged.

Hm, that's odd. Can you reproduce the problem by stepping the clock
manually with the date command, e.g. date -s '+ 1000000 sec' ?


In this context might there be an option for chrony to take the intial time when starting up from the date and time on some file. eg --startfile <nameoffile> If that directive is there then chrony would use the mtime (plus say 60 sec) from that file as the
startup file if the rtc time does not exist, or if the rtc time is earlier
than that mtime. That would solve the problem of an insanely early rtc time or no rtc at all (rPi for example) and give a time which is at least not totally
crazy.  The problem with iburst is
that the clock is initially set already to a potentially insane time so the
system for a while has a problematic time.
Agreed.
Can You give me specific instructions?
As said this is the first time I attempt to use chrony at all.

Re the problem itself, ping is pretty deep down in the kernel, so much of the kernel could be dead, and user programs dead and ping will still respond. But the kernel should not die just because the clock is advanced by a few decades. Sounds like a kernel bug. What OS/Distribution is this again
I do not have a "distribution".
This is an embedded system based on ACME AriaG25.
I recompiled everything from scratch (including arm-gcc) using the Biuldroot framework.

/ # uname -a
Linux ok-cash 3.16.1 #1 Thu Jul 16 14:02:29 CEST 2015 armv5tejl GNU/Linux

System seems to be generically be well behaving, but I *cannot* exclude neither software build nor straight hardware bugs, unfortunately.

Brutally cutting power (mains *and* battery backup, so no graceful chrony termination *and* RTC reset) i get into the error condition:

...
Jan 1 01:00:09 ok-cash user.info kernel: [ 3.093750] EXT4-fs (mmcblk0p6): mounted filesystem with ordered data mode. Opts: (null) Jan 1 01:00:09 ok-cash daemon.info kernel: [ 3.492187] udevd[476]: starting version 3.0 Jan 1 01:00:09 ok-cash user.notice kernel: [ 3.515625] random: udevd urandom read with 85 bits of entropy available Jan 1 01:00:11 ok-cash user.notice kernel: [ 4.968750] random: nonblocking pool is initialized Jan 1 01:00:11 ok-cash daemon.info chronyd[494]: chronyd version 1.31 starting Jan 1 01:00:11 ok-cash daemon.err chronyd[494]: Could not open IPv6 NTP socket : Address family not supported by protocol Jan 1 01:00:11 ok-cash daemon.err chronyd[494]: Could not open IPv6 command socket : Address family not supported by protocol Jan 1 01:00:11 ok-cash daemon.info chronyd[494]: Set system time, error in RTC = 31906737.155476 Dec 27 18:01:14 ok-cash daemon.info chronyd[494]: Frequency -1.498 +/- 0.147 ppm read from /var/lib/chrony/drift Dec 27 18:01:24 ok-cash daemon.info chronyd[494]: System trim from RTC = 0.691058 Dec 27 18:01:24 ok-cash daemon.info init: starting pid 506, tty '': '/bin/ash ' Dec 27 18:01:26 ok-cash user.info kernel: [ 17.867187] macb f802c000.ethernet eth0: link up (100/Full) Dec 27 18:01:29 ok-cash daemon.info avahi-daemon[555]: Found user 'avahi' (UID 1003) and group 'avahi' (GID 1000).
...
Dec 27 18:03:02 ok-cash auth.info sshd[605]: Accepted password for mcon from 192.168.7.114 port 16760 ssh2 Dec 27 18:03:05 ok-cash authpriv.notice sudo: mcon : TTY=pts/1 ; PWD=/home/mcon ; USER=root ; COMMAND=/bin/su -
Dec 27 18:03:05 ok-cash auth.notice su: + /dev/pts/1 mcon:root
...
/ # date
Tue Dec 27 18:01:58 CET 2005
...
Dec 27 18:05:06 ok-cash daemon.info chronyd[494]: Selected source 193.204.114.233 Aug 4 20:47:36 ok-cash daemon.warn chronyd[494]: System clock was stepped by 303010949.791602 seconds

Here the system is virtually dead.
NOT really true.
Everything froze for a while (so ssh connections were dropped), but system recovered on its own after several minutes. I actually discovered this because, before switching off my development machine, tried a ssh reconnect; it worked.
Here is an excerpt of /var/log/messages:

...
Dec 27 18:03:02 ok-cash auth.info sshd[605]: Accepted password for mcon from 192.168.7.114 port 16760 ssh2 Dec 27 18:03:05 ok-cash authpriv.notice sudo: mcon : TTY=pts/1 ; PWD=/home/mcon ; USER=root ; COMMAND=/bin/su -
Dec 27 18:03:05 ok-cash auth.notice su: + /dev/pts/1 mcon:root
Dec 27 18:05:06 ok-cash daemon.info chronyd[494]: Selected source 193.204.114.233 Aug 4 20:47:36 ok-cash daemon.warn chronyd[494]: System clock was stepped by 303010949.791602 seconds

<-----  here is where ssh & serial console froze ----->

Aug 4 20:48:10 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -271104240.791 seconds
Aug  4 20:54:28 ok-cash daemon.warn chronyd[494]: Forward time jump detected!
Aug 4 20:54:28 ok-cash daemon.info chronyd[494]: Can't synchronise: no reachable sources Aug 4 20:56:41 ok-cash daemon.info chronyd[494]: Selected source 193.204.114.233 Aug 4 20:56:49 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -55.032 seconds Aug 4 20:57:31 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.492 seconds Aug 4 20:58:13 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.957 seconds Aug 4 20:58:56 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -5.418 seconds Aug 4 20:59:38 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.880 seconds Aug 4 21:00:21 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -5.343 seconds Aug 4 21:01:03 ok-cash daemon.info chronyd[494]: Trimming RTC, error = -4.816 seconds

I will do some tests tomorrow to see if and after how much time the serial console will come back to life. Anyways a hiatus of almost one minute (at least) does not seem very healthy, even if system recovers.
Does anyone have an idea about that could be happening?
I can post the whole startup sequence, if deemed useful.

Pretty please HEEEELP!!!

TiA
Mauro



--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject. For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.


--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject. For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/