Re: [chrony-users] Chrony malfunction under ARM/Buildroot |
[ Thread Index |
Date Index
| More chrony.tuxfamily.org/chrony-users Archives
]
Thanks Bill,
comments inline below.
Il 14/04/2016 21:08, Bill Unruh ha scritto:
On Thu, 14 Apr 2016, Mauro Condarelli wrote:
Hi,
I'm having major problems using Chrony in an embedded system sporting an oldish Atmel AT91SAM9G25.
I am using chrony-2.3 compiled with "--without-seccomp --without-tomcrypt --without-nss" under Linux 4.4.1.
/etc/chrony.conf is:
============================
#initstepslew 30 2.it.pool.ntp.org 0.pool.ntp.org 1.pool.ntp.org
server 0.it.pool.ntp.org
server 1.it.pool.ntp.org
server 2.pool.ntp.org
server 3.pool.ntp.org
logdir /var/log/chrony
log rtc statistics measurements tracking
logchange 1
driftfile /var/lib/chrony/drift
keyfile /etc/chrony.keys
generatecommandkey
makestep 1.0 3
maxupdateskew 100.0
dumponexit
dumpdir /var/lib/chrony
rtconutc
rtcautotrim 1
rtcfile /var/lib/chrony/rtc
bindcmdaddress /var/run/chrony/chronyd.sock
============================
I have two (possibly related) malfunctions:
1) if and when the RTC is severely off (complete loss of power, including battery backup) the system comes up with a specific date (Jan, 11th 2007) and after a while (i.e.: after chrony had the time to synchronize over the network) system seem to hang with very few systems actually working (notably the system answers to "ping") so that it seems completely dead. It turns out it is not really dead, but only in "suspended animation" as it always recovers completely (with correct date & time) after a LOOOOooonggggg time (>5min).
With no rtc ( which is a hardware problem not a chrony problem) there is no
way the system can know what the time is or how to make sure everything is
always monotonic. You will just have to make sure that the problem never
AFAIK chrony (I start it as: "chronyd -r -s") should look for the "driftfile" (present in my system) and set the time&date from it; this does not seem to happen.
occurs. I have never seen chrony "hang" the system. There is something else going on. Have you looked at all the logs-- dmesg, journalctl or var/log/messages, ...
to see what is going on? It may be that some program is noticing that chrony
is changing the time, and goes into a snit.
I find no message whatsoever in any of the logfiles.
When I see this "hang" it really seems everything freezes for a long while.
Without errors and without consequences (aside from the freeze itself, of course).
2) if the system comes up well (chrony seems to work as advertised) I sometimes get very bad system crashes (see below).
Neither of these conditions happen if I kill chrony early (before it has any chance to synchronize) or if it is inactive (e.g.: if system is disconnected from the internet).
As said this seems linked to chrony as neither condition happens if chrony is not running.
I will try substituting nntp/ntpdate in order to see if the problem is in touching time (i.e.: it is in the kernel itself).
What should I check?
Can someone help me debugging this?
NOTE: the need is to have a system with strictly monotonic time (i.e.: time can jump ahead, but can never jump back for any reason, even across reboots).
If necessary I can test with specific patches... or change completely.
Thanks in Advance
Mauro
============================
[ 232.117187] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:298]
OK, that has nothing to do with chrony. What are you running?
Nothing specific.
This happens when chrony gets the new internet time.
Actually at that time there is a lot of "normal" activity going on as this happens right after dhcpcd negotiated a new IP address;
I have the following /etc/dhcpcd.exit-hook (as advised by Miroslav Lichvar):
============================
#!/bin/ash
echo >>/dev/ttyS0 "------------ $reason ($interface)($if_up)($if_down) -----------"
case $reason in
NOCARRIER)
echo >>/dev/ttyS0 "--------- chrony offline ---------"
/usr/bin/chronyc offline
;;
BOUND|REBOOT)
echo >>/dev/ttyS0 "--------- chrony online ---------"
( sleep 1 \
&& /usr/bin/chronyc online \
&& /usr/bin/chronyc burst 4/4 \
&& sleep 10 \
&& (/usr/bin/chronyc tracking | grep -q 'System time.*slow' && chronyc makestep ) \
) &
;;
esac
============================
[ 232.117187] Modules linked in: cfg80211 usb_f_acm usb_f_serial libcomposite configfs u_serial atmel_usba_udc
[ 232.117187] CPU: 0 PID: 298 Comm: kworker/0:1 Tainted: G W 4.4.1 #1
[ 232.117187] Hardware name: Atmel AT91SAM9
[ 232.117187] Workqueue: events rtc_timer_do_work
So your rtc seems to be having trouble.
Agreed
[ 232.117187] task: c7a6f420 ti: c799c000 task.ti: c799c000
[ 232.117187] PC is at timerqueue_del+0x1c/0x84
[ 232.117187] LR is at rtc_timer_do_work+0xac/0x1a8
[ 232.117187] pc : [<c01c8e2c>] lr : [<c02c43fc>] psr: 00000013
[ 232.117187] sp : c799de00 ip : c799de20 fp : c799de1c
[ 232.117187] r10: 00000000 r9 : c7b7ea20 r8 : c7edee00
[ 232.117187] r7 : c7b7e9e0 r6 : c7b7e978 r5 : c7b7e9e0 r4 : c7b7ea20
[ 232.117187] r3 : 00000001 r2 : 04405600 r1 : c7b7ea20 r0 : c7b7e9e0
[ 232.117187] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 232.117187] Control: 0005317f Table: 273dc000 DAC: 00000053
[ 232.117187] CPU: 0 PID: 298 Comm: kworker/0:1 Tainted: G W 4.4.1 #1
[ 232.117187] Hardware name: Atmel AT91SAM9
[ 232.117187] Workqueue: events rtc_timer_do_work
[ 232.117187] [<c0010468>] (unwind_backtrace) from [<c000dad0>] (show_stack+0x20/0x24)
[ 232.117187] [<c000dad0>] (show_stack) from [<c01c10c0>] (dump_stack+0x20/0x28)
[ 232.117187] [<c01c10c0>] (dump_stack) from [<c000afa4>] (show_regs+0x1c/0x20)
[ 232.117187] [<c000afa4>] (show_regs) from [<c006a748>] (watchdog_timer_fn+0x150/0x1a4)
[ 232.117187] [<c006a748>] (watchdog_timer_fn) from [<c00515b8>] (__hrtimer_run_queues.constprop.5+0x140/0x21c)
[ 232.117187] [<c00515b8>] (__hrtimer_run_queues.constprop.5) from [<c005181c>] (hrtimer_interrupt+0xa0/0x1c0)
[ 232.117187] [<c005181c>] (hrtimer_interrupt) from [<c02e26b8>] (ch2_irq+0x30/0x38)
[ 232.117187] [<c02e26b8>] (ch2_irq) from [<c00469e8>] (handle_irq_event_percpu+0x7c/0x1dc)
[ 232.117187] [<c00469e8>] (handle_irq_event_percpu) from [<c0046b80>] (handle_irq_event+0x38/0x4c)
[ 232.117187] [<c0046b80>] (handle_irq_event) from [<c0049a5c>] (handle_fasteoi_irq+0x9c/0x110)
[ 232.117187] [<c0049a5c>] (handle_fasteoi_irq) from [<c0046188>] (generic_handle_irq+0x28/0x38)
[ 232.117187] [<c0046188>] (generic_handle_irq) from [<c0046440>] (__handle_domain_irq+0x8c/0xb0)
[ 232.117187] [<c0046440>] (__handle_domain_irq) from [<c000948c>] (aic_handle+0xb8/0xc0)
[ 232.117187] [<c000948c>] (aic_handle) from [<c000e524>] (__irq_svc+0x44/0x58)
[ 232.117187] Exception stack(0xc799ddb0 to 0xc799ddf8)
[ 232.117187] dda0: c7b7e9e0 c7b7ea20 04405600 00000001
[ 232.117187] ddc0: c7b7ea20 c7b7e9e0 c7b7e978 c7b7e9e0 c7edee00 c7b7ea20 00000000 c799de1c
[ 232.117187] dde0: c799de20 c799de00 c02c43fc c01c8e2c 00000013 ffffffff
[ 232.117187] [<c000e524>] (__irq_svc) from [<c01c8e2c>] (timerqueue_del+0x1c/0x84)
[ 232.117187] [<c01c8e2c>] (timerqueue_del) from [<c02c43fc>] (rtc_timer_do_work+0xac/0x1a8)
[ 232.117187] [<c02c43fc>] (rtc_timer_do_work) from [<c002c914>] (process_one_work+0x208/0x388)
[ 232.117187] [<c002c914>] (process_one_work) from [<c002d720>] (worker_thread+0x2d8/0x430)
[ 232.117187] [<c002d720>] (worker_thread) from [<c0031ce4>] (kthread+0xe0/0xf4)
[ 232.117187] [<c0031ce4>] (kthread) from [<c000a430>] (ret_from_fork+0x14/0x24)
[ 232.117187] Kernel panic - not syncing: softlockup: hung tasks
[ 232.117187] CPU: 0 PID: 298 Comm: kworker/0:1 Tainted: G W L 4.4.1 #1
[ 232.117187] Hardware name: Atmel AT91SAM9
[ 232.117187] Workqueue: events rtc_timer_do_work
[ 232.117187] [<c0010468>] (unwind_backtrace) from [<c000dad0>] (show_stack+0x20/0x24)
[ 232.117187] [<c000dad0>] (show_stack) from [<c01c10c0>] (dump_stack+0x20/0x28)
[ 232.117187] [<c01c10c0>] (dump_stack) from [<c008c2b0>] (panic+0x88/0x1f0)
[ 232.117187] [<c008c2b0>] (panic) from [<c006a774>] (watchdog_timer_fn+0x17c/0x1a4)
[ 232.117187] [<c006a774>] (watchdog_timer_fn) from [<c00515b8>] (__hrtimer_run_queues.constprop.5+0x140/0x21c)
[ 232.117187] [<c00515b8>] (__hrtimer_run_queues.constprop.5) from [<c005181c>] (hrtimer_interrupt+0xa0/0x1c0)
[ 232.117187] [<c005181c>] (hrtimer_interrupt) from [<c02e26b8>] (ch2_irq+0x30/0x38)
[ 232.117187] [<c02e26b8>] (ch2_irq) from [<c00469e8>] (handle_irq_event_percpu+0x7c/0x1dc)
[ 232.117187] [<c00469e8>] (handle_irq_event_percpu) from [<c0046b80>] (handle_irq_event+0x38/0x4c)
[ 232.117187] [<c0046b80>] (handle_irq_event) from [<c0049a5c>] (handle_fasteoi_irq+0x9c/0x110)
[ 232.117187] [<c0049a5c>] (handle_fasteoi_irq) from [<c0046188>] (generic_handle_irq+0x28/0x38)
[ 232.117187] [<c0046188>] (generic_handle_irq) from [<c0046440>] (__handle_domain_irq+0x8c/0xb0)
[ 232.117187] [<c0046440>] (__handle_domain_irq) from [<c000948c>] (aic_handle+0xb8/0xc0)
[ 232.117187] [<c000948c>] (aic_handle) from [<c000e524>] (__irq_svc+0x44/0x58)
[ 232.117187] Exception stack(0xc799ddb0 to 0xc799ddf8)
[ 232.117187] dda0: c7b7e9e0 c7b7ea20 04405600 00000001
[ 232.117187] ddc0: c7b7ea20 c7b7e9e0 c7b7e978 c7b7e9e0 c7edee00 c7b7ea20 00000000 c799de1c
[ 232.117187] dde0: c799de20 c799de00 c02c43fc c01c8e2c 00000013 ffffffff
[ 232.117187] [<c000e524>] (__irq_svc) from [<c01c8e2c>] (timerqueue_del+0x1c/0x84)
[ 232.117187] [<c01c8e2c>] (timerqueue_del) from [<c02c43fc>] (rtc_timer_do_work+0xac/0x1a8)
[ 232.117187] [<c02c43fc>] (rtc_timer_do_work) from [<c002c914>] (process_one_work+0x208/0x388)
[ 232.117187] [<c002c914>] (process_one_work) from [<c002d720>] (worker_thread+0x2d8/0x430)
[ 232.117187] [<c002d720>] (worker_thread) from [<c0031ce4>] (kthread+0xe0/0xf4)
[ 232.117187] [<c0031ce4>] (kthread) from [<c000a430>] (ret_from_fork+0x14/0x24)
[ 232.117187] ---[ end Kernel panic - not syncing: softlockup: hung tasks
============================
So it looks like there is something seriously wrong with your rtc. Hardware.
RTC is internal to MCU, so it might be misprogrammed, but I strongly doubt there is a real hardware issue, also because I see this problem in multiple boards.
I will cross-check initializations in both bootloader and .dts
If You have specific suggestion please feel free to advise.
--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject.
For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble? Email listmaster@xxxxxxxxxxxxxxxxxxxx.
Thanks for the prompt answer.
Mauro
--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "unsubscribe" in the subject.
For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "help" in the subject.
Trouble? Email listmaster@xxxxxxxxxxxxxxxxxxxx.