Re: [chrony-users] Chrony malfunction under ARM/Buildroot

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-users Archives ]



On Thu, 14 Apr 2016, Mauro Condarelli wrote:

Hi,
I'm having major problems using Chrony in an embedded system sporting an oldish Atmel AT91SAM9G25.

I am using chrony-2.3 compiled with "--without-seccomp --without-tomcrypt --without-nss" under Linux 4.4.1.

/etc/chrony.conf is:
============================
#initstepslew 30 2.it.pool.ntp.org 0.pool.ntp.org 1.pool.ntp.org

server 0.it.pool.ntp.org
server 1.it.pool.ntp.org
server 2.pool.ntp.org
server 3.pool.ntp.org

logdir /var/log/chrony
log rtc statistics measurements tracking
logchange 1
driftfile /var/lib/chrony/drift
keyfile /etc/chrony.keys
generatecommandkey
makestep 1.0 3
maxupdateskew 100.0
dumponexit
dumpdir /var/lib/chrony
rtconutc
rtcautotrim 1
rtcfile /var/lib/chrony/rtc
bindcmdaddress /var/run/chrony/chronyd.sock
============================

I have two (possibly related) malfunctions:
1) if and when the RTC is severely off (complete loss of power, including battery backup) the system comes up with a specific date (Jan, 11th 2007) and after a while (i.e.: after chrony had the time to synchronize over the network) system seem to hang with very few systems actually working (notably the system answers to "ping") so that it seems completely dead. It turns out it is not really dead, but only in "suspended animation" as it always recovers completely (with correct date & time) after a LOOOOooonggggg time (>5min).

With no rtc ( which is a hardware problem not a chrony problem) there is no
way the system can know what the time is or how to make sure everything is
always monotonic. You will just have to make sure that the problem never
occurs. I have never seen chrony "hang" the system. There is something else going on. Have you looked at all the logs-- dmesg, journalctl or var/log/messages, ...
to see what is going on? It may be that some program is noticing that chrony
is changing the time, and goes into a snit.


2) if the system comes up well (chrony seems to work as advertised) I sometimes get very bad system crashes (see below).

Neither of these conditions happen if I kill chrony early (before it has any chance to synchronize) or if it is inactive (e.g.: if system is disconnected from the internet).

What should I check?
Can someone help me debugging this?

NOTE: the need is to have a system with strictly monotonic time (i.e.: time can jump ahead, but can never jump back for any reason, even across reboots).
If necessary I can test with specific patches... or change completely.

Thanks in Advance
Mauro

============================
[ 232.117187] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:298]

OK, that has nothing to do with chrony. What are you running?

[ 232.117187] Modules linked in: cfg80211 usb_f_acm usb_f_serial libcomposite configfs u_serial atmel_usba_udc
[  232.117187] CPU: 0 PID: 298 Comm: kworker/0:1 Tainted: G W       4.4.1 #1
[  232.117187] Hardware name: Atmel AT91SAM9
[  232.117187] Workqueue: events rtc_timer_do_work

So your rtc seems to be having trouble.


[  232.117187] task: c7a6f420 ti: c799c000 task.ti: c799c000
[  232.117187] PC is at timerqueue_del+0x1c/0x84
[  232.117187] LR is at rtc_timer_do_work+0xac/0x1a8
[  232.117187] pc : [<c01c8e2c>]    lr : [<c02c43fc>] psr: 00000013
[  232.117187] sp : c799de00  ip : c799de20  fp : c799de1c
[  232.117187] r10: 00000000  r9 : c7b7ea20  r8 : c7edee00
[  232.117187] r7 : c7b7e9e0  r6 : c7b7e978  r5 : c7b7e9e0  r4 : c7b7ea20
[  232.117187] r3 : 00000001  r2 : 04405600  r1 : c7b7ea20  r0 : c7b7e9e0
[ 232.117187] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[  232.117187] Control: 0005317f  Table: 273dc000  DAC: 00000053
[  232.117187] CPU: 0 PID: 298 Comm: kworker/0:1 Tainted: G W       4.4.1 #1
[  232.117187] Hardware name: Atmel AT91SAM9
[  232.117187] Workqueue: events rtc_timer_do_work
[ 232.117187] [<c0010468>] (unwind_backtrace) from [<c000dad0>] (show_stack+0x20/0x24) [ 232.117187] [<c000dad0>] (show_stack) from [<c01c10c0>] (dump_stack+0x20/0x28) [ 232.117187] [<c01c10c0>] (dump_stack) from [<c000afa4>] (show_regs+0x1c/0x20) [ 232.117187] [<c000afa4>] (show_regs) from [<c006a748>] (watchdog_timer_fn+0x150/0x1a4) [ 232.117187] [<c006a748>] (watchdog_timer_fn) from [<c00515b8>] (__hrtimer_run_queues.constprop.5+0x140/0x21c) [ 232.117187] [<c00515b8>] (__hrtimer_run_queues.constprop.5) from [<c005181c>] (hrtimer_interrupt+0xa0/0x1c0) [ 232.117187] [<c005181c>] (hrtimer_interrupt) from [<c02e26b8>] (ch2_irq+0x30/0x38) [ 232.117187] [<c02e26b8>] (ch2_irq) from [<c00469e8>] (handle_irq_event_percpu+0x7c/0x1dc) [ 232.117187] [<c00469e8>] (handle_irq_event_percpu) from [<c0046b80>] (handle_irq_event+0x38/0x4c) [ 232.117187] [<c0046b80>] (handle_irq_event) from [<c0049a5c>] (handle_fasteoi_irq+0x9c/0x110) [ 232.117187] [<c0049a5c>] (handle_fasteoi_irq) from [<c0046188>] (generic_handle_irq+0x28/0x38) [ 232.117187] [<c0046188>] (generic_handle_irq) from [<c0046440>] (__handle_domain_irq+0x8c/0xb0) [ 232.117187] [<c0046440>] (__handle_domain_irq) from [<c000948c>] (aic_handle+0xb8/0xc0) [ 232.117187] [<c000948c>] (aic_handle) from [<c000e524>] (__irq_svc+0x44/0x58)
[  232.117187] Exception stack(0xc799ddb0 to 0xc799ddf8)
[ 232.117187] dda0: c7b7e9e0 c7b7ea20 04405600 00000001 [ 232.117187] ddc0: c7b7ea20 c7b7e9e0 c7b7e978 c7b7e9e0 c7edee00 c7b7ea20 00000000 c799de1c
[  232.117187] dde0: c799de20 c799de00 c02c43fc c01c8e2c 00000013 ffffffff
[ 232.117187] [<c000e524>] (__irq_svc) from [<c01c8e2c>] (timerqueue_del+0x1c/0x84) [ 232.117187] [<c01c8e2c>] (timerqueue_del) from [<c02c43fc>] (rtc_timer_do_work+0xac/0x1a8) [ 232.117187] [<c02c43fc>] (rtc_timer_do_work) from [<c002c914>] (process_one_work+0x208/0x388) [ 232.117187] [<c002c914>] (process_one_work) from [<c002d720>] (worker_thread+0x2d8/0x430) [ 232.117187] [<c002d720>] (worker_thread) from [<c0031ce4>] (kthread+0xe0/0xf4) [ 232.117187] [<c0031ce4>] (kthread) from [<c000a430>] (ret_from_fork+0x14/0x24)
[  232.117187] Kernel panic - not syncing: softlockup: hung tasks
[  232.117187] CPU: 0 PID: 298 Comm: kworker/0:1 Tainted: G W    L  4.4.1 #1
[  232.117187] Hardware name: Atmel AT91SAM9
[  232.117187] Workqueue: events rtc_timer_do_work
[ 232.117187] [<c0010468>] (unwind_backtrace) from [<c000dad0>] (show_stack+0x20/0x24) [ 232.117187] [<c000dad0>] (show_stack) from [<c01c10c0>] (dump_stack+0x20/0x28)
[  232.117187] [<c01c10c0>] (dump_stack) from [<c008c2b0>] (panic+0x88/0x1f0)
[ 232.117187] [<c008c2b0>] (panic) from [<c006a774>] (watchdog_timer_fn+0x17c/0x1a4) [ 232.117187] [<c006a774>] (watchdog_timer_fn) from [<c00515b8>] (__hrtimer_run_queues.constprop.5+0x140/0x21c) [ 232.117187] [<c00515b8>] (__hrtimer_run_queues.constprop.5) from [<c005181c>] (hrtimer_interrupt+0xa0/0x1c0) [ 232.117187] [<c005181c>] (hrtimer_interrupt) from [<c02e26b8>] (ch2_irq+0x30/0x38) [ 232.117187] [<c02e26b8>] (ch2_irq) from [<c00469e8>] (handle_irq_event_percpu+0x7c/0x1dc) [ 232.117187] [<c00469e8>] (handle_irq_event_percpu) from [<c0046b80>] (handle_irq_event+0x38/0x4c) [ 232.117187] [<c0046b80>] (handle_irq_event) from [<c0049a5c>] (handle_fasteoi_irq+0x9c/0x110) [ 232.117187] [<c0049a5c>] (handle_fasteoi_irq) from [<c0046188>] (generic_handle_irq+0x28/0x38) [ 232.117187] [<c0046188>] (generic_handle_irq) from [<c0046440>] (__handle_domain_irq+0x8c/0xb0) [ 232.117187] [<c0046440>] (__handle_domain_irq) from [<c000948c>] (aic_handle+0xb8/0xc0) [ 232.117187] [<c000948c>] (aic_handle) from [<c000e524>] (__irq_svc+0x44/0x58)
[  232.117187] Exception stack(0xc799ddb0 to 0xc799ddf8)
[ 232.117187] dda0: c7b7e9e0 c7b7ea20 04405600 00000001 [ 232.117187] ddc0: c7b7ea20 c7b7e9e0 c7b7e978 c7b7e9e0 c7edee00 c7b7ea20 00000000 c799de1c
[  232.117187] dde0: c799de20 c799de00 c02c43fc c01c8e2c 00000013 ffffffff
[ 232.117187] [<c000e524>] (__irq_svc) from [<c01c8e2c>] (timerqueue_del+0x1c/0x84) [ 232.117187] [<c01c8e2c>] (timerqueue_del) from [<c02c43fc>] (rtc_timer_do_work+0xac/0x1a8) [ 232.117187] [<c02c43fc>] (rtc_timer_do_work) from [<c002c914>] (process_one_work+0x208/0x388) [ 232.117187] [<c002c914>] (process_one_work) from [<c002d720>] (worker_thread+0x2d8/0x430) [ 232.117187] [<c002d720>] (worker_thread) from [<c0031ce4>] (kthread+0xe0/0xf4) [ 232.117187] [<c0031ce4>] (kthread) from [<c000a430>] (ret_from_fork+0x14/0x24)
[  232.117187] ---[ end Kernel panic - not syncing: softlockup: hung tasks
============================

So it looks like there is something seriously wrong with your rtc. Hardware.




--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject. For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.


--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject. For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/