Re: [chrony-users] Chronyd unexpected abort after server was set to "online"



On Mon, Jun 16, 2014 at 11:34:16AM +0200, Arndt Kritzner wrote:
> Hi,
> 
> we have been using chrony on a device with an AVR32 CPU for a while and it seemed to work. But today I checked its operation
> and found that chronyd always aborts after the servers come online. The chronyd startup looks normal:
> 
> ~ # chronyd -R -d
> main.c:355:(main)[13-13:50:51] chronyd version 1.29 starting
> sys_linux.c:1022:(get_version_specific_details)[13-13:50:51] Linux kernel major=3 minor=4 patch=77
> sys_linux.c:1080:(get_version_specific_details)[13-13:50:51] hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000
> slew_delta_tick=833 max_tick_bias=1000 shift_pll=2
> 
> But after setting the servers to "online" through chronyc, chronyd exits:
> 
> ntp_core.c:1575:(NCR_TakeSourceOnline)[13-13:51:42] Source 176.9.1.148 online
> chronyd: sourcestats.c: 345: find_best_sample_index: Assertion `elapsed >= 0.0' failed.
> Aborted
> ~ #

Hm, that's interesting. Can you get a backtrace for this crash, or run
chronyd with the following patch applied, so we can see the value of
elapsed and the number of samples?

--- a/sourcestats.c
+++ b/sourcestats.c
@@ -342,6 +342,7 @@ find_best_sample_index(SST_Stats inst, double *times_back)
     j = get_buf_index(inst, i);
 
     elapsed = -times_back[i];
+    LOG(LOGS_INFO, LOGF_SourceStats, "n=%d i=%d best=%d elapsed=%e", inst->n_samples, i, best_index, elapsed);
     assert(elapsed >= 0.0);
 
     root_distance = inst->root_dispersions[j] + elapsed * inst->skew + 0.5 * inst->root_delays[j];

I guess this is also present in the latest 1.30-pre1. It would also help
us to see the complete output of "chronyd -d -d" when compiled with
--enable-debug.

> /etc/chrony.conf:
> server pool.ntp.org offline
> refclock SHM 0 offset 0.0 delay 0.2 refid GPS
> refclock SHM 1 offset 0.0 delay 0.0 refid PPS
> refclock SOCK /tmp/chrony.ttyS3.sock
> driftfile /etc/chrony.drift
> keyfile /etc/chrony.keys
> commandkey 1
> makestep 1000 10
> initstepslew 30 pool.ntp.org

The config looks ok to me.

> Is there any explanation for this behaviour, and any clue how to solve it? The internet connection is not permanently
> available and switches between LAN and the cellphone network. That's why we use "offline"/"online" switching.

I think in older versions this could happen when something other than
chronyd stepped the system clock back and an "out of order" sample was
accumulated in chronyd. A check for that was added in 1.27, so it
shouldn't happen with 1.29. This looks like another bug and it needs
to be fixed.
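
To illustrate the failure mode, here is a minimal sketch of how one
out-of-order sample makes elapsed negative. It is not chronyd code:
first_out_of_order() and the sample_times array are hypothetical, and it
assumes that times_back[i] is the time of sample i relative to the newest
sample, so that elapsed = -times_back[i] is the age of the sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of chronyd.
 * sample_times holds the local timestamps (in seconds) of the accumulated
 * samples in the order they were stored, so the newest is the last entry.
 * Returns the index of the first sample that is dated later than the
 * newest one, or -1 if all samples are in order relative to the newest. */
int first_out_of_order(const double *sample_times, size_t n)
{
    double newest, elapsed;
    size_t i;

    assert(sample_times != NULL || n == 0);

    if (n == 0)
        return -1;

    newest = sample_times[n - 1];

    for (i = 0; i < n; i++) {
        /* Assumed equivalent of "elapsed = -times_back[i]" */
        elapsed = newest - sample_times[i];

        /* This is the condition that the assertion rejects */
        if (elapsed < 0.0)
            return (int)i;
    }

    return -1;
}
```

With timestamps 100, 164, 228 every elapsed value is non-negative. If the
clock is stepped back so that the third sample is stamped 120, the second
sample gets elapsed = 120 - 164 = -44, which is what the assertion would
catch.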

Thanks for the report.

-- 
Miroslav Lichvar
