[chrony-users] bug report: duplicate server addresses cause chronyd to core |
[ Thread Index |
Date Index
| More chrony.tuxfamily.org/chrony-users Archives
]
Hello, I don't know if this bug has already been filed, or if this is
improper place to report it. If this is improper I apologize for the
extra spam.
I believe I have found a bug in chrony 1.27 which causes chronyd to
crash. Using the attached config file, chrony.conf, and the command
line "chronyd -n -f chrony.conf", I get the output in syslog.txt
(attached) and then a segfault.
After sorting through the code, I was able to identify what I feel to be
the issue - duplicate IP addresses in the list of ntp servers. The
backtrace in gdb.txt shows that the fault occurs in acquire.c. It shows
that a timer from timer_queue executed transmit_timeout with a pointer
to a SourceRecord which has been deleted.
I believe it became invalid by this mechanism in acquire.c:
When there are duplicate ntp servers listed on the initstepslew line, 2
SourceRecords are created (sourceA and sourceB), and two timers are
created (timerA and timerB). When ntp responses are received, only
sourceA is updated because of the way read_from_socket searches for a
matching record. Eventually, the criteria for sourceA are met, causing
timerA to stop and n_completed_sources to increment. timerB continues
to trigger, sending ntp poll messages to the ntp server. Responses from
that server are assigned to sourceA, triggering the criteria for sourceA
and causing n_completed_sources to increment improperly. Once this
happens enough times, n_complete_sources == number of servers and all
SourceRecords are deleted. The next time timerB triggers, it attempts
to access sourceB, which was already been deleted, causing the core.
Attached is a patch I used to prevent duplicate IP addresses in the
acquisition list. Using this patch, I no longer see the segfault.
thanks for your time
-victor
rtcsync
allow 0/0
makestep 0.1 30
cmdallow 127.0.0.1
# initstepslew 0 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org
# server 0.pool.ntp.org iburst minpoll 4 maxpoll 4
# server 1.pool.ntp.org iburst minpoll 4 maxpoll 4
# server 2.pool.ntp.org iburst minpoll 4 maxpoll 4
# server 3.pool.ntp.org iburst minpoll 4 maxpoll 4
# initstepslew 0 174.36.71.205
initstepslew 0 174.36.71.205 50.18.44.198 208.53.158.34 174.36.71.205
server 174.36.71.205 iburst minpoll 4 maxpoll 4
server 50.18.44.198 iburst minpoll 4 maxpoll 4
server 208.53.158.34 iburst minpoll 4 maxpoll 4
Apr 24 12:28:17 chronyd[20696]: chronyd version 1.27-pre1 starting
Apr 24 12:28:17 chronyd[20696]: Linux kernel major=3 minor=2 patch=0
Apr 24 12:28:17 chronyd[20696]: hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833 max_tick_bias=1000 shift_pll=2
Apr 24 12:28:17 chronyd[20696]: Initial frequency 13.225 ppm
Apr 24 12:28:17 chronyd[20696]: System's initial offset : 0.000009 seconds fast of true (step)
(gdb) bt
#0 0x4018dc90 in *__GI_raise (sig=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
#1 0x40193884 in *__GI_abort () at abort.c:92
#2 0x4018646c in *__GI___assert_fail (assertion=<value optimized out>,
file=<value optimized out>, line=<value optimized out>,
function=<value optimized out>) at assert.c:81
#3 0x00020c64 in probe_source (src=0x1) at acquire.c:303
#4 0x0000b014 in dispatch_timeouts () at sched.c:463
#5 SCH_MainLoop () at sched.c:582
#6 0x0000fa24 in main (argc=<value optimized out>, argv=0xbe9d6e80)
at main.c:429
(gdb) p init_srcs_ip
$1 = {{
addr = {
in4 = 2921613261,
in6 = "\315G$\256\001\000\000\000\000\000\000\000\204d\235\276"
},
family = 1
}, {
addr = {
in4 = 840051910,
in6 = "\306,\022\062\001\000\000\000\000\000\000\000\204d\235\276"
},
family = 1
}, {
addr = {
in4 = 3493174818,
in6 = "\"\236\065\320\001\000\000\000\000\000\000\000\204d\235\276"
},
family = 1
}, {
addr = {
in4 = 2921613261,
in6 = "\315G$\256\001\000\000\000\000\000\000\000\204d\235\276"
},
family = 1
}, {
addr = {
in4 = 0,
in6 = '\000' <repeats 15 times>
},
family = 0
}, {
addr = {
in4 = 0,
in6 = '\000' <repeats 15 times>
},
family = 0
}, {
addr = {
in4 = 0,
in6 = '\000' <repeats 15 times>
},
family = 0
}, {
addr = {
in4 = 0,
in6 = '\000' <repeats 15 times>
},
family = 0
}}
diff -up chrony-1.27/acquire.c.orig chrony-1.27/acquire.c
--- src/acquire.c.orig 2013-04-24 13:51:38.000000000 -0400
+++ src/acquire.c 2013-04-24 13:52:37.000000000 -0400
@@ -757,6 +757,7 @@ ACQ_StartAcquisition(int n, IPAddr *ip_a
{
int i, ip4, ip6;
+ int k, duplicate_ip;
saved_after_hook = after_hook;
saved_after_hook_anything = anything;
@@ -765,18 +766,32 @@ ACQ_StartAcquisition(int n, IPAddr *ip_a
n_started_sources = 0;
n_completed_sources = 0;
- n_sources = n;
+ n_sources = 0;
sources = MallocArray(SourceRecord, n);
for (i = ip4 = ip6 = 0; i < n; i++) {
- sources[i].ip_addr = ip_addrs[i];
- sources[i].n_samples = 0;
- sources[i].n_total_samples = 0;
- sources[i].n_dead_probes = 0;
- if (ip_addrs[i].family == IPADDR_INET4)
- ip4++;
- else if (ip_addrs[i].family == IPADDR_INET6)
- ip6++;
+ /* check for duplicate IP addresses and ignore them */
+ duplicate_ip = 0;
+ for (k = 0; k < i; k++)
+ {
+ duplicate_ip |= UTI_CompareIPs(&(sources[k].ip_addr),
+ &ip_addrs[i],
+ NULL) == 0;
+ }
+ if (!duplicate_ip) {
+ sources[n_sources].ip_addr = ip_addrs[i];
+ sources[n_sources].n_samples = 0;
+ sources[n_sources].n_total_samples = 0;
+ sources[n_sources].n_dead_probes = 0;
+ if (ip_addrs[i].family == IPADDR_INET4)
+ ip4++;
+ else if (ip_addrs[i].family == IPADDR_INET6)
+ ip6++;
+ n_sources++;
+ } else {
+ LOG(LOGS_WARN, LOGF_Acquire, "Ignoring duplicate source: %s",
+ UTI_IPToString(&ip_addrs[i]));
+ }
}
initialise_io((ip4 && ip6) ? IPADDR_UNSPEC : (ip4 ? IPADDR_INET4 : IPADDR_INET6));