Hello, I don't know if this bug has already been filed, or if this is
the right place to report it. If it is not, I apologize for the extra
noise.
I believe I have found a bug in chrony 1.27 which causes chronyd to
crash. Using the attached config file, chrony.conf, and the command
line "chronyd -n -f chrony.conf", I get the output in syslog.txt
(attached) and then a segfault.
After going through the code, I was able to identify what I believe is
the cause: duplicate IP addresses in the list of NTP servers. The
backtrace in gdb.txt shows that the fault occurs in acquire.c. It shows
that a timer from timer_queue executed transmit_timeout with a pointer
to a SourceRecord which had already been deleted.
I believe the record became invalid through this mechanism in acquire.c:
When there are duplicate NTP servers listed on the initstepslew line, two
SourceRecords are created (sourceA and sourceB), and two timers are
created (timerA and timerB). When NTP responses are received, only
sourceA is updated because of the way read_from_socket searches for a
matching record. Eventually, the criteria for sourceA are met, causing
timerA to stop and n_completed_sources to increment. timerB continues
to trigger, sending ntp poll messages to the ntp server. Responses from
that server are assigned to sourceA, triggering the criteria for sourceA
and causing n_completed_sources to increment improperly. Once this
happens enough times, n_completed_sources equals the number of servers
and all SourceRecords are deleted. The next time timerB triggers, it
attempts to access sourceB, which has already been deleted, causing the
segfault.
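
To make the sequence above easier to follow, here is a toy model in C. It
is not the code from acquire.c, only a sketch of the counting problem as I
understand it. The names toy_source, find_first, simulate_acquisition and
SAMPLES_NEEDED are all made up for illustration, and the completion
criterion (two replies) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAMPLES_NEEDED 2   /* hypothetical completion criterion */

struct toy_source {
    const char *addr;      /* server address as text */
    int samples;           /* replies credited to this record */
    int timer_running;     /* 1 while this record's poll timer is armed */
};

/* Return the first record with a matching address, so a duplicate
 * further down the list is never credited with a reply. */
static struct toy_source *find_first(struct toy_source *s, int n,
                                     const char *addr)
{
    for (int i = 0; i < n; i++)
        if (strcmp(s[i].addr, addr) == 0)
            return &s[i];
    return NULL;
}

/* Run the toy acquisition until n_completed reaches n, the point where
 * all records would be freed. Returns the index of a record whose timer
 * is still armed at that point (a dangling timer), or -1 if none. */
static int simulate_acquisition(struct toy_source *s, int n)
{
    int n_completed = 0;

    for (int round = 0; round < 100 && n_completed < n; round++) {
        for (int i = 0; i < n && n_completed < n; i++) {
            if (!s[i].timer_running)
                continue;
            /* Timer i fires: a poll goes to s[i].addr and the reply is
             * matched by address only. */
            struct toy_source *hit = find_first(s, n, s[i].addr);
            assert(hit != NULL);
            hit->samples++;
            if (hit->samples >= SAMPLES_NEEDED) {
                hit->timer_running = 0;  /* stops the matched record's timer */
                n_completed++;           /* counted again on every later reply */
            }
        }
    }

    for (int i = 0; i < n; i++)
        if (s[i].timer_running)
            return i;
    return -1;
}
```

With two records that share one address, the second record's timer is
still armed when the counter reaches 2. With two distinct addresses, both
timers have stopped by then.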
Attached is a patch I used to prevent duplicate IP addresses in the
acquisition list. Using this patch, I no longer see the segfault.
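
For readers without the attachment, the idea is roughly the following.
This is a minimal sketch and not the patch itself: the function name
add_unique_addr is hypothetical, and addresses are shown as plain 32-bit
IPv4 values for brevity, which is an assumption and not how chrony stores
them.

```c
#include <stddef.h>
#include <stdint.h>

/* Append addr to list[0..*n) unless it is already present.
 * Returns 1 if added, 0 if it was a duplicate, -1 if the list is full. */
static int add_unique_addr(uint32_t *list, size_t *n, size_t cap,
                           uint32_t addr)
{
    for (size_t i = 0; i < *n; i++)
        if (list[i] == addr)
            return 0;          /* duplicate: skip, so only one record and timer exist */
    if (*n >= cap)
        return -1;             /* no room left */
    list[(*n)++] = addr;
    return 1;
}
```

Filtering at the point where the list is built means each address gets
exactly one record and one timer, so the completion counter cannot
overshoot.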
Thanks for your time,
-victor