[chrony-dev] Chrony freezing



As I mentioned in a previous post, I have found chrony frozen (i.e., no
measurements for a long time, and chronyc cannot contact the running chronyd).
It seems to be related to a fix I put into chrony a few years ago (now part of
1.23) for a bug in which chrony was not reading the rtc properly. But on some
systems, due to the complete mess that the rtc has become with the new hpet
system, this fix can sometimes hang chronyd on the read of the rtc.

On the one machine where this has caused trouble, I have put in a crontab
script that tests whether there have been any measurements in the past half
hour and restarts chronyd if there have not. This is clearly a kludge. I have
now added the following to it:
   CPID=`pidof chronyd`
   gdb /usr/sbin/chronyd $CPID <<EOF
   backtrace
   quit
y
EOF
and it output
---------------------------------------------
GNU gdb 6.6-5.1mdv2008.1 (Mandriva Linux release 2008.1)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as ""...
Using host libthread_db library "/lib/i686/libthread_db.so.1".
Attaching to program: /usr/sbin/chronyd, process 19558
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0xffffe410 in __kernel_vsyscall ()
(gdb) #0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7e6d213 in read () from /lib/i686/libc.so.6
#2  0x0805ddd9 in read_from_device (any=0x0) at rtc_linux.c:877
#3  0x08049f4e in SCH_MainLoop () at sched.c:470
#4  0x0804bedd in main (argc=0, argv=0xbfb5ccb8) at main.c:357
(gdb) The program is running.  Quit anyway (and detach it)? (y or n) [answered Y; input not from terminal]
Detaching from program: /usr/sbin/chronyd, process 19558

----------------------------------------
rtc_linux.c:877 is exactly the second read from the rtc that I have had
trouble with hanging in the past.

Thus this section of code seems to be caught in a cleft stick. If one does
only a single read from the rtc, on most systems it returns immediately
instead of at the next second, as it should. If I do the double read, it
sometimes hangs on some systems. (This particular hang seems to occur after I
do a chronyc cyclelogs, but only sometimes, not always; i.e., this problem
with the rtc seems to be sporadic and flaky.)
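The double-read pattern described above can be sketched roughly as follows.
This is a minimal illustration, not chrony's actual read_from_device(). It
assumes the Linux /dev/rtc convention that each read returns one unsigned
long whose low byte holds the interrupt-type flags and whose remaining bytes
hold the number of interrupts since the previous read. The names
decode_rtc_word(), double_read() and RTC_UF_FLAG are hypothetical
(RTC_UF_FLAG mirrors RTC_UF from <linux/rtc.h>). Both reads block, so if the
expected interrupt never arrives, the second read never returns, which is
what the backtrace above shows.

```c
#include <unistd.h>

/* Hypothetical constant mirroring RTC_UF (update interrupt) in <linux/rtc.h>. */
#define RTC_UF_FLAG 0x10UL

struct rtc_event {
  unsigned long flags;  /* low byte: which interrupt type(s) fired */
  unsigned long count;  /* remaining bytes: interrupts since last read */
};

/* Split one word read from /dev/rtc into flags and interrupt count. */
static struct rtc_event decode_rtc_word(unsigned long data)
{
  struct rtc_event ev;
  ev.flags = data & 0xffUL;
  ev.count = data >> 8;
  return ev;
}

/* Sketch of the "double read": the first read may return immediately
   instead of at the next second, so it is discarded and a second read
   waits for the next update interrupt. Both reads are blocking; if no
   further interrupt arrives, the second read() never returns.
   Returns 0 on success, -1 on a short or failed read. */
static int double_read(int fd, struct rtc_event *out)
{
  unsigned long data = 0;
  int i;

  for (i = 0; i < 2; i++) {
    if (read(fd, &data, sizeof data) != (ssize_t)sizeof data)
      return -1;
  }
  *out = decode_rtc_word(data);
  return 0;
}
```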

Is it possible to put a timeout on a read so that, if it has not returned
within, say, a second, the read is abandoned?
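One possible approach, sketched here under stated assumptions, is to wait
with poll() for the descriptor to become readable, with a timeout, and only
then call read(). The helper name read_with_timeout() and its -2 "timed out"
return value are hypothetical, not part of chrony. The Linux rtc character
device should support poll()/select(), so the same idea ought to apply to
the second read of /dev/rtc.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait at most timeout_ms for fd to become readable,
   then read up to len bytes into buf.
   Returns bytes read, 0 on EOF, -1 on error, -2 if the wait timed out. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
  struct pollfd pfd;
  int r;

  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  /* Retry if a signal interrupts the wait (this restarts the full timeout). */
  do {
    r = poll(&pfd, 1, timeout_ms);
  } while (r < 0 && errno == EINTR);

  if (r < 0)
    return -1;  /* poll() itself failed */
  if (r == 0)
    return -2;  /* nothing arrived in time: abandon the read */

  return read(fd, buf, len);
}
```

For example, read_with_timeout(rtc_fd, &data, sizeof data, 1000) (with
rtc_fd and data as placeholder names) would give up after about a second
instead of blocking forever. An alternative with a similar effect is to open
the device with O_NONBLOCK, so that a read with no pending interrupt fails
with EAGAIN instead of blocking.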

Also, it would probably be a good idea to add a nortc keyword to
/etc/chrony.conf, so that on one of these flaky systems one can switch off
all rtc use.
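A sketch of how such a switch might work follows. nortc is only the keyword
proposed here; the flag rtc_disabled and the functions parse_config_line()
and rtc_should_initialise() are hypothetical names, not chrony's actual
configuration code. The idea is simply that the config parser sets a flag
and any rtc setup checks it before touching /dev/rtc at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical global switch, set when chrony.conf contains "nortc". */
static int rtc_disabled = 0;

/* Hypothetical parser hook: look at the first word of one config line. */
static void parse_config_line(const char *line)
{
  char keyword[32];

  /* %31s skips leading whitespace and stops at the next whitespace. */
  if (sscanf(line, "%31s", keyword) == 1 && strcmp(keyword, "nortc") == 0)
    rtc_disabled = 1;
}

/* rtc initialisation would call this first and skip opening /dev/rtc
   entirely on systems where the rtc is unreliable. */
static int rtc_should_initialise(void)
{
  return !rtc_disabled;
}
```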

Ideally, the kernel people would fix the rtc so that it works even with the
hpet system. (The problem, I believe, is that under hpet the rtc interrupt is
routed to the non-maskable interrupt, which seems to make it difficult to use
for the rtc.) However, that may take a while, and chrony is still left flaky
on the older kernels.

Anyway, I will try again in the next few nights to see if it is really the rtc
that is causing these freezes. Any suggestions as to how I could dig in
further would be appreciated.


--
William G. Unruh   |  Canadian Institute for|     Tel: +1(604)822-3273
Physics&Astronomy  |     Advanced Research  |     Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology |     unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1     |      and Gravity       |  www.theory.physics.ubc.ca/


