[chrony-dev] Are there known issues in destructive tests on arm64 or ppc64 |
[ Thread Index |
Date Index
| More chrony.tuxfamily.org/chrony-dev Archives
]
Hi chrony-dev,
chrony is tested in Debian/Ubuntu on any update of related packages
and while that was running I found that arm64 and ppc64 reproducibly
hang [1][2]. OTOH arm64 and s390x reproducibly work.
Unfortunately the tests do not leave any reasonable debug output, the tests run:
debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
Which is this script [3] that runs ./run of the system tests.
The tests seem to hang without output until the environment kills them
for a timeout.
For now I changed in Ubuntu and suggested in Debian to disable those
tests [4], but in the long run I'd want to understand what exactly is
failing and how we could re-enable the tests on those platforms.
I was trying to reproduce on a Ubuntu @ ppc64 environment that should
be the same as the one used in the tests. Running the tests there just
hung as well (the system is still available if there are questions).
What I see is this:
100-clockupdate Testing update of system clock:
non-default settings:
clock_control=1
minimal_config=1
starting chronyd OK
running chronyc dfreq 1e-3 OK
checking chronyc output OK
running chronyc doffset -1.0 OK
checking chronyc output OK
running chronyc makestep OK
checking chronyc output OK
checking system clock OK
running chronyc doffset 1.0 OK
running chronyc makestep OK
stopping chronyd OK
checking chronyd messages OK
checking message "System clock was stepped by" OK
PASS
101-rtc Testing real-time clock:
non-default settings:
extra_chronyd_directives=rtcfile
/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
extra_chronyd_options=-s
minimal_config=1
starting chronyd
And here it hangs, not sure why that didn't make it out of the buffers
on the other test runs.
When I check the processes I see:
0 1000 2231 2230 20 0 12608 7616 do_wai Ss pts/0 0:00
\_ -bash
4 0 12433 2231 20 0 16576 10048 - S+ pts/0 0:00
\_ sudo debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
4 0 12434 12433 20 0 3520 1408 - S+ pts/0 0:00
\_ /bin/sh debian/tests/upstream-system-tests -d
1[0-9][0-9]-*
0 0 12435 12434 20 0 11136 5632 - S+ pts/0 0:00
\_ bash ./run -d 1[0-9][0-9]-*
0 0 12506 12435 20 0 11456 6080 - S+ pts/0 0:00
\_ bash ./101-rtc
0 0 12544 12506 20 0 6272 2624 - S+ pts/0 0:00
\_ ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chron
1 0 12545 12544 20 0 0 0 - Zs ? 0:00
\_ [chronyd] <defunct>
1 0 12546 1 20 0 6656 4032 - S ? 0:00
.../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chrony
So one process goes defunct and the parent doesn't collect the exit
status - can it be so simple?
The full commandline of the parent is:
.../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -u root -s
The two remaining hanging processes wait on read/select
ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12544
strace: Process 12544 attached
0.000000 read(3, ^Cstrace: Process 12544 detached
<detached ...>
ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12546
strace: Process 12546 attached
0.000000 _newselect(9, [5 6 7 8], NULL, NULL, NULL^Cstrace:
Process 12546 detached
<detached ...>
Note: due to installing the package as part of the test dependencies
there is also a chrony running via systemd service - could that be an
issue?
I have stopped the process and re-started the tests - still hanging.
Killing the hanging and defunct chrony processes lets the test
continue as assumed.
The config generated by the test is this:
sudo cat /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf
pidfile /var/run/chrony/chronyd.pid
bindcmdaddress /var/run/chrony/chronyd.sock
port 14196
cmdport 17439
allow
cmdallow
local
server 127.0.0.1 port 14196 minpoll -6 maxpoll -6
bindacqaddress 127.0.0.1
bindaddress 127.0.0.1
bindcmdaddress 127.0.0.1
dumpdir /var/run/chrony
logdir /var/log/chrony
log tempcomp rawmeasurements refclocks statistics tracking rtc
logbanner 0
smoothtime 100.0 0.001
include /dev/null
keyfile /home/ubuntu/chrony-3.5/test/system/tmp/keys
driftfile /var/lib/chrony/driftfile
tempcomp /home/ubuntu/chrony-3.5/test/system/tmp/tempcomp 0.1 0 0 0 0
I can "reenter" the broken test state by just running test 101:
~/chrony-3.5/test/system$ sudo ./run -d 101-rtc
101-rtc Testing real-time clock:
non-default settings:
extra_chronyd_directives=rtcfile
/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
extra_chronyd_options=-s
minimal_config=1
starting chronyd
<- here it hangs and I again have a defunct process in ps
It seems I can simplify this even down to:
~/chrony-3.5/test/system$ sudo ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.
conf -u root -s
This gets me
$ ps axlf | grep chrony
4 0 13136 2231 20 0 16576 10048 - S+ pts/0 0:00
\_ sudo ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/
system/tmp/chronyd.conf -u root -s
4 0 13137 13136 20 0 6272 2624 - S+ pts/0 0:00
\_ ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/s
ystem/tmp/chronyd.conf -u root -s
1 0 13138 13137 20 0 0 0 - Zs ? 0:00
\_ [chronyd] <defunct>
0 1000 13141 12570 20 0 10560 1984 pipe_w S+ pts/1 0:00
\_ grep --color=auto chrony
1 0 13139 1 20 0 6656 3968 - S ? 0:00
.../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chrony
d.conf -u root -s
Further simplifying the test I found a minimal config and commandline:
Config file:
rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
RTCFile:
1 1575623140 0.0 0.0
Command:
$ ../../chronyd -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -s
With that I got an strace for you, but it is too noisy for me to read.
Strace log:
=> https://paste.ubuntu.com/p/h3yZwxmVPr/
In line 86 i see process 13324 go into 0.000008 read(3, <unfinished ...>
And on line 1233 the other process 13326 go into 0.000163
_newselect(9, [5 6 7 8], NULL, NULL, NULL <unfinished ...>
After that both seem to wait on each other.
Maybe one on this list knowing better what to expect and what might be
odd can see something in it?
I was wondering if any of those test hangs are known issues already?
In particular what about it might be architecture dependent?
As I outlined above the issue is reproducible and my system still
around - so ask whatever you need to further advise what it could be
about.
[1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/c/chrony/20191203_150720_9e9af@/log.gz
[2]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/c/chrony/20191203_142431_bdb78@/log.gz
[3]: https://salsa.debian.org/debian/chrony/blob/master/debian/tests/upstream-system-tests
[4]: https://salsa.debian.org/debian/chrony/merge_requests/2
--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd
--
To unsubscribe email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject.
For help email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble? Email listmaster@xxxxxxxxxxxxxxxxxxxx.