| [chrony-dev] Re: Are there known issues in destructive tests on arm64 or ppc64 |
[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-dev Archives ]
Hi Christian, On 2019-12-06T10:21+0100, Christian Ehrhardt wrote:
Hi chrony-dev, chrony is tested in Debian/Ubuntu on any update of related packages and while that was running I found that arm64 and ppc64 reproducibly hang [1][2]. OTOH arm64 and s390x reproducibly work.
^^^^^
I guess you mean amd64 here‽
Unfortunately the tests do not leave any reasonable debug output, the tests run:debian/tests/upstream-system-tests -d 1[0-9][0-9]-* Which is this script [3] that runs ./run of the system tests. The tests seem to hang without output until the environment kills them for a timeout. For now I changed in Ubuntu and suggested in Debian to disable those tests [4], but in the long run I'd want to understand what exactly is failing and how we could re-enable the tests on those platforms.
Note that this is not an issue for Debian as our CI don’t yet support running autopkgtest for tests requiring a VM, which is the case for destructives tests.
I was trying to reproduce on a Ubuntu @ ppc64 environment that should
be the same as the one used in the tests. Running the tests there just
hung as well (the system is still available if there are questions).
What I see is this:
100-clockupdate Testing update of system clock:
non-default settings:
clock_control=1
minimal_config=1
starting chronyd OK
running chronyc dfreq 1e-3 OK
checking chronyc output OK
running chronyc doffset -1.0 OK
checking chronyc output OK
running chronyc makestep OK
checking chronyc output OK
checking system clock OK
running chronyc doffset 1.0 OK
running chronyc makestep OK
stopping chronyd OK
checking chronyd messages OK
checking message "System clock was stepped by" OK
PASS
101-rtc Testing real-time clock:
non-default settings:
extra_chronyd_directives=rtcfile
/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
extra_chronyd_options=-s
minimal_config=1
starting chronyd
And here it hangs, not sure why that didn't make it out of the buffers
on the other test runs.
When I check the processes I see:
0 1000 2231 2230 20 0 12608 7616 do_wai Ss pts/0 0:00
\_ -bash
4 0 12433 2231 20 0 16576 10048 - S+ pts/0 0:00
\_ sudo debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
4 0 12434 12433 20 0 3520 1408 - S+ pts/0 0:00
\_ /bin/sh debian/tests/upstream-system-tests -d
1[0-9][0-9]-*
0 0 12435 12434 20 0 11136 5632 - S+ pts/0 0:00
\_ bash ./run -d 1[0-9][0-9]-*
0 0 12506 12435 20 0 11456 6080 - S+ pts/0 0:00
\_ bash ./101-rtc
0 0 12544 12506 20 0 6272 2624 - S+ pts/0 0:00
\_ ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chron
1 0 12545 12544 20 0 0 0 - Zs ? 0:00
\_ [chronyd] <defunct>
1 0 12546 1 20 0 6656 4032 - S ? 0:00
../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chrony
So one process goes defunct and the parent doesn't collect the exit
status - can it be so simple?
The full commandline of the parent is:
../../chronyd -x -l
Could that be due to the use of “-x”? IFAIR you have extra stuff on Ubuntu about that option.
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -u root -s
The two remaining hanging processes wait on read/select
ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12544
strace: Process 12544 attached
0.000000 read(3, ^Cstrace: Process 12544 detached
<detached ...>
ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12546
strace: Process 12546 attached
0.000000 _newselect(9, [5 6 7 8], NULL, NULL, NULL^Cstrace:
Process 12546 detached
<detached ...>
Note: due to installing the package as part of the test dependencies
there is also a chrony running via systemd service - could that be an
issue?
I have stopped the process and re-started the tests - still hanging.
Killing the hanging and defunct chrony processes lets the test
continue as assumed.
The config generated by the test is this:
sudo cat /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf
pidfile /var/run/chrony/chronyd.pid
bindcmdaddress /var/run/chrony/chronyd.sock
port 14196
cmdport 17439
allow
cmdallow
local
server 127.0.0.1 port 14196 minpoll -6 maxpoll -6
bindacqaddress 127.0.0.1
bindaddress 127.0.0.1
bindcmdaddress 127.0.0.1
dumpdir /var/run/chrony
logdir /var/log/chrony
log tempcomp rawmeasurements refclocks statistics tracking rtc
logbanner 0
smoothtime 100.0 0.001
include /dev/null
keyfile /home/ubuntu/chrony-3.5/test/system/tmp/keys
driftfile /var/lib/chrony/driftfile
tempcomp /home/ubuntu/chrony-3.5/test/system/tmp/tempcomp 0.1 0 0 0 0
I can "reenter" the broken test state by just running test 101:
~/chrony-3.5/test/system$ sudo ./run -d 101-rtc
101-rtc Testing real-time clock:
non-default settings:
extra_chronyd_directives=rtcfile
/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
extra_chronyd_options=-s
minimal_config=1
starting chronyd
<- here it hangs and I again have a defunct process in ps
It seems I can simplify this even down to:
~/chrony-3.5/test/system$ sudo ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.
conf -u root -s
This gets me
$ ps axlf | grep chrony
4 0 13136 2231 20 0 16576 10048 - S+ pts/0 0:00
\_ sudo ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/
system/tmp/chronyd.conf -u root -s
4 0 13137 13136 20 0 6272 2624 - S+ pts/0 0:00
\_ ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/s
ystem/tmp/chronyd.conf -u root -s
1 0 13138 13137 20 0 0 0 - Zs ? 0:00
\_ [chronyd] <defunct>
0 1000 13141 12570 20 0 10560 1984 pipe_w S+ pts/1 0:00
\_ grep --color=auto chrony
1 0 13139 1 20 0 6656 3968 - S ? 0:00
../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chrony
d.conf -u root -s
Further simplifying the test I found a minimal config and commandline:
Config file:
rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
RTCFile:
1 1575623140 0.0 0.0
Command:
$ ../../chronyd -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -s
With that I got an strace for you, but it is too noisy for me to read.
Strace log:
=> https://paste.ubuntu.com/p/h3yZwxmVPr/
In line 86 i see process 13324 go into 0.000008 read(3, <unfinished ...>
And on line 1233 the other process 13326 go into 0.000163
_newselect(9, [5 6 7 8], NULL, NULL, NULL <unfinished ...>
After that both seem to wait on each other.
Maybe one on this list knowing better what to expect and what might be
odd can see something in it?
I’ll try to have a closer look later.
I was wondering if any of those test hangs are known issues already? In particular what about it might be architecture dependent? As I outlined above the issue is reproducible and my system still around - so ask whatever you need to further advise what it could be about. [1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/c/chrony/20191203_150720_9e9af@/log.gz [2]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/c/chrony/20191203_142431_bdb78@/log.gz [3]: https://salsa.debian.org/debian/chrony/blob/master/debian/tests/upstream-system-tests [4]: https://salsa.debian.org/debian/chrony/merge_requests/2 -- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd
Attachment:
signature.asc
Description: PGP signature
| Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |