[chrony-dev] Re: Are there known issues in destructive tests on arm64 or ppc64 |
[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-dev Archives ]
Hi Christian, On 2019-12-06T10:21+0100, Christian Ehrhardt wrote:
Hi chrony-dev, chrony is tested in Debian/Ubuntu on any update of related packages and while that was running I found that arm64 and ppc64 reproducibly hang [1][2]. OTOH arm64 and s390x reproducibly work.
^^^^^ I guess you mean amd64 here‽
Unfortunately the tests do not leave any reasonable debug output, the tests run:debian/tests/upstream-system-tests -d 1[0-9][0-9]-* Which is this script [3] that runs ./run of the system tests. The tests seem to hang without output until the environment kills them for a timeout. For now I changed in Ubuntu and suggested in Debian to disable those tests [4], but in the long run I'd want to understand what exactly is failing and how we could re-enable the tests on those platforms.
Note that this is not an issue for Debian as our CI don’t yet support running autopkgtest for tests requiring a VM, which is the case for destructives tests.
I was trying to reproduce on a Ubuntu @ ppc64 environment that should be the same as the one used in the tests. Running the tests there just hung as well (the system is still available if there are questions). What I see is this: 100-clockupdate Testing update of system clock: non-default settings: clock_control=1 minimal_config=1 starting chronyd OK running chronyc dfreq 1e-3 OK checking chronyc output OK running chronyc doffset -1.0 OK checking chronyc output OK running chronyc makestep OK checking chronyc output OK checking system clock OK running chronyc doffset 1.0 OK running chronyc makestep OK stopping chronyd OK checking chronyd messages OK checking message "System clock was stepped by" OK PASS 101-rtc Testing real-time clock: non-default settings: extra_chronyd_directives=rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile extra_chronyd_options=-s minimal_config=1 starting chronyd And here it hangs, not sure why that didn't make it out of the buffers on the other test runs. When I check the processes I see: 0 1000 2231 2230 20 0 12608 7616 do_wai Ss pts/0 0:00 \_ -bash 4 0 12433 2231 20 0 16576 10048 - S+ pts/0 0:00 \_ sudo debian/tests/upstream-system-tests -d 1[0-9][0-9]-* 4 0 12434 12433 20 0 3520 1408 - S+ pts/0 0:00 \_ /bin/sh debian/tests/upstream-system-tests -d 1[0-9][0-9]-* 0 0 12435 12434 20 0 11136 5632 - S+ pts/0 0:00 \_ bash ./run -d 1[0-9][0-9]-* 0 0 12506 12435 20 0 11456 6080 - S+ pts/0 0:00 \_ bash ./101-rtc 0 0 12544 12506 20 0 6272 2624 - S+ pts/0 0:00 \_ ../../chronyd -x -l /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chron 1 0 12545 12544 20 0 0 0 - Zs ? 0:00 \_ [chronyd] <defunct> 1 0 12546 1 20 0 6656 4032 - S ? 0:00 ../../chronyd -x -l /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chrony-3.5/test/system/tmp/chrony So one process goes defunct and the parent doesn't collect the exit status - can it be so simple? The full commandline of the parent is: ../../chronyd -x -l
Could that be due to the use of “-x”? IFAIR you have extra stuff on Ubuntu about that option.
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -u root -s The two remaining hanging processes wait on read/select ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12544 strace: Process 12544 attached 0.000000 read(3, ^Cstrace: Process 12544 detached <detached ...> ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12546 strace: Process 12546 attached 0.000000 _newselect(9, [5 6 7 8], NULL, NULL, NULL^Cstrace: Process 12546 detached <detached ...> Note: due to installing the package as part of the test dependencies there is also a chrony running via systemd service - could that be an issue? I have stopped the process and re-started the tests - still hanging. Killing the hanging and defunct chrony processes lets the test continue as assumed. The config generated by the test is this: sudo cat /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf pidfile /var/run/chrony/chronyd.pid bindcmdaddress /var/run/chrony/chronyd.sock port 14196 cmdport 17439 allow cmdallow local server 127.0.0.1 port 14196 minpoll -6 maxpoll -6 bindacqaddress 127.0.0.1 bindaddress 127.0.0.1 bindcmdaddress 127.0.0.1 dumpdir /var/run/chrony logdir /var/log/chrony log tempcomp rawmeasurements refclocks statistics tracking rtc logbanner 0 smoothtime 100.0 0.001 include /dev/null keyfile /home/ubuntu/chrony-3.5/test/system/tmp/keys driftfile /var/lib/chrony/driftfile tempcomp /home/ubuntu/chrony-3.5/test/system/tmp/tempcomp 0.1 0 0 0 0 I can "reenter" the broken test state by just running test 101: ~/chrony-3.5/test/system$ sudo ./run -d 101-rtc 101-rtc Testing real-time clock: non-default settings: extra_chronyd_directives=rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile extra_chronyd_options=-s minimal_config=1 starting chronyd <- here it hangs and I again have a defunct process in ps It seems I can simplify this even down to: ~/chrony-3.5/test/system$ sudo ../../chronyd -x -l /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd. conf -u root -s This gets me $ ps axlf | grep chrony 4 0 13136 2231 20 0 16576 10048 - S+ pts/0 0:00 \_ sudo ../../chronyd -x -l /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chrony-3.5/test/ system/tmp/chronyd.conf -u root -s 4 0 13137 13136 20 0 6272 2624 - S+ pts/0 0:00 \_ ../../chronyd -x -l /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chrony-3.5/test/s ystem/tmp/chronyd.conf -u root -s 1 0 13138 13137 20 0 0 0 - Zs ? 0:00 \_ [chronyd] <defunct> 0 1000 13141 12570 20 0 10560 1984 pipe_w S+ pts/1 0:00 \_ grep --color=auto chrony 1 0 13139 1 20 0 6656 3968 - S ? 0:00 ../../chronyd -x -l /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f /home/ubuntu/chrony-3.5/test/system/tmp/chrony d.conf -u root -s Further simplifying the test I found a minimal config and commandline: Config file: rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile RTCFile: 1 1575623140 0.0 0.0 Command: $ ../../chronyd -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -s With that I got an strace for you, but it is too noisy for me to read. Strace log: => https://paste.ubuntu.com/p/h3yZwxmVPr/ In line 86 i see process 13324 go into 0.000008 read(3, <unfinished ...> And on line 1233 the other process 13326 go into 0.000163 _newselect(9, [5 6 7 8], NULL, NULL, NULL <unfinished ...> After that both seem to wait on each other. Maybe one on this list knowing better what to expect and what might be odd can see something in it?
I’ll try to have a closer look later.
I was wondering if any of those test hangs are known issues already? In particular what about it might be architecture dependent? As I outlined above the issue is reproducible and my system still around - so ask whatever you need to further advise what it could be about. [1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/c/chrony/20191203_150720_9e9af@/log.gz [2]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/c/chrony/20191203_142431_bdb78@/log.gz [3]: https://salsa.debian.org/debian/chrony/blob/master/debian/tests/upstream-system-tests [4]: https://salsa.debian.org/debian/chrony/merge_requests/2 -- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd
Attachment:
signature.asc
Description: PGP signature
Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |