[chrony-dev] Re: Are there known issues in destructive tests on arm64 or ppc64 |
[ Thread Index |
Date Index
| More chrony.tuxfamily.org/chrony-dev Archives
]
On Fri, Dec 6, 2019 at 11:32 AM Vincent Blut <vincent.debian@xxxxxxx> wrote:
>
> Hi Christian,
>
> On 2019-12-06T10:21+0100, Christian Ehrhardt wrote:
> >Hi chrony-dev,
> >chrony is tested in Debian/Ubuntu on any update of related packages
> >and while that was running I found that arm64 and ppc64 reproducibly
> >hang [1][2]. OTOH arm64 and s390x reproducibly work.
> ^^^^^
> I guess you mean amd64 here‽
>
> >Unfortunately the tests do not leave any reasonable debug output, the
> >tests run:
> > debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
> >Which is this script [3] that runs ./run of the system tests.
> >The tests seem to hang without output until the environment kills them
> >for a timeout.
> >
> >For now I changed in Ubuntu and suggested in Debian to disable those
> >tests [4], but in the long run I'd want to understand what exactly is
> >failing and how we could re-enable the tests on those platforms.
>
> Note that this is not an issue for Debian as our CI don’t yet support
> running autopkgtest for tests requiring a VM, which is the case for
> destructives tests.
Yeah I know it doesn't "hit" you on debCI I mentioned that in the MR
that I sent to Debian.
But I'd expect it to be no different.
> >I was trying to reproduce on a Ubuntu @ ppc64 environment that should
> >be the same as the one used in the tests. Running the tests there just
> >hung as well (the system is still available if there are questions).
> >What I see is this:
> >100-clockupdate Testing update of system clock:
> > non-default settings:
> > clock_control=1
> > minimal_config=1
> > starting chronyd OK
> > running chronyc dfreq 1e-3 OK
> > checking chronyc output OK
> > running chronyc doffset -1.0 OK
> > checking chronyc output OK
> > running chronyc makestep OK
> > checking chronyc output OK
> > checking system clock OK
> > running chronyc doffset 1.0 OK
> > running chronyc makestep OK
> > stopping chronyd OK
> > checking chronyd messages OK
> > checking message "System clock was stepped by" OK
> >PASS
> >
> >101-rtc Testing real-time clock:
> > non-default settings:
> > extra_chronyd_directives=rtcfile
> >/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
> > extra_chronyd_options=-s
> > minimal_config=1
> > starting chronyd
> >
> >
> >And here it hangs, not sure why that didn't make it out of the buffers
> >on the other test runs.
> >When I check the processes I see:
> >0 1000 2231 2230 20 0 12608 7616 do_wai Ss pts/0 0:00
> > \_ -bash
> >4 0 12433 2231 20 0 16576 10048 - S+ pts/0 0:00
> > \_ sudo debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
> >4 0 12434 12433 20 0 3520 1408 - S+ pts/0 0:00
> > \_ /bin/sh debian/tests/upstream-system-tests -d
> >1[0-9][0-9]-*
> >0 0 12435 12434 20 0 11136 5632 - S+ pts/0 0:00
> > \_ bash ./run -d 1[0-9][0-9]-*
> >0 0 12506 12435 20 0 11456 6080 - S+ pts/0 0:00
> > \_ bash ./101-rtc
> >0 0 12544 12506 20 0 6272 2624 - S+ pts/0 0:00
> > \_ ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chron
> >1 0 12545 12544 20 0 0 0 - Zs ? 0:00
> > \_ [chronyd] <defunct>
> >1 0 12546 1 20 0 6656 4032 - S ? 0:00
> >../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chrony
> >
> >So one process goes defunct and the parent doesn't collect the exit
> >status - can it be so simple?
> >
> >The full commandline of the parent is:
> >../../chronyd -x -l
>
> Could that be due to the use of “-x”? IFAIR you have extra stuff on
> Ubuntu about that option.
I was testing the current master tarball from upstream to avoid any confusion.
=> https://git.tuxfamily.org/chrony/chrony.git/snapshot/chrony-master.tar..gz
That exposes the same behavior, so none of the Debian/Ubuntu delta
seems to matter for this problem.
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -u root -s
> >
> >The two remaining hanging processes wait on read/select
> >ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12544
> >strace: Process 12544 attached
> > 0.000000 read(3, ^Cstrace: Process 12544 detached
> ><detached ...>
> >ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12546
> >strace: Process 12546 attached
> > 0.000000 _newselect(9, [5 6 7 8], NULL, NULL, NULL^Cstrace:
> >Process 12546 detached
> ><detached ...>
> >
> >Note: due to installing the package as part of the test dependencies
> >there is also a chrony running via systemd service - could that be an
> >issue?
> >I have stopped the process and re-started the tests - still hanging.
> >
> >Killing the hanging and defunct chrony processes lets the test
> >continue as assumed.
> >
> >The config generated by the test is this:
> >sudo cat /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf
> >pidfile /var/run/chrony/chronyd.pid
> >bindcmdaddress /var/run/chrony/chronyd.sock
> >port 14196
> >cmdport 17439
> >allow
> >cmdallow
> >local
> >server 127.0.0.1 port 14196 minpoll -6 maxpoll -6
> >bindacqaddress 127.0.0.1
> >bindaddress 127.0.0.1
> >bindcmdaddress 127.0.0.1
> >dumpdir /var/run/chrony
> >logdir /var/log/chrony
> >log tempcomp rawmeasurements refclocks statistics tracking rtc
> >logbanner 0
> >smoothtime 100.0 0.001
> >include /dev/null
> >keyfile /home/ubuntu/chrony-3.5/test/system/tmp/keys
> >driftfile /var/lib/chrony/driftfile
> >tempcomp /home/ubuntu/chrony-3.5/test/system/tmp/tempcomp 0.1 0 0 0 0
> >
> >I can "reenter" the broken test state by just running test 101:
> > ~/chrony-3.5/test/system$ sudo ./run -d 101-rtc
> >101-rtc Testing real-time clock:
> > non-default settings:
> > extra_chronyd_directives=rtcfile
> >/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
> > extra_chronyd_options=-s
> > minimal_config=1
> > starting chronyd
> ><- here it hangs and I again have a defunct process in ps
> >
> >It seems I can simplify this even down to:
> >~/chrony-3.5/test/system$ sudo ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.
> >conf -u root -s
> >
> >This gets me
> >$ ps axlf | grep chrony
> >4 0 13136 2231 20 0 16576 10048 - S+ pts/0 0:00
> > \_ sudo ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/
> >system/tmp/chronyd.conf -u root -s
> >4 0 13137 13136 20 0 6272 2624 - S+ pts/0 0:00
> > \_ ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/s
> >ystem/tmp/chronyd.conf -u root -s
> >1 0 13138 13137 20 0 0 0 - Zs ? 0:00
> > \_ [chronyd] <defunct>
> >0 1000 13141 12570 20 0 10560 1984 pipe_w S+ pts/1 0:00
> > \_ grep --color=auto chrony
> >1 0 13139 1 20 0 6656 3968 - S ? 0:00
> >../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chrony
> >d.conf -u root -s
> >
> >Further simplifying the test I found a minimal config and commandline:
> >Config file:
> >rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
> >RTCFile:
> >1 1575623140 0.0 0.0
> >Command:
> >$ ../../chronyd -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -s
> >
> >With that I got an strace for you, but it is too noisy for me to read.
> >Strace log:
> >=> https://paste.ubuntu.com/p/h3yZwxmVPr/
> >In line 86 i see process 13324 go into 0.000008 read(3, <unfinished ...>
> >And on line 1233 the other process 13326 go into 0.000163
> >_newselect(9, [5 6 7 8], NULL, NULL, NULL <unfinished ...>
> >After that both seem to wait on each other.
> >Maybe one on this list knowing better what to expect and what might be
> >odd can see something in it?
>
> I’ll try to have a closer look later.
>
> >I was wondering if any of those test hangs are known issues already?
> >In particular what about it might be architecture dependent?
> >
> >As I outlined above the issue is reproducible and my system still
> >around - so ask whatever you need to further advise what it could be
> >about.
> >
> >[1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/c/chrony/20191203_150720_9e9af@/log.gz
> >[2]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/c/chrony/20191203_142431_bdb78@/log.gz
> >[3]: https://salsa.debian.org/debian/chrony/blob/master/debian/tests/upstream-system-tests
> >[4]: https://salsa.debian.org/debian/chrony/merge_requests/2
> >
> >--
> >Christian Ehrhardt
> >Staff Engineer, Ubuntu Server
> >Canonical Ltd
--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd
--
To unsubscribe email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject.
For help email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble? Email listmaster@xxxxxxxxxxxxxxxxxxxx.