[chrony-dev] Re: Are there known issues in destructive tests on arm64 or ppc64

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-dev Archives ]


On Fri, Dec 6, 2019 at 11:32 AM Vincent Blut <vincent.debian@xxxxxxx> wrote:
>
> Hi Christian,
>
> On 2019-12-06T10:21+0100, Christian Ehrhardt wrote:
> >Hi chrony-dev,
> >chrony is tested in Debian/Ubuntu on any update of related packages
> >and while that was running I found that arm64 and ppc64 reproducibly
> >hang [1][2]. OTOH arm64 and s390x reproducibly work.
>                    ^^^^^
> I guess you mean amd64 here‽
>
> >Unfortunately the tests do not leave any reasonable debug output, the
> >tests run:
> >    debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
> >Which is this script [3] that runs ./run of the system tests.
> >The tests seem to hang without output until the environment kills them
> >for a timeout.
> >
> >For now I changed in Ubuntu and suggested in Debian to disable those
> >tests [4], but in the long run I'd want to understand what exactly is
> >failing and how we could re-enable the tests on those platforms.
>
> Note that this is not an issue for Debian as our CI don’t yet support
> running autopkgtest for tests requiring a VM, which is the case for
> destructives tests.

Yeah I know it doesn't "hit" you on debCI I mentioned that in the MR
that I sent to Debian.
But I'd expect it to be no different.

> >I was trying to reproduce on a Ubuntu @ ppc64 environment that should
> >be the same as the one used in the tests. Running the tests there just
> >hung as well (the system is still available if there are questions).
> >What I see is this:
> >100-clockupdate   Testing update of system clock:
> > non-default settings:
> >   clock_control=1
> >   minimal_config=1
> > starting chronyd                                              OK
> > running chronyc dfreq 1e-3                                    OK
> > checking chronyc output                                       OK
> > running chronyc doffset -1.0                                  OK
> > checking chronyc output                                       OK
> > running chronyc makestep                                      OK
> > checking chronyc output                                       OK
> > checking system clock                                         OK
> > running chronyc doffset 1.0                                   OK
> > running chronyc makestep                                      OK
> > stopping chronyd                                              OK
> > checking chronyd messages                                     OK
> > checking message "System clock was stepped by"                OK
> >PASS
> >
> >101-rtc   Testing real-time clock:
> > non-default settings:
> >   extra_chronyd_directives=rtcfile
> >/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
> >   extra_chronyd_options=-s
> >   minimal_config=1
> > starting chronyd
> >
> >
> >And here it hangs, not sure why that didn't make it out of the buffers
> >on the other test runs.
> >When I check the processes I see:
> >0  1000  2231  2230  20   0  12608  7616 do_wai Ss   pts/0      0:00
> >   \_ -bash
> >4     0 12433  2231  20   0  16576 10048 -      S+   pts/0      0:00
> >       \_ sudo debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
> >4     0 12434 12433  20   0   3520  1408 -      S+   pts/0      0:00
> >           \_ /bin/sh debian/tests/upstream-system-tests -d
> >1[0-9][0-9]-*
> >0     0 12435 12434  20   0  11136  5632 -      S+   pts/0      0:00
> >               \_ bash ./run -d 1[0-9][0-9]-*
> >0     0 12506 12435  20   0  11456  6080 -      S+   pts/0      0:00
> >                   \_ bash ./101-rtc
> >0     0 12544 12506  20   0   6272  2624 -      S+   pts/0      0:00
> >                       \_ ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chron
> >1     0 12545 12544  20   0      0     0 -      Zs   ?          0:00
> >                           \_ [chronyd] <defunct>
> >1     0 12546     1  20   0   6656  4032 -      S    ?          0:00
> >../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chrony
> >
> >So one process goes defunct and the parent doesn't collect the exit
> >status - can it be so simple?
> >
> >The full commandline of the parent is:
> >../../chronyd -x -l
>
> Could that be due to the use of “-x”? IFAIR you have extra stuff on
> Ubuntu about that option.

I was testing the current master tarball from upstream to avoid any confusion.
=> https://git.tuxfamily.org/chrony/chrony.git/snapshot/chrony-master.tar..gz

That exposes the same behavior, so none of the Debian/Ubuntu delta
seems to matter for this problem.

> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -u root -s
> >
> >The two remaining hanging processes wait on read/select
> >ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12544
> >strace: Process 12544 attached
> >    0.000000 read(3, ^Cstrace: Process 12544 detached
> ><detached ...>
> >ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12546
> >strace: Process 12546 attached
> >    0.000000 _newselect(9, [5 6 7 8], NULL, NULL, NULL^Cstrace:
> >Process 12546 detached
> ><detached ...>
> >
> >Note: due to installing the package as part of the test dependencies
> >there is also a chrony running via systemd service - could that be an
> >issue?
> >I have stopped the process and re-started the tests - still hanging.
> >
> >Killing the hanging and defunct chrony processes lets the test
> >continue as assumed.
> >
> >The config generated by the test is this:
> >sudo cat /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf
> >pidfile /var/run/chrony/chronyd.pid
> >bindcmdaddress /var/run/chrony/chronyd.sock
> >port 14196
> >cmdport 17439
> >allow
> >cmdallow
> >local
> >server 127.0.0.1 port 14196 minpoll -6 maxpoll -6
> >bindacqaddress 127.0.0.1
> >bindaddress 127.0.0.1
> >bindcmdaddress 127.0.0.1
> >dumpdir /var/run/chrony
> >logdir /var/log/chrony
> >log tempcomp rawmeasurements refclocks statistics tracking rtc
> >logbanner 0
> >smoothtime 100.0 0.001
> >include /dev/null
> >keyfile /home/ubuntu/chrony-3.5/test/system/tmp/keys
> >driftfile /var/lib/chrony/driftfile
> >tempcomp /home/ubuntu/chrony-3.5/test/system/tmp/tempcomp 0.1 0 0 0 0
> >
> >I can "reenter" the broken test state by just running test 101:
> > ~/chrony-3.5/test/system$ sudo ./run -d 101-rtc
> >101-rtc   Testing real-time clock:
> > non-default settings:
> >   extra_chronyd_directives=rtcfile
> >/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
> >   extra_chronyd_options=-s
> >   minimal_config=1
> > starting chronyd
> ><- here it hangs and I again have a defunct process in ps
> >
> >It seems I can simplify this even down to:
> >~/chrony-3.5/test/system$ sudo ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.
> >conf -u root -s
> >
> >This gets me
> >$ ps axlf | grep chrony
> >4     0 13136  2231  20   0  16576 10048 -      S+   pts/0      0:00
> >       \_ sudo ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/
> >system/tmp/chronyd.conf -u root -s
> >4     0 13137 13136  20   0   6272  2624 -      S+   pts/0      0:00
> >           \_ ../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/s
> >ystem/tmp/chronyd.conf -u root -s
> >1     0 13138 13137  20   0      0     0 -      Zs   ?          0:00
> >               \_ [chronyd] <defunct>
> >0  1000 13141 12570  20   0  10560  1984 pipe_w S+   pts/1      0:00
> >           \_ grep --color=auto chrony
> >1     0 13139     1  20   0   6656  3968 -      S    ?          0:00
> >../../chronyd -x -l
> >/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
> >/home/ubuntu/chrony-3.5/test/system/tmp/chrony
> >d.conf -u root -s
> >
> >Further simplifying the test I found a minimal config and commandline:
> >Config file:
> >rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
> >RTCFile:
> >1 1575623140 0.0 0.0
> >Command:
> >$ ../../chronyd  -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -s
> >
> >With that I got an strace for you, but it is too noisy for me to read.
> >Strace log:
> >=> https://paste.ubuntu.com/p/h3yZwxmVPr/
> >In line 86 i see process 13324 go into 0.000008 read(3,  <unfinished ...>
> >And on line 1233 the other process 13326 go into 0.000163
> >_newselect(9, [5 6 7 8], NULL, NULL, NULL <unfinished ...>
> >After that both seem to wait on each other.
> >Maybe one on this list knowing better what to expect and what might be
> >odd can see something in it?
>
> I’ll try to have a closer look later.
>
> >I was wondering if any of those test hangs are known issues already?
> >In particular what about it might be architecture dependent?
> >
> >As I outlined above the issue is reproducible and my system still
> >around - so ask whatever you need to further advise what it could be
> >about.
> >
> >[1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/c/chrony/20191203_150720_9e9af@/log.gz
> >[2]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/c/chrony/20191203_142431_bdb78@/log.gz
> >[3]: https://salsa.debian.org/debian/chrony/blob/master/debian/tests/upstream-system-tests
> >[4]: https://salsa.debian.org/debian/chrony/merge_requests/2
> >
> >--
> >Christian Ehrhardt
> >Staff Engineer, Ubuntu Server
> >Canonical Ltd



-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd

--
To unsubscribe email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "unsubscribe" in the subject.
For help email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/