[chrony-dev] Re: Are there known issues in destructive tests on arm64 or ppc64

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-dev Archives ]


Hi Christian,

On 2019-12-06T10:21+0100, Christian Ehrhardt wrote:
Hi chrony-dev,
chrony is tested in Debian/Ubuntu on any update of related packages
and while that was running I found that arm64 and ppc64 reproducibly
hang [1][2]. OTOH arm64 and s390x reproducibly work.
                  ^^^^^
I guess you mean amd64 here‽

Unfortunately the tests do not leave any reasonable debug output, the tests run:
   debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
Which is this script [3] that runs ./run of the system tests.
The tests seem to hang without output until the environment kills them
for a timeout.

For now I changed in Ubuntu and suggested in Debian to disable those
tests [4], but in the long run I'd want to understand what exactly is
failing and how we could re-enable the tests on those platforms.

Note that this is not an issue for Debian as our CI don’t yet support running autopkgtest for tests requiring a VM, which is the case for destructives tests.

I was trying to reproduce on a Ubuntu @ ppc64 environment that should
be the same as the one used in the tests. Running the tests there just
hung as well (the system is still available if there are questions).
What I see is this:
100-clockupdate   Testing update of system clock:
non-default settings:
  clock_control=1
  minimal_config=1
starting chronyd                                              OK
running chronyc dfreq 1e-3                                    OK
checking chronyc output                                       OK
running chronyc doffset -1.0                                  OK
checking chronyc output                                       OK
running chronyc makestep                                      OK
checking chronyc output                                       OK
checking system clock                                         OK
running chronyc doffset 1.0                                   OK
running chronyc makestep                                      OK
stopping chronyd                                              OK
checking chronyd messages                                     OK
checking message "System clock was stepped by"                OK
PASS

101-rtc   Testing real-time clock:
non-default settings:
  extra_chronyd_directives=rtcfile
/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
  extra_chronyd_options=-s
  minimal_config=1
starting chronyd


And here it hangs, not sure why that didn't make it out of the buffers
on the other test runs.
When I check the processes I see:
0  1000  2231  2230  20   0  12608  7616 do_wai Ss   pts/0      0:00
  \_ -bash
4     0 12433  2231  20   0  16576 10048 -      S+   pts/0      0:00
      \_ sudo debian/tests/upstream-system-tests -d 1[0-9][0-9]-*
4     0 12434 12433  20   0   3520  1408 -      S+   pts/0      0:00
          \_ /bin/sh debian/tests/upstream-system-tests -d
1[0-9][0-9]-*
0     0 12435 12434  20   0  11136  5632 -      S+   pts/0      0:00
              \_ bash ./run -d 1[0-9][0-9]-*
0     0 12506 12435  20   0  11456  6080 -      S+   pts/0      0:00
                  \_ bash ./101-rtc
0     0 12544 12506  20   0   6272  2624 -      S+   pts/0      0:00
                      \_ ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chron
1     0 12545 12544  20   0      0     0 -      Zs   ?          0:00
                          \_ [chronyd] <defunct>
1     0 12546     1  20   0   6656  4032 -      S    ?          0:00
../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chrony

So one process goes defunct and the parent doesn't collect the exit
status - can it be so simple?

The full commandline of the parent is:
../../chronyd -x -l

Could that be due to the use of “-x”? IFAIR you have extra stuff on Ubuntu about that option.

/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -u root -s

The two remaining hanging processes wait on read/select
ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12544
strace: Process 12544 attached
   0.000000 read(3, ^Cstrace: Process 12544 detached
<detached ...>
ubuntu@bos01-ppc64-chrony-adt-issue:~$ sudo strace -rT -p 12546
strace: Process 12546 attached
   0.000000 _newselect(9, [5 6 7 8], NULL, NULL, NULL^Cstrace:
Process 12546 detached
<detached ...>

Note: due to installing the package as part of the test dependencies
there is also a chrony running via systemd service - could that be an
issue?
I have stopped the process and re-started the tests - still hanging.

Killing the hanging and defunct chrony processes lets the test
continue as assumed.

The config generated by the test is this:
sudo cat /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf
pidfile /var/run/chrony/chronyd.pid
bindcmdaddress /var/run/chrony/chronyd.sock
port 14196
cmdport 17439
allow
cmdallow
local
server 127.0.0.1 port 14196 minpoll -6 maxpoll -6
bindacqaddress 127.0.0.1
bindaddress 127.0.0.1
bindcmdaddress 127.0.0.1
dumpdir /var/run/chrony
logdir /var/log/chrony
log tempcomp rawmeasurements refclocks statistics tracking rtc
logbanner 0
smoothtime 100.0 0.001
include /dev/null
keyfile /home/ubuntu/chrony-3.5/test/system/tmp/keys
driftfile /var/lib/chrony/driftfile
tempcomp /home/ubuntu/chrony-3.5/test/system/tmp/tempcomp 0.1 0 0 0 0

I can "reenter" the broken test state by just running test 101:
~/chrony-3.5/test/system$ sudo ./run -d 101-rtc
101-rtc   Testing real-time clock:
non-default settings:
  extra_chronyd_directives=rtcfile
/home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
  extra_chronyd_options=-s
  minimal_config=1
starting chronyd
<- here it hangs and I again have a defunct process in ps

It seems I can simplify this even down to:
~/chrony-3.5/test/system$ sudo ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.
conf -u root -s

This gets me
$ ps axlf | grep chrony
4     0 13136  2231  20   0  16576 10048 -      S+   pts/0      0:00
      \_ sudo ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/
system/tmp/chronyd.conf -u root -s
4     0 13137 13136  20   0   6272  2624 -      S+   pts/0      0:00
          \_ ../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/s
ystem/tmp/chronyd.conf -u root -s
1     0 13138 13137  20   0      0     0 -      Zs   ?          0:00
              \_ [chronyd] <defunct>
0  1000 13141 12570  20   0  10560  1984 pipe_w S+   pts/1      0:00
          \_ grep --color=auto chrony
1     0 13139     1  20   0   6656  3968 -      S    ?          0:00
../../chronyd -x -l
/home/ubuntu/chrony-3.5/test/system/tmp/chronyd.log -f
/home/ubuntu/chrony-3.5/test/system/tmp/chrony
d.conf -u root -s

Further simplifying the test I found a minimal config and commandline:
Config file:
rtcfile /home/ubuntu/chrony-3.5/test/system/tmp/rtcfile
RTCFile:
1 1575623140 0.0 0.0
Command:
$ ../../chronyd  -f /home/ubuntu/chrony-3.5/test/system/tmp/chronyd.conf -s

With that I got an strace for you, but it is too noisy for me to read.
Strace log:
=> https://paste.ubuntu.com/p/h3yZwxmVPr/
In line 86 i see process 13324 go into 0.000008 read(3,  <unfinished ...>
And on line 1233 the other process 13326 go into 0.000163
_newselect(9, [5 6 7 8], NULL, NULL, NULL <unfinished ...>
After that both seem to wait on each other.
Maybe one on this list knowing better what to expect and what might be
odd can see something in it?

I’ll try to have a closer look later.

I was wondering if any of those test hangs are known issues already?
In particular what about it might be architecture dependent?

As I outlined above the issue is reproducible and my system still
around - so ask whatever you need to further advise what it could be
about.

[1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/c/chrony/20191203_150720_9e9af@/log.gz
[2]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/c/chrony/20191203_142431_bdb78@/log.gz
[3]: https://salsa.debian.org/debian/chrony/blob/master/debian/tests/upstream-system-tests
[4]: https://salsa.debian.org/debian/chrony/merge_requests/2

--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd

Attachment: signature.asc
Description: PGP signature



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/