Re: [chrony-users] Sporadic NTP dropouts (bad maximum delay ratio and maximum delay dev ratio)
On 29.05.2015 at 20:00, Bill Unruh wrote:
> William G. Unruh   |  Canadian Institute for |  Tel: +1(604)822-3273
> Physics&Astronomy  |  Advanced Research      |  Fax: +1(604)822-5324
> UBC, Vancouver,BC  |  Program in Cosmology   |  unruh@xxxxxxxxxxxxxx
> Canada V6T 1Z1     |  and Gravity            |  www.theory.physics.ubc.ca/
>
> On Fri, 29 May 2015, Ulrich Schwesinger wrote:
>
>> Thanks for the helpful feedback so far. I contacted the server admin, and he says there should be nothing that makes the server jump... I am wondering about a couple of things:
>>
>> * Is there any specific test among these all, or a combination of those, which will make the status go to unsynched?
>> * Why is there for example an 8 seconds gap in the log entries? Usually they are < 1 second. It also looks to me that score and root dispersion (sorry, not sure what that exactly is) are kind of "reset" when that happens.
>
> I do not understand what the chrony.conf has in it. The server should be queried something like once every minute to once an hour, not once every two seconds. What is your minpoll and maxpoll and why have you set them so low?

In our setup, we have 8 computers in a local network without a global clock reference. One of the computers acts as the NTP server. This server gets its time once at startup from a GPS device (here it might jump once). The boot order of the computers is not guaranteed in any way; clients might be up before the server. I tried to use the initstepslew and makestep directives to force a quick adoption of the server time after bootup.
chrony.conf looks like this; minpoll is 5, so I have no clue why it seems to query every two seconds:
server 192.168.0.30 minpoll 5 maxpoll 7 iburst
keyfile /etc/chrony/chrony.keys
commandkey 1
driftfile /rw/.chrony/chrony.drift
log tracking measurements statistics
logdir /rw/logs/chrony/
maxupdateskew 100.0
initstepslew 10 192.168.0.30
makestep 100 10
dumponexit
dumpdir /rw/.chrony
allow 10/8
allow 192.168/16
allow 172.16/12
logchange 0.5
rtconutc
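For reference, NTP poll values such as minpoll and maxpoll are base-2 exponents of an interval in seconds. The following is a minimal sketch of that arithmetic for the values in the configuration above; the function name `poll_interval_seconds` is made up for illustration and is not part of chrony.

```python
def poll_interval_seconds(poll_exponent: int) -> int:
    """Convert an NTP poll exponent to seconds (interval = 2**poll)."""
    return 2 ** poll_exponent


# Values taken from the "server" line in the chrony.conf above.
minpoll, maxpoll = 5, 7

print(poll_interval_seconds(minpoll))  # 32
print(poll_interval_seconds(maxpoll))  # 128
```

So this configuration asks for one query every 32 to 128 seconds in steady state, which is why log entries spaced two seconds apart look surprising.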
>> * Why does the stratum test fail? Can you explain what happens?
>
> The server is suddenly reporting its stratum to be higher than yours. The problems are in the server, not in your system. The sysadmin of the server is fobbing you off without looking at his system.
Thanks for clarifying
On 28.05.2015 at 22:14, Bill Unruh wrote:
> On Thu, 28 May 2015, Ulrich Schwesinger wrote:
>
> Tests 1234 abc 5678 are defined in the NTP specifications. From RFC 1305: Tests 1 and 2 make sure that the timestamps make sense (e.g. are not the same as an old packet, and pair with the last one sent to that peer). Test 3 is that the originate and receive timestamps are non-zero. Test 4 is that the delay (round trip time) is reasonable; abc are subsets of that. Test 5 is authentication. Test 6 requires the peer clock to be synchronized and that the interval since the peer clock was last updated is positive and less than NTP.MAXAGE. Test 7 is that the host has no lower stratum than the server, and test 8 is that the header contains reasonable values for rootdelay and rootdispersion.
>
> I.e., if any of the tests are out, the server's time is too suspect to use. Your server is problematic.
>
>> I marked some lines that look suspicious to me. In the first line, some test seems to fail [110]. In the 2nd line, suddenly the offset jumps up to -1.9 seconds. For the 3rd line, from the documentation I found this:
>>
>> Leap status: ? means the remote computer is not currently synchronised.
>> 5678: Tests for maximum delay, maximum delay ratio and maximum delay dev ratio, against defined parameters, and a test for synchronisation loop (1=pass, 0=fail) [1111]
>>
>> So that would mean that delay ratio and delay dev ratio is bad... not sure what that really means though.
>
> It means that the round trip time is out of spec. And it looks like your source, whatever that is, went nuts and jumped by 2 sec suddenly. That is why you should be using at least 3 sources. If one goes mad, the others can outvote it. If you only use 1, then its time is by definition right, even if it is out by 40 years.
>
>> I find this very hard to debug. Is there any other thing I could do to find out what exactly is going wrong?
>
> Talk to the person who is sysadmin on the server and find out what happened. Use more than one server.
>
>> Thanks for your answers in advance,
>> Ulrich
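As a reading aid for the "5678" column described in the documentation excerpt quoted above, here is a small sketch that turns a four-character flag string into the names of the failing tests. The test names and the 1=pass, 0=fail convention come from that excerpt; the function name `failed_tests` and the sample input "1001" are hypothetical and were not taken from the logs discussed in this thread.

```python
from typing import List, Sequence

# Order and names as given in the quoted documentation for the "5678" column.
TESTS_5678 = (
    "maximum delay",
    "maximum delay ratio",
    "maximum delay dev ratio",
    "synchronisation loop",
)


def failed_tests(flags: str, names: Sequence[str] = TESTS_5678) -> List[str]:
    """Return the names of tests whose flag is '0' (1=pass, 0=fail)."""
    if len(flags) != len(names) or set(flags) - {"0", "1"}:
        raise ValueError(f"expected {len(names)} characters of 0/1, got {flags!r}")
    return [name for flag, name in zip(flags, names) if flag == "0"]


print(failed_tests("1111"))  # [] (all four tests passed)
print(failed_tests("1001"))  # hypothetical sample: both ratio tests failed
```

A string of all ones therefore means none of these four tests failed; the two delay ratio tests show up as zeros in the second and third positions.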