Re: [chrony-dev] Bug -- interaction between ntpdate and chronyd at bootup -- will never sync up

 You could just start chrony earlier and as you say, tell it to do a makestep
if the time is way off. I can certainly see why chrony gets totally confused
if it thinks that the clock rate is out by over 900000 PPM. Perhaps it should
just exit at that point. It is not clear what a reasonable recovery strategy
is there. In your case, it should probably just ignore that absurd frequency
shift, but that is hardly a viable general strategy.


On Mon, 4 Feb 2013, ray vantassle wrote:

So why are you using ntpdate?

Installed automatically by the distro (debian).

This machine is a Pogoplug, which does not have a RTC.  Neither does
my router, which runs Linux TomatoUSB.  Neither does a Raspberry Pi.
On these low power computers, ntpd has a heavy footprint, so chrony is
a much better solution, IMHO.

Syslog clip:

Dec 31 18:00:37 PogoPlug chronyd[1894]: chronyd version 1.27 starting
Dec 31 18:00:37 PogoPlug chronyd[1894]: Linux kernel major=3 minor=2 patch=35
Dec 31 18:00:37 PogoPlug chronyd[1894]: hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833 max_tick_bias=1000 shift_pll=2
Dec 31 18:00:37 PogoPlug chronyd[1894]: Frequency 99963.989 +/- 233363.911 ppm read from /var/lib/chrony/chrony.drift


This reading from chrony.drift is absurd. Where did it come from?

Um, no RTC.  Also, ntpdate and chronyd are stepping on one another at
startup.  I think that the timing is such that chronyd is in the midst
of conditioning the clock when ntpdate sets the time, so chronyd
thinks the clock is fubar.
In the case here, chronyd NEVER gets straightened out.  I had to stop
and restart it, and then it got sane.

Since the clock starts out at Dec 31 1969 18:00:00 CST, it is
important to set the date/time to approximately the actual date/time
as soon as possible in the bootup sequence.  Ntpdate with 1 or 2
servers and only 1 sample does that.  Chronyd takes quite a bit
longer.

Rdate would probably be better than ntpdate.  The thing is to
*quickly* get a time at bootup, and then let chrony make it accurate
later on.

If you don't start out with a semi-valid time, this is what happens:
"chronyd[1932]: System clock wrong by 1359997028.288104 seconds, adjustment started"
That's gonna take a loooooong time to slew off. :-)
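
For a rough sense of scale, here is a back-of-the-envelope estimate in Python. It assumes the fastest possible slew is slew_delta_tick / nominal_tick, using the values chronyd logged at startup (833 and 10000, roughly 8.3%). That ratio is a reading of the log line, not a documented limit, so treat the result as an order of magnitude only.

```python
# Rough estimate of how long slewing off the reported offset would take.
# Assumption: the fastest slew is slew_delta_tick / nominal_tick, taken from
# the chronyd startup log (833 and 10000). This is illustrative only.
OFFSET_S = 1359997028.288104
SLEW_DELTA_TICK = 833
NOMINAL_TICK = 10000


def slew_years(offset_s, slew_delta_tick, nominal_tick):
    """Years needed to remove offset_s when slewing at the assumed max rate."""
    max_rate = slew_delta_tick / nominal_tick  # seconds corrected per second
    seconds_needed = offset_s / max_rate
    return seconds_needed / (365.25 * 86400)


years = slew_years(OFFSET_S, SLEW_DELTA_TICK, NOMINAL_TICK)
print(f"about {years:.0f} years of continuous slewing")
```

Under that assumption the answer is on the order of several centuries, which is why a step is the only practical fix for an offset this large.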

On further perusal, adding "makestep 1000 10" to the config is necessary.
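
As a sketch of what "makestep 1000 10" is meant to do: step the clock when the needed adjustment exceeds 1000 seconds, but only during the first 10 clock updates after chronyd starts. The Python below models that decision only. The function name and the exact boundary handling are assumptions made for illustration; this is not chronyd's code.

```python
# Toy model of the "makestep <threshold> <limit>" decision.
# Assumptions: should_step() is a hypothetical helper, and the boundary
# conditions (> vs >=, < vs <=) may differ from what chronyd really does.
def should_step(offset_s, updates_so_far, threshold_s=1000.0, limit=10):
    """Return True if the clock would be stepped instead of slewed."""
    # A negative limit is taken to mean "no limit on the number of updates".
    within_limit = limit < 0 or updates_so_far < limit
    return within_limit and abs(offset_s) > threshold_s


# Offset seen with no valid time at boot: large, so it would be stepped.
print(should_step(1359997028.288104, updates_so_far=0))
# Sub-second offset seen after ntpdate stepped the clock: slewed instead.
print(should_step(0.699306, updates_so_far=0))
```

With these values the first call reports a step and the second does not, so an early large offset is stepped while later small corrections are still slewed.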





Dec 31 18:00:37 PogoPlug chronyd[1894]: NTP packet received from unauthorised host 174.36.71.205 port 123


I guess ntpdate's return packet comes back and is intercepted by chrony.
So why are you using ntpdate? And you might tell ntpdate (which is a dead-end
program and is being phased out by the ntpd people) to use a different port
than the NTP port.

The phantom packet doesn't matter, and doesn't hurt anything.  This
message just confirms that ntpdate & chrony are both trying to set the
clock at the same time.

Yes, I know that ntpdate is "officially" deprecated.  And that Mills hates it.
This has been the case for 12+ years, so I don't think the real world agrees.


And why do you not tell chronyd to start up later? Or not run ntpdate at
all.

No RTC.
Since the clock starts out at Dec 31 1969 18:00:00 CST, it is
important to set the date/time to approximately the actual date/time
as soon as possible


Dec 31 18:00:40 PogoPlug chronyd[1894]: Source 72.8.140.222 online
Dec 31 18:00:40 PogoPlug chronyd[1894]: Source 169.229.70.95 online
Dec 31 18:00:40 PogoPlug chronyd[1894]: Source 38.229.71.1 online
Dec 31 18:00:40 PogoPlug chronyd[1894]: Source 96.44.142.5 online
Feb  3 21:52:34 PogoPlug ntpdate[1427]: step time server 96.126.126.96 offset 1359949910.330424 sec


So chrony steps the clock.
No.  Ntpdate does.
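
The size of that step is consistent with a clock that booted at the Unix epoch. A small Python check, assuming a fixed UTC-6 offset for CST (no DST handling) and treating the logged offset as seconds since the epoch:

```python
# Convert epoch seconds to CST wall-clock time to sanity-check the syslog.
# Assumptions: CST is a fixed UTC-6 offset, and the clock was close to epoch 0
# when ntpdate measured the offset. to_cst() is a helper written for this note.
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_cst(epoch_seconds):
    """Return the CST wall-clock time for a count of seconds since the epoch."""
    return (EPOCH + timedelta(seconds=epoch_seconds)).astimezone(CST)


print(to_cst(0).strftime("%Y-%m-%d %H:%M:%S %Z"))
print(to_cst(1359949910).strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Epoch 0 comes out as Dec 31 1969 18:00:00 CST, and the ntpdate offset lands on the evening of Feb 3 2013, within a minute of the timestamp on the first post-step log line.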




Feb  3 21:52:35 PogoPlug chronyd[1894]: Selected source 38.229.71.1
Feb  3 21:52:35 PogoPlug chronyd[1894]: System clock wrong by 0.699306 seconds, adjustment started
Feb  3 21:52:35 PogoPlug chronyd[1894]: Required tick -1999 outside allowed range (9000 .. 11000)


It is finding that the rate of your clock is way way way too slow.
Yes.  It got totally confused and decided to go insane.
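
The allowed range in that message matches nominal_tick plus or minus max_tick_bias from the startup log (10000 and 1000). A short Python sketch of the check follows; the helper names are made up for this note, and the ppm figure is only the tick's distance from nominal, not a value chronyd reports.

```python
# Reproduce the tick range check from the values in the startup log.
# Assumptions: the range is nominal_tick +/- max_tick_bias (this matches the
# logged 9000 .. 11000); the helper names below are hypothetical.
NOMINAL_TICK = 10000   # microseconds per tick at hz=100
MAX_TICK_BIAS = 1000


def allowed_tick_range(nominal_tick=NOMINAL_TICK, max_tick_bias=MAX_TICK_BIAS):
    """Return the (low, high) tick values the kernel interface would accept."""
    return nominal_tick - max_tick_bias, nominal_tick + max_tick_bias


def tick_in_range(tick):
    low, high = allowed_tick_range()
    return low <= tick <= high


def tick_deviation_ppm(tick, nominal_tick=NOMINAL_TICK):
    """Distance of a tick value from nominal, in parts per million."""
    return abs(tick - nominal_tick) / nominal_tick * 1e6


print(allowed_tick_range())
print(tick_in_range(-1999))
print(round(tick_deviation_ppm(-1999)))
```

A required tick of -1999 is nearly 12000 microseconds away from nominal, i.e. more than a million ppm, so no tick adjustment within the allowed range can satisfy it.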





Feb  3 21:52:37 PogoPlug chronyd[1894]: Can't synchronise: no majority

Some queries:
PogoPlug>chronyc tracking
Reference ID    : 127.127.1.1 ()


????
So you have a local time set?

Not really.  Like I said, chronyd is thoroughly confused.




Stratum         : 10
Ref time (UTC)  : Mon Feb  4 04:13:07 2013
System time     : 0.000000000 seconds fast of NTP time
Last offset     : -0.699305534 seconds
RMS offset      : 0.699305534 seconds
Frequency       : 100027.977 ppm fast


Yes, absurd.


Residual freq   : 0.000 ppm
Skew            : 0.000 ppm
Root delay      : 0.000000 seconds
Root dispersion : 0.000001 seconds
Update interval : 0.0 seconds
Leap status     : Not synchronised


PogoPlug>chronyc sourcestats
210 Number of sources = 4
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
ntp2.ResComp.Berkeley.EDU   5   4   43y    1000000      0.023   +1247s  2210ms
96.44.142.5                 3   2   43y    1000000      0.381   -295ms   702ms
clock.team-cymru.org        4   4   43y    1000000      0.061   +1247s  2324ms
irc.indoforum.org           9   5   17m    -111174     11.769  -138.8s  2563us


PogoPlug>chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^x ntp2.ResComp.Berkeley.EDU     3   8   370   20m   -346ms[ -346ms] +/-   97ms
^x 96.44.142.5                   2  10     0   20m   -295ms[ -295ms] +/-   62ms
^x clock.team-cymru.org          2   9   200   20m   -273ms[ -273ms] +/-   19ms
^x irc.indoforum.org             2   8   377   212  -115.8s[-115.8s] +/-   90ms


And you are using irc.indoforum.org why?

It got picked up as a pool member from this:
server 0.debian.pool.ntp.org offline
server 1.debian.pool.ntp.org offline
server 2.debian.pool.ntp.org offline
server 3.debian.pool.ntp.org offline

God only knows why it thought the other servers were 43 years off, but
that one was only 17 months off.  Probably an integer overflow
somewhere.
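
One reading that needs no overflow, offered as a guess rather than something checked against the chrony source: the Span column is the time between the oldest and newest sample for a source, and for three sources the oldest sample predates the ntpdate step. The step itself was about 43 years, as this Python arithmetic shows. (As far as I know the "m" suffix in chronyc output means minutes, so 17m would be a source whose samples were all taken after the step.)

```python
# Express the ntpdate step as a span in years, for comparison with the
# "43y" shown by chronyc sourcestats.
# Assumption: a year is taken as 365.25 days; span_years() is a helper
# written for this note.
SECONDS_PER_YEAR = 365.25 * 86400
NTPDATE_STEP_S = 1359949910.330424


def span_years(seconds):
    """Convert a span in seconds to years."""
    return seconds / SECONDS_PER_YEAR


print(f"{span_years(NTPDATE_STEP_S):.1f} years")
```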


I'm not really complaining about the way chrony works.  Rather, I'm
reporting a scenario where chrony gets crammed in sideways and can't
get itself straightened back out.  I'm not even sure what I'd like it
to do here, other than to not get confused when ntpdate sets the time
behind its back.

BTW, the time in Linux routers is generally set by doing a naive ntp
query and setting the time once an hour to once a day.  Chrony is a
reasonable alternative.  Ntp is way too large to be reasonable.

FWIW, I've resolved my immediate problem by having the init.d scripts
for ntpdate and chrony to cross-check for one another.  Not entirely
happy with that, and it's certainly not a general solution.



--
William G. Unruh   |  Canadian Institute for|     Tel: +1(604)822-3273
Physics&Astronomy  |     Advanced Research  |     Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology |     unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1     |      and Gravity       |  www.theory.physics.ubc.ca/
