I am not too concerned about this specific test environment. The config files are written by centralised configuration software, so I cannot simply change them manually, but I can fix the specific problem by adding a second server. Actually, I will fake one by adding an alias IP to the one server and querying it on both addresses. I've often used that trick to simulate two servers for test purposes.
A serious operational environment, however, could look like this:
- 2 or 3 servers
- 4 peers
- Any number of leaf nodes
Now I wonder whether I could run into a similar problem when the uplink is broken for a while, the peers have drifted away, and then the uplink comes back.
Each peer then sees 3 peers with the same time and 2 or 3 servers with a different one. I guess initstepslew would fail again or, in the case of 2 servers, even prefer the peers, wouldn't it?
Regards,
Frank
-----Original Message-----
From: Miroslav Lichvar [mailto:mlichvar@xxxxxxxxxx]
Sent: Friday, 1 February 2019 16:25
To: chrony-users@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [chrony-users] initstepslew seems to break chronyd
On Fri, Feb 01, 2019 at 02:58:09PM +0000, MUZZULINI Frank wrote:
Hello Miroslav,
Okay, I guess I have to explain a little more:
The machine and its peer are test VMs. Something in the VM environment (possibly a backup) causes timing problems almost every night.
As you can see in the measurement log, around 1:19 an offset of almost a second appeared out of the blue. After that the two peers are free running.
I will check whether adding orphan to the local line allows the machines to recover from the situation automatically.
Ok. I think that explains the behavior quite well.
You can try the orphan mode, but I'm not sure if peering between two machines with unreliable clocks is a good idea. They should each have at least three sources, ideally at least two reliable ones.
You could also use the trust option to select the server when the two sources don't agree.
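For illustration, the two suggestions above might look roughly like this in chrony.conf. This is only a sketch: the host names are placeholders that do not come from this thread, and the stratum value is an arbitrary choice.

```
# Hypothetical host names, not taken from the thread.
# "trust" marks the server as the source to believe when sources disagree.
server ntp1.example.net iburst trust
peer peer1.example.net

# Local reference with orphan mode enabled (stratum 10 is an arbitrary choice).
local stratum 10 orphan
```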
Around 15:00 I started examining the problem and restarted chronyd several times. The restart at 15:18 was without initstepslew. One of the attempts before was with orphan added to local.
The part that really surprised me was that initstepslew not only failed to select the server, the better source, but also affected the daemon's long-term behaviour.
It cannot select between two sources that don't agree with each other.
A third source would be needed to detect which one is wrong. If you don't use initstepslew, there is no problem with selection on start because the peer will become selectable a few minutes after the server, so the local clock can be set to the server's time and the other peer will not see the host as a falseticker.
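To make the selection argument concrete, here is a small Python sketch of a simplified interval-intersection vote. It is not chronyd's actual implementation. It assumes that each source reports an offset with an error bound, and that a group is accepted only when strictly more than half of all sources overlap. The function name, source names and numbers are made up for illustration.

```python
from typing import List, Optional, Tuple

Source = Tuple[str, float, float]  # (name, offset in seconds, error bound in seconds)


def select_majority(sources: List[Source]) -> Optional[List[str]]:
    """Return the names of the sources that agree with each other, or None
    when no group covers strictly more than half of all sources."""
    endpoints = []
    for name, offset, err in sources:
        endpoints.append((offset - err, 0, name))  # 0 = lower edge, sorts first on ties
        endpoints.append((offset + err, 1, name))  # 1 = upper edge
    endpoints.sort()

    depth = best_depth = 0
    best_lo = best_hi = 0.0
    for i, (value, kind, _) in enumerate(endpoints):
        if kind == 0:
            depth += 1
            if depth > best_depth:
                # The region from this edge to the next one is covered by `depth` sources.
                best_depth = depth
                best_lo, best_hi = value, endpoints[i + 1][0]
        else:
            depth -= 1

    if best_depth * 2 <= len(sources):
        return None  # no strict majority, so nothing can be selected
    return sorted(name for name, offset, err in sources
                  if offset - err <= best_lo and best_hi <= offset + err)


server1 = ("server1", 0.000, 0.01)
server2 = ("server2", 0.002, 0.01)
server3 = ("server3", 0.001, 0.01)
peer = ("peer", 0.900, 0.01)
peers = [(f"peer{i}", 0.5, 0.01) for i in (1, 2, 3)]

print(select_majority([server1, peer]))              # None: one against one
print(select_majority([server1, server2, peer]))     # ['server1', 'server2']
print(select_majority(peers + [server1, server2]))   # ['peer1', 'peer2', 'peer3']
print(select_majority(peers + [server1, server2, server3]))  # None: three against three
```

Under these assumptions, one server against one peer yields no selection, a second server breaks the tie, three drifted peers outvote two servers, and three against three is a tie again.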
--
Miroslav Lichvar
--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "unsubscribe" in the subject.
For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "help" in the subject.
Trouble? Email listmaster@xxxxxxxxxxxxxxxxxxxx.