Re: [chrony-users] Chrony forgets servers (specified by FQDN) when no DNS server
On 12/20/2017 11:51 AM, Rob Janssen wrote:
> A time server that uses DNS based rules for reference servers should fail
> gracefully when the DNS does not return an IP address (anymore). So, when it
> does a lookup only once it should issue an error message about that server,
> and proceed its startup as if that server was never there in the
> configuration. When it is resolving DNS names on a regular basis (e.g. once
> per day), it could keep the server configuration and keep retrying the DNS
> lookup at that same interval and start using the server when the DNS lookup
> succeeds.
> Not starting the service at all is only an option when all the DNS lookups
> have failed (i.e. there is no server) and there is no mechanism to re-try the
> lookups. When there is, it is much better to keep the service running.
> (after all, a network may not be available at boot time and may become
> available later)
I find this statement of behavior (treat NOSERV/NXDOMAIN as an excuse to
forget a server/peer/pool) a bit astonishing, and very un-Unix-like.
Let's make some assumptions:
1. The daemon software has, in its data structures for
server/peer/pool, the FQDN for each server and peer.
2. The daemon software, on NXDOMAIN or no answer, sets the IP address
to zeros (0x00000000 for IPv4, and
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00, i.e. ::, for IPv6)
3. All information about the server/peer/pool entry is in the data
structure, such as filter data
4. The polling loop is able to fork a process to perform DNS lookups.
(This may not necessarily be true with Windows.)
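
To make assumptions 1 through 3 concrete, here is a minimal sketch of what such an entry could look like. It is Python purely for illustration (chronyd itself is written in C), and every name in it (SourceEntry, first_time, filter_samples, the poll defaults) is hypothetical, not taken from the chrony source.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import List, Union

UNRESOLVED_V4 = IPv4Address("0.0.0.0")  # all-zero IPv4 address
UNRESOLVED_V6 = IPv6Address("::")       # all-zero IPv6 address


@dataclass
class SourceEntry:
    """One server/peer/pool entry (hypothetical layout, for illustration)."""
    fqdn: str                           # assumption 1: the name is always kept
    minpoll: int = 6                    # illustrative defaults (log2 seconds)
    maxpoll: int = 10
    # assumption 2: an all-zero address means "not resolved yet"
    address: Union[IPv4Address, IPv6Address] = UNRESOLVED_V4
    first_time: bool = False            # set when a lookup has just succeeded
    # assumption 3: per-source state such as filter data lives in the entry
    filter_samples: List[float] = field(default_factory=list)

    def is_resolved(self) -> bool:
        return not self.address.is_unspecified


entry = SourceEntry(fqdn="ntp.example.net")
print(entry.is_resolved())              # the entry starts out unresolved
entry.address = ip_address("192.0.2.1")
entry.first_time = True
print(entry.is_resolved())
```

The point is only that the FQDN survives a failed lookup, so nothing has to be forgotten.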
So the standard polling loop uses the poll timing specified in the
server/peer/pool command for all servers, peers, and pools, initialized
or not. If the poll interval has expired for a given server/peer/pool
entry, it does this:
a. IP address zero: reset poll interval to minpoll, and fork a
process to do DNS lookup -- the forked process will perform the DNS
lookup, and on success will fill in the IP address and set the
first-time flag so the polling loop will pick it up in the next cycle
b. IP address non-zero and first-time flag set: do what the server
currently does with a new server or peer entry
c. IP address non-zero and first-time flag not set: do what it does
now.
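
The three cases above can be sketched as a single dispatch function. This is a rough illustration under stated assumptions, not chronyd's actual code: the Entry fields and the three callbacks (start_lookup, init_source, poll_source) are hypothetical stand-ins for whatever the daemon really does, and None stands in for the all-zero address.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    fqdn: str
    minpoll: int = 6
    poll: int = 6
    address: Optional[str] = None   # None stands in for the all-zero address
    first_time: bool = False
    lookup_running: bool = False


def on_poll_expired(entry: Entry,
                    start_lookup: Callable[[Entry], None],
                    init_source: Callable[[Entry], None],
                    poll_source: Callable[[Entry], None]) -> str:
    """Handle one expired poll interval; returns which case was taken."""
    if entry.address is None:
        # Case a: unresolved. Drop back to minpoll and start a lookup,
        # unless one is already in flight for this entry.
        entry.poll = entry.minpoll
        if not entry.lookup_running:
            entry.lookup_running = True
            start_lookup(entry)
        return "lookup"
    if entry.first_time:
        # Case b: freshly resolved. Treat it like a newly added source.
        entry.first_time = False
        init_source(entry)
        return "init"
    # Case c: an established source. Poll it as usual.
    poll_source(entry)
    return "poll"
```

Whatever completes the lookup is expected to fill in address, set first_time, and clear lookup_running, so the next expiry falls into case b.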
Forking a process means that the daemon's polling loop doesn't lock up
the daemon on the DNS lookup when there is no DNS available, or it takes
a double-handful of seconds to get NOSERV or NXDOMAIN. (If a process is
already forked for an entry, then don't fork it again; wait for the
forked process to die.) If/when the forked process gets a successful A
or AAAA record, it sets it in the data structure for the entry so that
the polling loop will pick it up on the next poll interval expiration.
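
A sketch of the "don't block, and don't start a second lookup" idea follows. Note the substitution: it uses a worker thread from Python's concurrent.futures rather than fork(), because a forked child has its own address space and would have to pass the result back (through a pipe, for example) instead of writing into the parent's data structure directly. LookupManager and resolve_first are hypothetical names; port 123 is the NTP port.

```python
import socket
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


def resolve_first(fqdn: str) -> str:
    """Blocking lookup: return the first address found for fqdn."""
    infos = socket.getaddrinfo(fqdn, 123, type=socket.SOCK_DGRAM)
    return infos[0][4][0]


class LookupManager:
    """Runs DNS lookups off the polling loop, at most one per name."""

    def __init__(self, resolver: Callable[[str], str] = resolve_first):
        self._resolver = resolver
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._pending: Dict[str, Future] = {}

    def start(self, fqdn: str) -> bool:
        """Start a lookup unless one is already in flight; never blocks."""
        if fqdn in self._pending:
            return False
        self._pending[fqdn] = self._pool.submit(self._resolver, fqdn)
        return True

    def collect(self, fqdn: str) -> Optional[str]:
        """Return the address if a lookup has finished successfully."""
        future = self._pending.get(fqdn)
        if future is None or not future.done():
            return None
        del self._pending[fqdn]       # allow a retry on the next expiry
        try:
            return future.result()
        except OSError:               # socket.gaierror is an OSError
            return None
```

The polling loop would call start() in case a and collect() on each later expiry; a failed lookup simply leaves the address at zero and gets retried at minpoll.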
Also note that it eliminates special start-up code. The config file
parser fills in the data structure for each server/peer with zero IP
address, and the polling loop handles the lookup and initialization.
This also works with chronyc(1): it causes chronyd(8) to build the new
data structure, and the polling loop does the rest. When you use
chronyc(1) to remove a server or peer, chronyd(8) just removes the data
structure for that entry. Poof.
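
As a rough illustration of "no special start-up code": the parser only records names and options, every entry starts with a zero address, and removal is just dropping the entry. This is a deliberately simplified sketch, not chrony's real parser. It only understands server/peer/pool lines with minpoll and maxpoll, treats # as the only comment character, and the names parse_sources and remove_source are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    kind: str                       # "server", "peer" or "pool"
    fqdn: str
    minpoll: int = 6                # illustrative defaults
    maxpoll: int = 10
    address: Optional[str] = None   # None stands in for the all-zero address


def parse_sources(config_text: str) -> Dict[str, Entry]:
    """Build entries from config text without doing any DNS lookups."""
    sources: Dict[str, Entry] = {}
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()
        if len(words) < 2 or words[0] not in ("server", "peer", "pool"):
            continue                # not a source line; ignored here
        entry = Entry(kind=words[0], fqdn=words[1])
        options = words[2:]
        # Walk adjacent (name, value) pairs and pick out the poll options.
        for name, value in zip(options, options[1:]):
            if name in ("minpoll", "maxpoll"):
                setattr(entry, name, int(value))
        sources[entry.fqdn] = entry
    return sources


def remove_source(sources: Dict[str, Entry], fqdn: str) -> bool:
    """Drop an entry; returns False if there was no such entry."""
    return sources.pop(fqdn, None) is not None
```

An "add" request from the control client would create the same kind of entry at run time, and the polling loop would resolve it like any other.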
And that's how I would remove chrony's current astonishing behavior in
the face of DNS not being there at start-up. Like in my power-fail
situation, where the edge router with chronyd(8) comes up before the
CSU/DSU to the network. Enterprise users might be surprised to learn
about this astonishing forgetfulness of chronyd(8) in the face of a
temporary failure.
How to handle entries where the NTP server has gone away?
Keep a TTL timer, set by an entry in the configuration file.
(A reasonable default would be 24 hours.) When "reach" is not 0x00, reset
the TTL timer. When the TTL timer expires, clear the filter variables,
set the poll to minpoll, zero the IP address, and reset the TTL timer.
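
The TTL rule above can be sketched like this. It is an illustration under stated assumptions: the field names are hypothetical, reach is modelled as the usual 8-bit reachability register, None stands in for the zeroed IP address, and time is passed in as plain seconds so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_TTL = 24 * 3600  # seconds; the 24-hour default suggested above


@dataclass
class Entry:
    fqdn: str
    minpoll: int = 6
    poll: int = 6
    address: Optional[str] = None   # None stands in for the all-zero address
    reach: int = 0                  # reachability register, 0x00 = unreachable
    filter_samples: List[float] = field(default_factory=list)
    ttl: float = DEFAULT_TTL
    ttl_deadline: float = 0.0


def update_ttl(entry: Entry, now: float) -> bool:
    """Return True if the entry was reset because its TTL expired."""
    if entry.reach != 0x00:
        # The source is answering: push the deadline out again.
        entry.ttl_deadline = now + entry.ttl
        return False
    if now < entry.ttl_deadline:
        return False                # unreachable, but still within the TTL
    # TTL expired: forget everything learned about the old address.
    entry.filter_samples.clear()
    entry.poll = entry.minpoll
    entry.address = None            # forces a fresh DNS lookup
    entry.ttl_deadline = now + entry.ttl
    return True
```

After a reset, the entry looks exactly like one that was never resolved, so the normal polling loop picks it up with a new lookup.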
The rationale for this method of handling extended tempfail is the same
rationale used for SMTP daemons: wait somewhat impatiently for the
remote server to come back, and if it doesn't come back in a reasonable
time then bounce the mail.
From the standpoint of the NTP protocol, a server that is out of service
for an extended time may have different properties when it comes back
on-line. (Replaced, for example.) So the filter variables would
contain bogus data, particularly in a pool situation where you were
originally talking to a "close" server, and now switched to a "far" server.
(And, it eliminates the need for a separate "pool" command, which would
help some distribution sources (<cough> Red Hat) who use "server" when
they mean "pool" in their default configurations.)
If this should be moved to chrony-dev, I can do that.