Re: [chrony-users] Resume from suspend and default makestep configuration

On Tuesday 19 May 2020 11:10:01 FUSTE Emmanuel wrote:
> On 19/05/2020 at 12:29, Pali Rohár wrote:
> > On Monday 18 May 2020 13:45:04 FUSTE Emmanuel wrote:
> >> On 18/05/2020 at 13:15, Pali Rohár wrote:
> >>> On Monday 18 May 2020 10:45:02 FUSTE Emmanuel wrote:
> >>>> Hello Pali,
> >>>>
> >>>> On 18/05/2020 at 12:37, Pali Rohár wrote:
> >>>>> The main problem is when the system is put into suspend or hibernate
> >>>>> state.
> >>>>>
> >>>>> In my opinion resuming from suspend / hibernate state should be handled
> >>>>> in the same way as (re)starting chronyd. You do not know what may
> >>>>> have happened during sleep.
> >>>> Yes, and if a workaround is needed, it should be done at the system
> >>>> level, not in chrony.
> >>>> A job for systemd.
> >>> Hello! Sorry for a stupid question, but what has systemd in common with
> >>> chronyd? Why should systemd care about chronyd time synchronization?
> >> Nothing.
> >> But it is up to your "process manager", be it systemd, a sysvinit pile
> >> of scripts or whatever, to restart or notify chrony; it has to do
> >> housekeeping anyway for other things when you suspend/resume.
> > Hm... I remember that in the past it was necessary to blacklist broken
> > daemons, software and kernel modules which did not work correctly
> > during S3 or hibernate state. It was in some pm scripts utils...
> >
> > But I thought that those days were already past and software could
> > deal with the fact that a machine may be put into suspend or hibernate
> > state.
> >
> > So what you are suggesting is to put the chronyd daemon into a list of
> > broken software (which needs to be stopped prior to suspend / resume)?
> >
> > That does not make sense to me, as the immediate step after putting
> > software or a kernel module into such a "blacklist" was to inform the
> > upstream authors of that daemon or kernel module that it is broken /
> > incompatible with the suspend state and that it should be fixed.
> >
> > That "blacklist" was just a workaround for buggy software and not a
> > permanent solution.
> No, not chrony, but the machine which changes the RTC behind your back: a buggy BIOS

Sorry, but I did not catch this line. The blacklist contained a list of
buggy software, daemons and kernel modules which had to be (in the past)
stopped / unloaded before the system went to S3 and started / (re)loaded
after the system resumed. So obviously putting a "buggy BIOS" into the
blacklist not only does not make sense, but it would also do nothing. In
that particular case chronyd would have to be put into that blacklist of
buggy software, since, as you described, it is chronyd which needs to be
stopped / started... But as I said, this was used in the past, when there
were buggy software and kernel modules which were not able to handle the
S3 state correctly.

> >
> >> Exactly as networkmanager, ifupdown scripts, systemd-networkd
> >> reload/restart some network services when interfaces/tunnels/vpn are
> >> upped/downed.
> > This is something totally different. All those mentioned "services" are
> > just independent parts of the system which manage network connections.
> >
> > chronyd is there to manage time synchronization.
> It was a figurative comparison for an event-driven config change.
> In the suspend vs. time case, the event is only known to, and should be
> managed by, your init system, not your time daemon.
> 
> >
> >>>>> And as I pointed out, there are existing problems where UEFI/BIOS
> >>>>> firmware changes the RTC clock without good reason, which results
> >>>>> in a completely wrong system clock.
> >>>>>
> >>>> Such systems could well be identified by a blacklist at the
> >>>> udev/systemd level, to decide whether to apply the workaround
> >>>> (restart chrony or launch a chronyc command at resume).
> >>> Could you describe in detail what you mean by blacklist? Which udev
> >>> blacklist do you mean and what should be put into that blacklist? I
> >>> did not catch this part.
> >> Faulty systems could be identified by DMI/ACPI strings and a quirk applied.
> > And what is the faulty system?
> Quoting you:
> 
> "as I pointed out, there are existing problems where UEFI/BIOS firmware
> changes the RTC clock without good reason"

Ok. The main problem is that there is no way to identify such broken
firmware. So the definition is now nice and clear but basically useless,
as it says nothing about how to find or identify such faulty systems.

> 
> >
> > I think this is something general and not related to particular machine.
> > I guess under specific conditions it may happen on any system.
> >
> >> See for example /lib/udev/hwdb.d/60-sensor.hwdb for some laptop sensors.
> >> We could add an attribute to the RTC if it matches some vendor/BIOS
> >> version/model etc... to put in the hwdb (the blacklist).
> >> A udev rule would assign this attribute to the RTC if you are running
> >> on a known buggy system.
> >> A script could do anything you want at suspend/resume time in
> >> /lib/systemd/system-sleep if your RTC has the offending attribute (see
> >> the systemd-sleep man page).
> >> Or better, a unit run at resume time could do anything too.
> >> The hwdb abstraction is not needed if it is a local hack; otherwise it
> >> should be properly defined with the hwdb/udev/systemd developers.
> > This database is for describing hardware differences or issues.
> >
> > But the above problem with time synchronization is general and hardware
> > independent. You can simulate the same issue on your machine.
> >
> > Just put your computer into hibernation. Then boot some Linux
> > distribution from a live USB and change the RTC time. Shut down the
> > live system and boot your hibernated system. You should then be in the
> > same situation as I described.
> Yes, but this is like shooting yourself in the foot.

This is just a test case, so you can check, "simulate" and reproduce
this issue even without a "faulty machine".

Moreover, Windows systems used to store the RTC in local time and Linux
systems in UTC. I do not know if this still applies, but basically
multi-OS machines are affected by the same issue.

> If you want to be robust in this case and all others, then by default
> you must restart ANY time sync daemon in the resume callback of your
> init system, be it ntpd or chrony, systemd or sysvinit or upstart or
> anything else. But that is problematic, as Miroslav pointed out, because
> you potentially start to trust any anonymous time source more than your
> own RTC.

What is problematic here? Your RTC may also be shifted, as I pointed out,
so it deserves no more trust than any other anonymous source.

Also, what is the difference between trusting those "anonymous time
sources" at chronyd startup time and at the time when your machine
resumes from suspend / hibernate?

To me it does not make sense to say that an "anonymous time source" is
fully trusted when chronyd starts at computer startup time, but that the
same "anonymous time source" is untrusted when the computer resumes from
hibernate.
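
To make the startup vs. resume difference concrete: the makestep
directive takes a threshold and a limit, and chronyd steps the clock only
if the offset is larger than the threshold and no more than "limit" clock
updates have happened since chronyd was started. The Python sketch below
only models that rule. The function name would_step and the values 1.0
and 3 are assumptions for illustration (they mirror a commonly shipped
"makestep 1.0 3" line, which may not be what your distribution uses).

```python
def would_step(offset_seconds, updates_since_start, threshold=1.0, limit=3):
    """Model of the makestep rule: step only for a large offset early on.

    Hypothetical helper, not chrony code. A negative limit means "no limit".
    """
    large_offset = abs(offset_seconds) > threshold
    within_limit = limit < 0 or updates_since_start <= limit
    return large_offset and within_limit


# Fresh start: RTC is off by one hour, first clock update -> step allowed.
print(would_step(3600.0, updates_since_start=1))     # True
# Resume from hibernate: same one-hour error, but chronyd has been running
# for a long time, so the limit is exhausted -> no step, only slewing.
print(would_step(3600.0, updates_since_start=5000))  # False
```

Under these assumed values the same one-hour RTC error is corrected at
once after a restart but not after a resume, which is the asymmetry in
question.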

> The current makestep value is a sane default for the majority of sane
> machines with a standard use case.

So a multi-OS scenario is not standard (anymore)?

> For a broken machine or a corner use case, I think that the right level
> in the stack for applying a workaround is the init level: restarting the
> time daemon on resume rather than messing with the makestep value.
> Because if you do the latter, you will trust any anonymous and
> potentially bad time source more than your own RTC not only at boot /
> resume time, but at all times.
> That's all I could say.
> 
> Emmanuel.
> 
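
For reference, the init-level workaround discussed above (acting on the
time daemon at resume) could look roughly like the following systemd
system-sleep hook. It is only a sketch under stated assumptions:
systemd-sleep calls such hooks with two arguments ("pre" or "post", then
the sleep action), and the choice to run "chronyc makestep" on resume,
rather than restarting chronyd, is just one of the two options mentioned
in this thread. The names command_for and RESUME_COMMAND are made up for
the example.

```python
#!/usr/bin/env python3
"""Sketch of a systemd system-sleep hook that steps the clock after resume."""
import subprocess
import sys

# Assumption: on resume we ask the running chronyd to step the clock once.
RESUME_COMMAND = ["chronyc", "makestep"]


def command_for(phase, action):
    """Return the command to run for this hook invocation, or None.

    phase is "pre" (going to sleep) or "post" (resuming). The action
    (suspend, hibernate, ...) is ignored: all sleep types are treated alike.
    """
    if phase == "post":
        return RESUME_COMMAND
    return None


def main(argv):
    if len(argv) != 3:
        return 0  # not called the way systemd-sleep calls hooks; do nothing
    command = command_for(argv[1], argv[2])
    if command is None:
        return 0
    try:
        return subprocess.run(command, check=False).returncode
    except OSError:
        return 1  # e.g. chronyc is not installed


if __name__ == "__main__":
    status = main(sys.argv)
    if status:
        sys.exit(status)
```

Such a file would have to be executable and placed in the system-sleep
directory named above (see the systemd-sleep man page for the exact path
on a given distribution). Note that it steps the clock on every resume,
so it trusts the configured time sources over the RTC each time.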

-- 
Pali Rohár
pali.rohar@xxxxxxxxx
