Re: [chrony-users] Resume from suspend and default makestep configuration

On 19/05/2020 at 13:30, Pali Rohár wrote:
> On Tuesday 19 May 2020 11:10:01 FUSTE Emmanuel wrote:
>> On 19/05/2020 at 12:29, Pali Rohár wrote:
>>> On Monday 18 May 2020 13:45:04 FUSTE Emmanuel wrote:
>>>> On 18/05/2020 at 13:15, Pali Rohár wrote:
>>>>> On Monday 18 May 2020 10:45:02 FUSTE Emmanuel wrote:
>>>>>> Hello Pali,
>>>>>>
>>>>>> On 18/05/2020 at 12:37, Pali Rohár wrote:
>>>>>>> The main problem is when system is put into suspend or hibernate state.
>>>>>>>
>>>>>>> In my opinion resuming from suspend / hibernate state should be handled
>>>>>>> in the same way as (re)starting chronyd. You do not know what may
>>>>>>> have happened during sleep.
>>>>>> Yes and in case of needed workaround, it should be done at the system
>>>>>> level, not chrony.
>>>>>> A job for systemd.
>>>>> Hello! Sorry for a stupid question, but what has systemd in common with
>>>>> chronyd? Why should systemd care about chronyd time synchronization?
>>>> Nothing.
>>>> But it is up to your "process manager", be it systemd, a sysvinit pile of
>>>> scripts or whatever, to restart or notify chrony; it has to do
>>>> housekeeping anyway for other things when you suspend/resume.
>>> Hm... I remember that in past it was needed to blacklist broken daemons,
>>> software and kernel modules which did not work correctly during S3 or
>>> hibernate state. It was in some pm scripts utils...
>>>
>>> But I thought that these days are already passed and software can deal
>>> with fact that machine may be put into suspend or hibernate state.
>>>
>>> So what you are suggesting is to put chronyd daemon into list of broken
>>> software (which needs to be stopped prior suspend / resume)?
>>>
>>> It does not make sense for me as the immediate step after putting
>>> software or kernel module into such "blacklist" was to inform upstream
>>> authors of that daemon or kernel module that it is broken / incompatible
>>> with suspend state and it should be fixed.
>>>
>>> That "blacklist" was just workaround for buggy software and not
>>> permanent solution.
>> No, not chrony, but the machine which changes the RTC behind your back: buggy BIOS.
> Sorry, but I have not caught this line. Blacklist contained list of
> buggy software, daemons and kernel modules which had to be (in past)
> stopped / unloaded prior system went to S3 and started / (re)loaded
> after system resumed. So obviously putting "buggy Bios" into blacklist
> not only does not make sense, but also it did nothing. In that
> particular case chronyd had to be put into that blacklist of buggy
> software, since as you described it is chronyd which needs to be stopped /
> started... But as I said, this was used in the past, when buggy software and
> kernel modules were not able to correctly handle the S3
> state.
I said the machine, not chrony.
Please, I'm not a native English speaker, but this conversation is becoming
more and more like a trolling one.
A blacklist is a black list; it is a generic term, as you point out.

>
>>>> Exactly as networkmanager, ifupdown scripts, systemd-networkd
>>>> reload/restart some network services when interfaces/tunnels/vpn are
>>>> upped/downed.
>>> This is something totally different. all those mentioned "services" are
>>> just independent part of system which manages network connections.
>>>
>>> chronyd is there to manage time synchronization.
>> It was a figurative comparison for an event-driven config change.
>> In the suspend vs time case, the event is only known to, and
>> should be managed by, your init system, not by your time daemon.
>>
>>>>>>> And as I pointed there are existing problems that UEFI/BIOS firmware
>>>>>>> changes RTC clock without good reason which results in completely wrong
>>>>>>> system clock.
>>>>>>>
>>>>>> Could well be identified by blacklist at the udev/systemd level for
>>>>>> applying or not the workaround (restart chrony or launch a chronyc
>>>>>> command at resume)
>>>>> Could you describe in details what do you mean by blacklist? Which udev
>>>>> blacklist you mean and what should be put into that blacklist? I have
>>>>> not caught this part.
>>>> Faulty systems could be identified by DMI/ACPI strings and quirk applied.
>>> And what is the faulty system?
>> Citing yourself :
>>
>> "as I pointed there are existing problems that UEFI/BIOS firmware
>> changes RTC clock without good reason"
> Ok. Main problem is that there is no way how to identify such broken
> firmwares. So definition is now nice and clear but basically useless as
> it does not say anything how to find or identify such faulty systems.
Yes, that is the generic problem with faulty hw/devices/firmware: they are
faulty, but not on purpose.
The kernel is full of these lists. They are built by hand from
users/developers feedback, but you know that from the BT world too, don't you?
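
The quirk-list idea discussed in this thread (identify a faulty system by its
DMI strings, then run a chronyc command at resume) could look roughly like the
sketch below. It is only an illustration, written as a script for the
systemd system-sleep directory, which is called with "pre" or "post" as its
first argument. The quirk list, its entries and the function names are
hypothetical; the only external pieces assumed are the DMI files under
/sys/class/dmi/id and the "chronyc makestep" command.

```python
#!/usr/bin/env python3
"""Sketch of a system-sleep hook that steps the clock on resume for listed machines."""
import subprocess
import sys
from pathlib import Path

# Hypothetical quirk list of (bios_vendor, product_name) pairs.
# These entries are placeholders, not real reports of faulty firmware.
QUIRKY_SYSTEMS = {("Example BIOS Vendor", "Example Laptop 1234")}

DMI_DIR = Path("/sys/class/dmi/id")


def read_dmi(field, dmi_dir=DMI_DIR):
    """Return one DMI string, or "" if it cannot be read."""
    try:
        return (dmi_dir / field).read_text().strip()
    except OSError:
        return ""


def should_step(phase, vendor, product, quirks=QUIRKY_SYSTEMS):
    """Step only after resume ("post") and only on a listed system."""
    return phase == "post" and (vendor, product) in quirks


def main(argv):
    # systemd-sleep passes "pre" or "post" first, then the sleep type.
    phase = argv[1] if len(argv) > 1 else ""
    if should_step(phase, read_dmi("bios_vendor"), read_dmi("product_name")):
        # Ask the running chronyd to step the clock immediately.
        subprocess.run(["chronyc", "makestep"], check=False)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the decision in should_step() separate from the chronyc call makes
the list easy to test without suspending a machine.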

>
>>> I think this is something general and not related to particular machine.
>>> I guess under specific conditions it may happen on any system.
>>>
>>>> See for example /lib/udev/hwdb.d/60-sensor.hwdb  for some laptop sensors.
>>>> We could add an attribute to the RTC if it matches some vendor/bios
>>>> version/model etc... to put in the hwdb (the blacklist)
>>>> A udev rule will assign this attribute to the RTC if you are running on
>>>> a known buggy system.
>>>> A script could do anything you want at suspend/resume time in
>>>> /lib/systemd/system-sleep if your RTC has the offending attribute (see
>>>> systemd-sleep man page).
>>>> Or better, a unit run at resume time could do anything too.
>>>> The hwdb abstraction is not needed if it is a local hack; otherwise it should be
>>>> properly defined with the hwdb/udev/systemd developers.
>>> This database is for describing hardware differences or issues.
>>>
>>> But above problem with time synchronization is general and hardware
>>> independent. You can simulate same issue on your machine.
>>>
>>> Just put your computer into hibernation. Then boot some
>>> Linux distribution from a liveUSB and change the RTC time. Turn off the liveUSB and boot your
>>> hibernated system. And you should be in same situation as I described.
>> Yes, but this is like shooting yourself in the foot.
> This is just test case, so you can check, "simulate" and reproduce
> this issue even without "faulty machine".
OK
> Moreover, Windows systems used to store RTC in local time and Linux
> systems in UTC. I do not know if this still applies but basically multi
> OS machines are affected by the same issue.
>
>> If you want to be robust in this case and all others, then by default
>> you must restart ANY time sync daemon in the resume callback of your
>> init system, being ntpd or chrony, systemd or sysvinit or upstart or
>> anything else. But it is problematic, as Miroslav pointed out, as you
>> potentially start to trust any anonymous time source more than your own RTC.
> What is problematic here? Your RTC may be also shifted as I pointed, so
> it has same trust source as any other anonymous source.
>
> Also what is difference between trusting those "anonymous time source"
> at chronyd startup time and at time when resuming your machine from
> suspend / hibernate?
>
> For me it does not make sense to say that "anonymous time source" is
> fully trusted when starting chronyd at computer startup time. But same
> "anonymous time source" is untrusted at computer resume from hibernate
> time.
It is a tradeoff, and it is already bad. Ntpd startup scripts used to not call
ntpdate and to bail out in case of too big a discrepancy (2s or 3s, from
memory). That is considered too user-unfriendly without proper HMI
interaction.
But doing that at boot time only is better than at boot time AND at
resume time.
And better than trusting any source at any time.
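
To make the "boot time only" tradeoff concrete, here is a simplified model of
how I understand the makestep directive (a threshold in seconds and a limit on
the number of clock updates since chronyd started). It is a sketch, not
chronyd's actual code; the class name and the values 1.0 and 3 are only
illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MakestepPolicy:
    """Simplified model of 'makestep <threshold> <limit>' (not chronyd's code)."""
    threshold: float  # offset in seconds above which a step is allowed
    limit: int        # number of updates after start that may step; negative = no limit
    updates: int = 0  # clock updates seen since the daemon started

    def decide(self, offset):
        """Return "step" or "slew" for a measured offset in seconds."""
        self.updates += 1
        within_limit = self.limit < 0 or self.updates <= self.limit
        if within_limit and abs(offset) > self.threshold:
            return "step"
        return "slew"


policy = MakestepPolicy(threshold=1.0, limit=3)
at_start = policy.decide(3600.0)      # large offset right after start
policy.decide(0.2)
policy.decide(0.1)
after_resume = policy.decide(3600.0)  # same offset once the limit is used up
print(at_start, after_resume)
```

In this model the large offset is stepped at start but only slewed after
resume, which is the behaviour this thread is arguing about; restarting the
daemon at resume resets the update counter.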
>> The actual makestep value is a sane default for all the majority of sane
>> machine with standard usecase.
> So multi-OS scenario is not standard (anymore)?
It never has been. It works, with some limitations and tradeoffs.
By its very nature, the RTC cannot be shared read/write. It is real time.
You cannot save its state and restore it later, unless you have
special HW/firmware that "virtualizes" it and is able to maintain "per
OS" state.
And the different historic direct/indirect usages of the RTC on the PC
complicate things to a dead end.
I would never entrust this task to a PC BIOS ...
As far as time is concerned, the only working multi-OS scenario is under a
hypervisor, because it can arbitrate access to the hardware and is the only
one messing with the real(time) hardware.
In this case, you synchronise your hypervisor with external sources and
provide a para-virtualised system clock or a virtual PTP clock to your
guests for system clock sync. If you have to run NTP, your hypervisor
must be your source, with any conf you want: it is a trustable source.
A virtual RTC is provided to your guest if it needs one.
That is the only standard/working multi-OS scenario from a timekeeping
point of view in the PC world.

Emmanuel.