|
Bugzilla – Full Text Bug Listing |
| Summary: | infinite systemd wait for ntp-wait.service prevents boot except to single/rescue | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Felix Miata <mrmazda> |
| Component: | Basesystem | Assignee: | E-mail List <bnc-team-screening> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | astieger, max |
| Version: | Leap 42.3 | Flags: | astieger:
needinfo?
(mrmazda) |
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | redacted tails of 'journalctl | grep ntp' and of /var/log/ntp | ||
|
Description
Felix Miata
2017-10-12 22:17:08 UTC
Since you claim a regression, please roll back maintenance updates of your one month set to identify the single update that causes this. Method: https://lizards.opensuse.org/2016/07/11/zypper-patch-uninstall-rollback/ List: https://lists.opensuse.org/opensuse-updates/ zypper takes the option -R / --root dir to operate from a networked rescue system on the installed root. 1-I don't understand how as instructed by "Replace openSUSE-2016-XXX with the update in question." on the lizards URL I'm supposed to make a date selection for command input. The only ntp strings I see among the archive subjects are for yast2-ntp-client. 2-The significance of "zypper takes the option -R / --root dir to operate from a networked rescue system on the installed root." totally escapes me. The system is running, being used to type this, just without ntp-wait.service enabled. 3-/ filesystem is EXT4 on RAID1 and is the primary PC on my LAN. 4-Multiple 'zypper -v in' operations have been performed for selected updating since last running 'zypper -v up' on 7 Sept. The ntp package itself has not been changed since April, 25th. (In reply to Felix Miata from comment #0) > Oct 12 17:12:04 ntpd[1674]: restrict: ignoring line 57, mask '::' unusable. AFAICS line 57 in an unaltered ntp.conf is empty, but line 60 reads restrict ::1 which might be triggering this warning. Given that ntpd normally doesn't complain about this line, I wonder if either that line was either altered or you disabled IPv6 on that host. BTW, the YaST2 ntp module used to mangle /etc/ntp.conf, but I think this has recently been fixed. (In reply to Felix Miata from comment #2) > 1-I don't understand how as instructed by "Replace openSUSE-2016-XXX with > the update in question." on the lizards URL I was asking you to identify the single maintenance update that reproduces this, out of the month of updates. This was based on how you described the occurrence of the error. > 2-The significance of "zypper takes the option -R / --root dir to operate > from a networked rescue system on the installed root." totally escapes me. > The system is running, being used to type this, just without > ntp-wait.service enabled. It was obviously a note how to roll back updates in case the system no longer boots, since you explicitly mentioned a rescue system. Again, this is to identify the update responsible for this. (In reply to Reinhard Max from comment #3) > The ntp package itself has not been changed since April, 25th. The /etc/ntp.conf file in place during current zypper up had its timestamp altered 7 Sept local time 09:47 (after the 'zypper up' initiated at 2017-09-06 04:00:55), which change did not alter its byte count: -restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap +restrict 192.168.0.0 nomodify notrap mask 255.255.255.0 Previously, its content had last been changed 27 Dec. 2015. I have no recollection of having personally touched that file either at that time or in any recent weeks or months. Neither contains any line that includes the string "::". I restored the Sept 7 version, re-enabled ntp-wait.service, and rebooted. This journal line Oct 13 05:24:16 ntpd[1703]: line 97 column 30 syntax error, unexpected T_Mask, expecting T_EOC points to that ntp.conf line. > (In reply to Felix Miata from comment #0) > > > Oct 12 17:12:04 ntpd[1674]: restrict: ignoring line 57, mask '::' unusable. > > AFAICS line 57 in an unaltered ntp.conf is empty, but line 60 reads > > restrict ::1 I examined the content of http://download.opensuse.org/distribution/leap/42.3/repo/oss/suse/x86_64/ntp-4.2.8p10-30.1.x86_64.rpm and that line is in its etc/ntp.conf. Since restoring the Sept 7 ntp.conf file, the string "::" appears nowhere in the file. > which might be triggering this warning. Given that ntpd normally doesn't > complain about this line, I wonder if either that line was either altered or > you disabled IPv6 on that host. IPv6 is disabled on the whole LAN using multiple methods, including kernel cmdline option ipv6.disable=1 in every grub stanza. On the reboot done following my last ntp.conf restore, I removed that option from the kernel cmdline and was able to boot normally. I rebooted almost instantly to try again without removing it, and boot again proceeded normally (my current boot). This is 'grep ntp' from current journal: Oct 13 05:22:31 kernel: Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes) Oct 13 05:22:53 ntpd[1780]: ntpd 4.2.8p10@1.3728-o Fri Jul 7 17:41:17 UTC 2017 (1): Starting Oct 13 05:22:53 ntpd[1780]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf Oct 13 05:22:53 ntpd[1781]: proto: precision = 0.046 usec (-24) Oct 13 05:22:53 ntpd[1781]: line 97 column 30 syntax error, unexpected T_Mask, expecting T_EOC Oct 13 05:22:53 ntpd[1781]: syntax error in /etc/ntp.conf line 97, column 30 Oct 13 05:22:53 ntpd[1781]: restrict 0.0.0.0: KOD does nothing without LIMITED. Oct 13 05:22:53 ntpd[1781]: restrict ::: KOD does nothing without LIMITED. Oct 13 05:22:53 ntpd[1781]: switching logging to file /var/log/ntp Oct 13 05:22:53 start-ntpd[1753]: Starting network time protocol daemon (NTPD) Oct 13 05:22:53 systemd[1]: Starting Wait for ntpd to synchronize system clock... Oct 13 05:23:08 systemd[1]: Started Wait for ntpd to synchronize system clock. Oct 13 05:23:30 systemd[1]: Stopped Wait for ntpd to synchronize system clock. Oct 13 05:23:56 kernel: Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes) Oct 13 05:24:16 ntpd[1698]: ntpd 4.2.8p10@1.3728-o Fri Jul 7 17:41:17 UTC 2017 (1): Starting Oct 13 05:24:16 ntpd[1698]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf Oct 13 05:24:16 ntpd[1703]: proto: precision = 0.046 usec (-24) Oct 13 05:24:16 ntpd[1703]: line 97 column 30 syntax error, unexpected T_Mask, expecting T_EOC Oct 13 05:24:16 ntpd[1703]: syntax error in /etc/ntp.conf line 97, column 30 Oct 13 05:24:16 ntpd[1703]: restrict 0.0.0.0: KOD does nothing without LIMITED. Oct 13 05:24:16 ntpd[1703]: restrict ::: KOD does nothing without LIMITED. Oct 13 05:24:16 ntpd[1703]: restrict: ignoring line 90, mask '::' unusable. Oct 13 05:24:16 ntpd[1703]: switching logging to file /var/log/ntp Oct 13 05:24:16 start-ntpd[1676]: Starting network time protocol daemon (NTPD) Oct 13 05:24:16 systemd[1]: Starting Wait for ntpd to synchronize system clock... Oct 13 05:24:33 systemd[1]: Started Wait for ntpd to synchronize system clock. Given that I booted at least 9 times after 'zypper -v up' (based on /var/log/ntpd entries), not counting several boots to emergency mode, I can't imagine what was going on, or why mere reporting this bug instead of asking on a mailing list seems to be what made the problem disappear. (In reply to Andreas Stieger from comment #5) > (In reply to Felix Miata from comment #2) > > 1-I don't understand how as instructed by "Replace openSUSE-2016-XXX with > > the update in question." on the lizards URL > I was asking you to identify the single maintenance update that reproduces > this, out of the month of updates. This was based on how you described the > occurrence of the error. I'm pretty sure I understand what you were asking for, just not how to do it. > > 2-The significance of "zypper takes the option -R / --root dir to operate > > from a networked rescue system on the installed root." totally escapes me. > > The system is running, being used to type this, just without > > ntp-wait.service enabled. > It was obviously a note how to roll back updates in case the system no > longer boots, since you explicitly mentioned a rescue system. The word "rescue" in the bug summary is a reference to a Grub stanza that produces in systemd runlevel1.target/rescue.target that is equivalent to sysvinit runlevel 1 (single user mode, no networking) so as not to have ntpd try to start and then halt init via the infinite systemd wait. > Again, this is to identify the update responsible for this. I don't think we'll ever know, unless this reappears after a new round of updates. I do remain bothered by the syntax error and restrict journal messages, and particularly the inability of ntp-wait.service to time out, so for now I'm leaving this open instead of closing INVALID. There is a timeout in ntp-wait.service, but it is 30000 seconds (8 hours and 20 minutes). After all, the purpose of this service is to wait until ntpd has synchronized. ;) I am closing this as INVALID, because the warnings errors you are seeing come from lines in the config file that are either from an old default (KOD) or were added locally (line 97). Feel free to reopen if it happens again with a config file that is based on the latest default. If you reopen, please attach the whole ntp.conf that was active when the respective logs were produced and/or the extended hang was observed. My guess is that the hang you were seeing was unrelated to any package updates and just caused by upstream ntp servers being unreachable or unresponsive. FWIW, these are the server lines in ntp.conf file used for current boot, the same as when this trouble started: server 0.us.pool.ntp.org iburst server 1.us.pool.ntp.org iburst server 2.us.pool.ntp.org iburst server 3.us.pool.ntp.org iburst The ntp.conf file from the ntp 4.2.8p10-30.1 rpm contains no server lines. One of the ntp.conf files I tried before filing this bug contained these server lines: server 0.us.pool.ntp.org iburst server 1.us.pool.ntp.org iburst server 2.us.pool.ntp.org iburst server 3.us.pool.ntp.org iburst server 0.opensuse.pool.ntp.org iburst server 1.opensuse.pool.ntp.org iburst server 2.opensuse.pool.ntp.org iburst server 3.opensuse.pool.ntp.org iburst All other ntp.conf variants tried contain these: server 0.opensuse.pool.ntp.org iburst server 1.opensuse.pool.ntp.org iburst server 2.opensuse.pool.ntp.org iburst server 3.opensuse.pool.ntp.org iburst I've changed for next boot to the following: server -4 0.us.pool.ntp.org iburst server -4 1.us.pool.ntp.org iburst server -4 2.us.pool.ntp.org iburst server -4 0.opensuse.pool.ntp.org iburst server 3.us.pool.ntp.org iburst When I acquire a round tuit I'll test the last above on another or more 42.3 and/or TW installation(s), some of which are currently directed to this host rather than any of those public servers. |