Bug 1063089

Summary: infinite systemd wait for ntp-wait.service prevents boot except to single/rescue
Product: [openSUSE] openSUSE Distribution Reporter: Felix Miata <mrmazda>
Component: BasesystemAssignee: E-mail List <bnc-team-screening>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: astieger, max
Version: Leap 42.3Flags: astieger: needinfo? (mrmazda)
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: redacted tails of 'journalctl | grep ntp' and of /var/log/ntp

Description Felix Miata 2017-10-12 22:17:08 UTC
Created attachment 744167 [details]
redacted tails of 'journalctl | grep ntp' and of /var/log/ntp

I did 'zypper -v up' on 42.3 an hour ago after last having done so about a month ago. Subsequently, in order to boot, I had to put a '1' on kernel cmdline, login root, then 'systemctl disable ntp-wait.service before being able to start normally. I tried booting using various backups of ntp.conf without effect, then deleted /etc/ntp.conf and did 'zypper -v in -f ntp' to install a virgin (89 line) /etc/ntp.conf, which also did not help. Booting prior kernel didn't help either. Same host still boots 42.1 and TW normally.

Current status:
# systemctl list-unit-files | grep ntp
ntp-wait.service                        disabled
ntpd.service                            enabled
# systemctl status ntpd
â ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/ntpd.service.d
           ââ50-insserv.conf-$time.conf
   Active: active (running) since Thu 2017-10-12 17:12:04 EDT; 10min ago
     Docs: man:ntpd(1)
  Process: 1650 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS)
 Main PID: 1674 (ntpd)
    Tasks: 2 (limit: 512)
   CGroup: /system.slice/ntpd.service
           ââ1674 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
           ââ1679 ntpd: asynchronous dns resolver

Oct 12 17:12:03 systemd[1]: Starting NTP Server Daemon...
Oct 12 17:12:04 ntpd[1672]: ntpd 4.2.8p10@1.3728-o Fri Jul  7 17:41:17 UTC 2017 (1): Starting
Oct 12 17:12:04 ntpd[1672]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
Oct 12 17:12:04 ntpd[1674]: proto: precision = 0.047 usec (-24)
Oct 12 17:12:04 ntpd[1674]: restrict: ignoring line 57, mask '::' unusable.
Oct 12 17:12:04 ntpd[1674]: switching logging to file /var/log/ntp
Oct 12 17:12:04 systemd[1]: Started NTP Server Daemon.
Oct 12 17:12:04 start-ntpd[1650]: Starting network time protocol daemon (NTPD)
Hint: Some lines were ellipsized, use -l to show in full.
Comment 1 Andreas Stieger 2017-10-13 06:56:46 UTC
Since you claim a regression, please roll back maintenance updates of your one month set to identify the single update that causes this.

Method:
https://lizards.opensuse.org/2016/07/11/zypper-patch-uninstall-rollback/

List:
https://lists.opensuse.org/opensuse-updates/

zypper takes the option -R / --root dir to operate from a networked rescue system on the installed root.
Comment 2 Felix Miata 2017-10-13 07:35:10 UTC
1-I don't understand how as instructed by "Replace openSUSE-2016-XXX with the update in question." on the lizards URL I'm supposed to make a date selection for command input. The only ntp strings I see among the archive subjects are for yast2-ntp-client.

2-The significance of "zypper takes the option -R / --root dir to operate from a networked rescue system on the installed root." totally escapes me. The system is running, being used to type this, just without ntp-wait.service enabled.

3-/ filesystem is EXT4 on RAID1 and is the primary PC on my LAN.

4-Multiple 'zypper -v in' operations have been performed for selected updating since last running 'zypper -v up' on 7 Sept.
Comment 3 Reinhard Max 2017-10-13 08:42:35 UTC
The ntp package itself has not been changed since April, 25th.

(In reply to Felix Miata from comment #0)

> Oct 12 17:12:04 ntpd[1674]: restrict: ignoring line 57, mask '::' unusable.

AFAICS line 57 in an unaltered ntp.conf is empty, but line 60 reads

restrict ::1

which might be triggering this warning. Given that ntpd normally doesn't complain about this line, I wonder if either that line was either altered or you disabled IPv6 on that host.
Comment 4 Reinhard Max 2017-10-13 08:44:29 UTC
BTW, the YaST2 ntp module used to mangle /etc/ntp.conf, but I think this has recently been fixed.
Comment 5 Andreas Stieger 2017-10-13 08:57:48 UTC
(In reply to Felix Miata from comment #2)
> 1-I don't understand how as instructed by "Replace openSUSE-2016-XXX with
> the update in question." on the lizards URL

I was asking you to identify the single maintenance update that reproduces this, out of the month of updates. This was based on how you described the occurrence of the error.

> 2-The significance of "zypper takes the option -R / --root dir to operate
> from a networked rescue system on the installed root." totally escapes me.
> The system is running, being used to type this, just without
> ntp-wait.service enabled.

It was obviously a note how to roll back updates in case the system no longer boots, since you explicitly mentioned a rescue system.

Again, this is to identify the update responsible for this.
Comment 6 Felix Miata 2017-10-13 10:24:05 UTC
(In reply to Reinhard Max from comment #3)
> The ntp package itself has not been changed since April, 25th.

The /etc/ntp.conf file in place during current zypper up had its timestamp altered 7 Sept local time 09:47 (after the 'zypper up' initiated at 2017-09-06 04:00:55), which change did not alter its byte count:

-restrict 192.168.0.0  mask 255.255.255.0 nomodify notrap
+restrict 192.168.0.0  nomodify notrap mask 255.255.255.0

Previously, its content had last been changed 27 Dec. 2015. I have no recollection of having personally touched that file either at that time or in any recent weeks or months. Neither contains any line that includes the string "::". I restored the Sept 7 version, re-enabled ntp-wait.service, and rebooted. This journal line
Oct 13 05:24:16 ntpd[1703]: line 97 column 30 syntax error, unexpected T_Mask, expecting T_EOC
points to that ntp.conf line.
 
> (In reply to Felix Miata from comment #0)
> 
> > Oct 12 17:12:04 ntpd[1674]: restrict: ignoring line 57, mask '::' unusable.
> 
> AFAICS line 57 in an unaltered ntp.conf is empty, but line 60 reads
> 
> restrict ::1

I examined the content of
http://download.opensuse.org/distribution/leap/42.3/repo/oss/suse/x86_64/ntp-4.2.8p10-30.1.x86_64.rpm
and that line is in its etc/ntp.conf.

Since restoring the Sept 7 ntp.conf file, the string "::" appears nowhere in the file.
 
> which might be triggering this warning. Given that ntpd normally doesn't
> complain about this line, I wonder if either that line was either altered or
> you disabled IPv6 on that host.

IPv6 is disabled on the whole LAN using multiple methods, including kernel cmdline option ipv6.disable=1 in every grub stanza. On the reboot done following my last ntp.conf restore, I removed that option from the kernel cmdline and was able to boot normally. I rebooted almost instantly to try again without removing it, and boot again proceeded normally (my current boot). This is 'grep ntp' from current journal:
Oct 13 05:22:31 kernel: Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes)
Oct 13 05:22:53 ntpd[1780]: ntpd 4.2.8p10@1.3728-o Fri Jul  7 17:41:17 UTC 2017 (1): Starting
Oct 13 05:22:53 ntpd[1780]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
Oct 13 05:22:53 ntpd[1781]: proto: precision = 0.046 usec (-24)
Oct 13 05:22:53 ntpd[1781]: line 97 column 30 syntax error, unexpected T_Mask, expecting T_EOC
Oct 13 05:22:53 ntpd[1781]: syntax error in /etc/ntp.conf line 97, column 30
Oct 13 05:22:53 ntpd[1781]: restrict 0.0.0.0: KOD does nothing without LIMITED.
Oct 13 05:22:53 ntpd[1781]: restrict ::: KOD does nothing without LIMITED.
Oct 13 05:22:53 ntpd[1781]: switching logging to file /var/log/ntp
Oct 13 05:22:53 start-ntpd[1753]: Starting network time protocol daemon (NTPD)
Oct 13 05:22:53 systemd[1]: Starting Wait for ntpd to synchronize system clock...
Oct 13 05:23:08 systemd[1]: Started Wait for ntpd to synchronize system clock.
Oct 13 05:23:30 systemd[1]: Stopped Wait for ntpd to synchronize system clock.
Oct 13 05:23:56 kernel: Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes)
Oct 13 05:24:16 ntpd[1698]: ntpd 4.2.8p10@1.3728-o Fri Jul  7 17:41:17 UTC 2017 (1): Starting
Oct 13 05:24:16 ntpd[1698]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
Oct 13 05:24:16 ntpd[1703]: proto: precision = 0.046 usec (-24)
Oct 13 05:24:16 ntpd[1703]: line 97 column 30 syntax error, unexpected T_Mask, expecting T_EOC
Oct 13 05:24:16 ntpd[1703]: syntax error in /etc/ntp.conf line 97, column 30
Oct 13 05:24:16 ntpd[1703]: restrict 0.0.0.0: KOD does nothing without LIMITED.
Oct 13 05:24:16 ntpd[1703]: restrict ::: KOD does nothing without LIMITED.
Oct 13 05:24:16 ntpd[1703]: restrict: ignoring line 90, mask '::' unusable.
Oct 13 05:24:16 ntpd[1703]: switching logging to file /var/log/ntp
Oct 13 05:24:16 start-ntpd[1676]: Starting network time protocol daemon (NTPD)
Oct 13 05:24:16 systemd[1]: Starting Wait for ntpd to synchronize system clock...
Oct 13 05:24:33 systemd[1]: Started Wait for ntpd to synchronize system clock.

Given that I booted at least 9 times after 'zypper -v up' (based on /var/log/ntpd entries), not counting several boots to emergency mode, I can't imagine what was going on, or why mere reporting this bug instead of asking on a mailing list seems to be what made the problem disappear.

(In reply to Andreas Stieger from comment #5)
> (In reply to Felix Miata from comment #2)
> > 1-I don't understand how as instructed by "Replace openSUSE-2016-XXX with
> > the update in question." on the lizards URL
 
> I was asking you to identify the single maintenance update that reproduces
> this, out of the month of updates. This was based on how you described the
> occurrence of the error.

I'm pretty sure I understand what you were asking for, just not how to do it.
 
> > 2-The significance of "zypper takes the option -R / --root dir to operate
> > from a networked rescue system on the installed root." totally escapes me.
> > The system is running, being used to type this, just without
> > ntp-wait.service enabled.
 
> It was obviously a note how to roll back updates in case the system no
> longer boots, since you explicitly mentioned a rescue system.

The word "rescue" in the bug summary is a reference to a Grub stanza that produces in systemd runlevel1.target/rescue.target that is equivalent to sysvinit runlevel 1 (single user mode, no networking) so as not to have ntpd try to start and then halt init via the infinite systemd wait.
 
> Again, this is to identify the update responsible for this.

I don't think we'll ever know, unless this reappears after a new round of updates. I do remain bothered by the syntax error and restrict journal messages, and particularly the inability of ntp-wait.service to time out, so for now I'm leaving this open instead of closing INVALID.
Comment 7 Reinhard Max 2017-10-13 11:02:20 UTC
There is a timeout in ntp-wait.service, but it is 30000 seconds (8 hours and 20 minutes). After all, the purpose of this service is to wait until ntpd has synchronized. ;)

I am closing this as INVALID, because the warnings errors you are seeing come from lines in the config file that are either from an old default (KOD) or were added locally (line 97).

Feel free to reopen if it happens again with a config file that is based on the latest default. If you reopen, please attach the whole ntp.conf that was active when the respective logs were produced and/or the extended hang was observed.

My guess is that the hang you were seeing was unrelated to any package updates and just caused by upstream ntp servers being unreachable or unresponsive.
Comment 8 Felix Miata 2017-10-13 19:42:55 UTC
FWIW, these are the server lines in ntp.conf file used for current boot, the same as when this trouble started:
server 0.us.pool.ntp.org  iburst
server 1.us.pool.ntp.org  iburst
server 2.us.pool.ntp.org  iburst
server 3.us.pool.ntp.org  iburst

The ntp.conf file from the ntp 4.2.8p10-30.1 rpm contains no server lines.

One of the ntp.conf files I tried before filing this bug contained these server lines:
server 0.us.pool.ntp.org  iburst
server 1.us.pool.ntp.org  iburst
server 2.us.pool.ntp.org  iburst
server 3.us.pool.ntp.org  iburst
server 0.opensuse.pool.ntp.org iburst
server 1.opensuse.pool.ntp.org iburst
server 2.opensuse.pool.ntp.org iburst
server 3.opensuse.pool.ntp.org iburst

All other ntp.conf variants tried contain these:
server 0.opensuse.pool.ntp.org iburst
server 1.opensuse.pool.ntp.org iburst
server 2.opensuse.pool.ntp.org iburst
server 3.opensuse.pool.ntp.org iburst

I've changed for next boot to the following:
server -4 0.us.pool.ntp.org  iburst
server -4 1.us.pool.ntp.org  iburst
server -4 2.us.pool.ntp.org  iburst
server -4 0.opensuse.pool.ntp.org  iburst
server 3.us.pool.ntp.org  iburst

When I acquire a round tuit I'll test the last above on another or more 42.3 and/or TW installation(s), some of which are currently directed to this host rather than any of those public servers.