Bug 732930

Summary: Systemd says network failed to start - which is false.
Product: [openSUSE] openSUSE 12.1 Reporter: Carlos Robinson <carlos.e.r>
Component: NetworkAssignee: Marius Tomaschewski <mt>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: bwiedemann, fcrozat, tschmidt
Version: Final   
Target Milestone: ---   
Hardware: All   
OS: SUSE Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 696902    
Attachments: logs requested
logs requested
tar requested

Description Carlos Robinson 2011-11-27 23:47:23 UTC
Note: Component of Bug is "Systemd", but there is no entry for it.

Systemd says network failed to start - which is false, network is working fine - unless I'm reading it wrong. 


Elanor:~ # systemctl

UNIT                        LOAD   ACTIVE SUB       JOB DESCRIPTION

network.service             loaded failed failed    LSB: Configure the localfs depending network interfaces
NetworkManager.service      loaded failed failed    Network Manager



Host:  11.4
Guest: 12.1 RC2
Environ:  VMWare player.
Comment 1 Andreas Jaeger 2011-12-06 14:29:02 UTC
Carlos, what kind of networking setup do you use? Dhcp or static address?
Comment 2 Carlos Robinson 2011-12-12 01:23:36 UTC
Sorry, did not press "commit" on this answer.

I see in dmesg:

[    7.607237] ifup[586]: Service network not started and mode 'auto' -> skipping


vmware is using dhcpd, nat. Host is 192.168.1.14 and guest 192.168.74.131
Comment 3 Carlos Robinson 2011-12-13 01:36:00 UTC
This is the log extract of the boot while network is started - it takes half a minute. I have added and nfs mount, but that one is fast.
If you want the full log, please ask.


Dec 13 02:21:13 Elanor haveged: arch:        x86#012vendor:      intel#012generic:     0#012i_cache:     32#012d_cache:     32#012loop_idx:    28#012loop_idxmax: 40#012loop_sz:     31094#012loop_szmax:  100861#012etime:       7111#012havege_ndpt  0
Dec 13 02:21:13 Elanor cpufreq[734]: Loading CPUFreq modules - hardware support not available..skipped
Dec 13 02:21:14 Elanor vmtoolsd[787]: Starting vmtoolsd ..done
Dec 13 02:21:14 Elanor kernel: [    7.607267] ip6_tables: (C) 2000-2006 Netfilter Core Team
Dec 13 02:21:14 Elanor kernel: [    7.677383] nf_conntrack version 0.5.0 (5913 buckets, 23652 max)
Dec 13 02:21:14 Elanor kernel: [    7.734167] ip_tables: (C) 2000-2006 Netfilter Core Team
Dec 13 02:21:14 Elanor SuSEfirewall2: Firewall rules set to CLOSE.
Dec 13 02:21:14 Elanor SuSEfirewall2_init[737]: Loading basic firewall rules ..done
Dec 13 02:21:14 Elanor network[881]: Setting up (localfs) network interfaces:
Dec 13 02:21:15 Elanor network[881]: lo
Dec 13 02:21:15 Elanor ifup:     lo        
Dec 13 02:21:15 Elanor ifup:     lo        
Dec 13 02:21:15 Elanor ifup: IP address: 127.0.0.1/8  
Dec 13 02:21:15 Elanor network[881]: lo        IP address: 127.0.0.1/8
Dec 13 02:21:15 Elanor ifup:  
Dec 13 02:21:15 Elanor network[881]: ..done    eth0      device: Intel Corporation 82545EM Gigabit Ethernet Co
Dec 13 02:21:15 Elanor ifup:     eth0      device: Intel Corporation 82545EM Gigabit Ethernet Co
Dec 13 02:21:16 Elanor kernel: [    9.728050] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 13 02:21:16 Elanor ifup-dhcp:     eth0      Starting DHCP4+DHCP6 client
Dec 13 02:21:16 Elanor dhcpcd[1264]: eth0: dhcpcd 3.2.3 starting
Dec 13 02:21:16 Elanor dhcpcd[1264]: eth0: hardware address = 00:0c:29:6b:70:77
Dec 13 02:21:16 Elanor kernel: [    9.809631] NET: Registered protocol family 17
Dec 13 02:21:16 Elanor dhcpcd[1264]: eth0: broadcasting for a lease
Dec 13 02:21:16 Elanor ifup-dhcp: . 
Dec 13 02:21:17 Elanor dhcpcd[1264]: eth0: offered 192.168.74.131 from 192.168.74.254
Dec 13 02:21:17 Elanor dhcpcd[1264]: eth0: checking 192.168.74.131 is available on attached networks
Dec 13 02:21:17 Elanor avahi-daemon[786]: Registering new address record for fe80::20c:29ff:fe6b:7077 on eth0.*.
Dec 13 02:21:18 Elanor dhcpcd[1264]: eth0: leased 192.168.74.131 for 1800 seconds
Dec 13 02:21:18 Elanor dhcpcd[1264]: eth0: adding IP address 192.168.74.131/24
Dec 13 02:21:18 Elanor avahi-daemon[786]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.74.131.
Dec 13 02:21:18 Elanor avahi-daemon[786]: New relevant interface eth0.IPv4 for mDNS.
Dec 13 02:21:18 Elanor avahi-daemon[786]: Registering new address record for 192.168.74.131 on eth0.IPv4.
Dec 13 02:21:18 Elanor dhcpcd[1264]: eth0: adding default route via 192.168.74.2 metric 0
Dec 13 02:21:18 Elanor ifdown:     eth0      device: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
Dec 13 02:21:19 Elanor ifup:     eth0      device: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
Dec 13 02:21:19 Elanor ifup-dhcp: . 
Dec 13 02:21:22 Elanor ifup-dhcp: . 
Dec 13 02:21:24 Elanor ifup-dhcp: . 
Dec 13 02:21:26 Elanor kernel: [   20.218706] eth0: no IPv6 routers present
Dec 13 02:21:27 Elanor ifup-dhcp: . 
Dec 13 02:21:29 Elanor ifup-dhcp: . 
Dec 13 02:21:30 Elanor kernel: [   24.279306] hrtimer: interrupt took 4385651 ns
Dec 13 02:21:32 Elanor ifup-dhcp: . 
Dec 13 02:21:35 Elanor ifup-dhcp: . 
Dec 13 02:21:36 Elanor network[881]: eth0      Starting DHCP4+DHCP6 client. . . . . . . .
Dec 13 02:21:36 Elanor ifup-dhcp:  
Dec 13 02:21:36 Elanor network[881]: eth0      DHCP4 continues in background
Dec 13 02:21:36 Elanor ifup-dhcp:     eth0      DHCP4 continues in background
Dec 13 02:21:36 Elanor network[881]: eth0      DHCP6 continues in background
Dec 13 02:21:36 Elanor ifup-dhcp:     eth0      DHCP6 continues in background
Dec 13 02:21:36 Elanor network[881]: [1Awaiting
Dec 13 02:21:36 Elanor network[881]: Waiting for mandatory devices:  eth0
Dec 13 02:21:44 Elanor network[881]: 7 6 5 3 2 1
Dec 13 02:21:44 Elanor network[881]: eth0      device: Intel Corporation 82545EM Gigabit Ethernet Co
Dec 13 02:21:44 Elanor network[881]: eth0      . . . is just beeing set up
Dec 13 02:21:44 Elanor network[881]: eth0      IP address: 192.168.74.131/24
Dec 13 02:21:44 Elanor network[881]: eth0      is up
Dec 13 02:21:44 Elanor network[881]: [1Awaiting
Dec 13 02:21:44 Elanor network[881]: eth0      interface could not be set up until now
Dec 13 02:21:44 Elanor network[881]: ..failedSetting up service (localfs) network  .  .  .  .  .  .  .  .  .  ...failed
Dec 13 02:21:44 Elanor systemd[1]: network.service: control process exited, code=exited status=7
Dec 13 02:21:44 Elanor systemd[1]: Unit network.service entered failed state.
Dec 13 02:21:45 Elanor bluez-coldplug[3678]: ..done
Comment 4 Frederic Crozat 2011-12-14 09:43:57 UTC
please boot with systemd.log_level=debug systemd.log_target=kmsg and attach dmesg output.
Comment 5 Carlos Robinson 2011-12-14 13:03:25 UTC
Created attachment 467433 [details]
logs requested
Comment 6 Frederic Crozat 2011-12-14 17:18:00 UTC
could you try to edit /etc/sysconfig/network/if-up.d/21-cifs
and comment line :
 log_msg=$( rcnmb start 2>&1)

and instead, put 
systemctl restart --ignore-dependencies nmd.service
Comment 7 Carlos Robinson 2011-12-14 21:11:17 UTC
Created attachment 467533 [details]
logs requested

Yep, here you have. Rebooted in verbose mode and attached demsg output.

Elanor:~ # rcnetwork status
redirecting to systemctl
network.service - LSB: Configure the localfs depending network interfaces
          Loaded: loaded (/etc/init.d/network)
          Active: failed since Wed, 14 Dec 2011 22:04:35 +0100; 5min ago
         Process: 855 ExecStart=/etc/init.d/network start (code=exited, status=7/NOTRUNNING)
          CGroup: name=systemd:/system/network.service
                  ? 1258 /sbin/dhclient6 -6 -cf /var/lib/dhcp6/dhclient6.eth0.conf -lf /var/lib/dhcp6/dhclient6.eth0.lease -pf /var/run/dhclient6.eth0.pid -q eth0
                  ? 4061 /sbin/dhcpcd --netconfig -L -E -c /etc/sysconfig/network/scripts/dhcpcd-hook -t 0 -h Elanor eth0
Elanor:~ #
Comment 8 Frederic Crozat 2011-12-20 09:36:50 UTC
so, it worked ? (it is not clear for your last message).
Comment 9 Carlos Robinson 2011-12-20 13:39:57 UTC
No, you can see in the output of rcnetwork status that it failed.

The problem is that network is working fine but the status says failed, in red color.
Comment 10 Frederic Crozat 2011-12-20 13:44:09 UTC
looks like /etc/init.d/network is returning non zero status.
Comment 11 Marius Tomaschewski 2011-12-21 10:26:58 UTC
Yes, it returns non-zero status when it is unable to configure in the time
specified in

/etc/sysconfig/network/config: WAIT_FOR_INTERFACES (default: 30 sec)

This message is not a problem, just an info:

ifup[571]: Service network not started and mode 'auto' -> skipping

and means, that the eth0 became visible but could not be started via udev
hotplug, because the network service (rcnetwork) were not yet started.

The question is what is the reason for. A call of "ps axf" as soon as
possible (while the network is still waiting/running) should show it.
It is very likely the if-up.d/21-dhcpcd-hook-samba script blocks here
(not only the 21-cifs) trying "systemdctl force-reload nmb.service"
that blocks, even nmb service is completely unused.

OK, rcnetwork has been started in the second 7:
[    7.598059] systemd[1]: About to execute: /etc/init.d/network start
[    7.602976] systemd[1]: Forked /etc/init.d/network as 855
[    7.603095] systemd[1]: network.service changed dead -> start
...
[   12.104185] systemd[1]: Trying to enqueue job nmb.service/reload/replace
[   12.104207] systemd[1]: Installed new job nmb.service/reload as 192
[   12.104212] systemd[1]: Enqueued job nmb.service/reload as 192
[   12.104333] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
[   12.104409] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/nmb_2eservice

This is (one of) the nmb reload mentioned above.

...
[   19.763112] eth0: no IPv6 routers present

OK, here the interface setup basically finished (IPv6 autoconf)...

[   39.490158] systemd[1]: Received SIGCHLD from PID 855 (network).
[   39.490192] systemd[1]: Got SIGCHLD for process 855 (network)
[   39.490234] systemd[1]: Child 855 died (code=exited, status=7/NOTRUNNING)
[   39.490237] systemd[1]: Child 855 belongs to network.service
[   39.490242] systemd[1]: network.service: control process exited, code=exited status=7
[   39.490245] systemd[1]: network.service got final SIGCHLD for state start
[   39.490272] systemd[1]: network.service changed start -> failed
[   39.490341] systemd[1]: Job network.service/start finished, result=failed
[   39.490356] systemd[1]: Unit network.service entered failed state.

rcnetwork failed due to timeout (39 sec - 7 sec > 30 sec).

[   39.490441] systemd[1]: network.target changed dead -> active
[   39.490446] systemd[1]: Job network.target/start finished, result=done
[   39.490454] systemd[1]: Job nmb.service/reload finished, result=skipped

The nmb.service reload returned finally...

So this seems to be duplicate of bug 735943 / bug 725503 / bug 724777.

A "rm -f /etc/sysconfig/network/if-up.d/21*" should solve the problem as
workaround.

Does it?

Please attach also the output of "ps axwwwf" when possible to catch
the moment where rcnetwork is still waiting (executes ifstatus again
and again).
Comment 12 Frederic Crozat 2011-12-21 11:45:44 UTC
please don't remove any file, just move them in a different place ;)

For if-up.d/21-dhcpcd-hook-samba, we could also try to use a --ignore-dependencies for the systemctl call.
Comment 13 Carlos Robinson 2011-12-21 20:19:19 UTC
(In reply to comment #11)

> A "rm -f /etc/sysconfig/network/if-up.d/21*" should solve the problem as
> workaround.
> 
> Does it?

Yep. Well, I moved them :-)


Elanor:~ # rcnetwork status
redirecting to systemctl
network.service - LSB: Configure the localfs depending network interfaces
          Loaded: loaded (/etc/init.d/network)
          Active: active (running) since Wed, 21 Dec 2011 21:11:51 +0100; 2min 57s ago
         Process: 853 ExecStart=/etc/init.d/network start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/network.service
                  ├ 1248 /sbin/dhclient6 -6 -cf /var/lib/dhcp6/dhclient6.eth0.conf -lf /var/lib/dhcp6/dhclient6.eth0.lease -pf /var/run/dhclient6.eth0.pid -q eth0
                  └ 1790 /sbin/dhcpcd --netconfig -L -E -c /etc/sysconfig/network/scripts/dhcpcd-hook -t 0 -h Elanor eth0
Elanor:~ # 


> Please attach also the output of "ps axwwwf" when possible to catch
> the moment where rcnetwork is still waiting (executes ifstatus again
> and again).

This I don't know how to do it. I do not have access till I can log in, directly or via ssh, and by that time I guess it is too late. The code should have to be inserted in a service script, and that is something I do not know how to with systemd.
Comment 14 Marius Tomaschewski 2011-12-22 09:08:14 UTC
I were testing it in the last days. It are AFAIS all timing problems because
of some hanging services or systemctl "deadlocks" because of dependencies or
whatever. Further, the fact, that systemd is using own status instead of the
status returned by "rcnetwork status".

Try to set DEBUG=EXTRA in /etc/sysconfig/network/config -- this will produce
a lot of "bash -x" traces in /dev/.sysconfig. Please attach a tar of this
directory.

Except of the nmb reload, it could be also the ntp that needs a lot of time,
at least while restart. It also tries to communicate with IPv6 ntp servers
while there is no global IPv6 address it could use (AdvManagedFlag yes,
AdvAutonomous off on the router and no dhcpv6 client started [yet]).

Workarounds are:
 - move the /etc/sysconfig/network/if-*.d/21* links away
 - remove ntp-runtime from NETCONFIG_MODULES_ORDER
   in /etc/sysconfig/network/config.
Comment 15 Carlos Robinson 2011-12-22 10:15:16 UTC
Created attachment 468649 [details]
tar requested
Comment 16 Carlos Robinson 2011-12-22 10:19:36 UTC
(In reply to comment #14)

> Try to set DEBUG=EXTRA in /etc/sysconfig/network/config -- this will produce
> a lot of "bash -x" traces in /dev/.sysconfig. Please attach a tar of this
> directory.

Done.

> Except of the nmb reload, it could be also the ntp that needs a lot of time,
> at least while restart. 

Ntp is not enabled. Machine is a virtual guest in vmware player, used for testing purposes. It is easier to tell the guest tools to sync time with host.


Elanor:~ # rcntp status
redirecting to systemctl
ntp.service - LSB: Network time protocol daemon (ntpd)
          Loaded: loaded (/etc/init.d/ntp)
          Active: inactive (dead)
          CGroup: name=systemd:/system/ntp.service
Elanor:~ #
Comment 17 Bernhard Wiedemann 2012-01-02 16:21:21 UTC
I found a related problem:
rcnetwork stop
would not stop the network, when in "failed" state (even though it was up).

moving /etc/sysconfig/network/if-up.d/21-dhcpcd-hook-samba also helped here.

testing in a kvm VM with dhcp4 - no NTP or IPv6 involved.
Comment 18 Bernhard Wiedemann 2012-01-02 16:31:48 UTC
commenting line 200 of /etc/sysconfig/network/if-up.d/21-dhcpcd-hook-samba
with the force-reload prevents the problem, so the reload is likely waiting for the "network start" to finish.
Comment 19 Marius Tomaschewski 2012-01-04 15:07:43 UTC
See also bug 727771 and bug 732910 -- the systemd update has been released
and is available already. sysconfig and dhcp-client test packages are at:

http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.1:/Update:/Test/standard/x86_64/
Comment 20 Bernhard Wiedemann 2012-01-04 17:18:07 UTC
(In reply to comment #19)
> See also bug 727771 and bug 732910 -- the systemd update has been released
> and is available already. sysconfig and dhcp-client test packages are at:
> 
> http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.1:/Update:/Test/standard/x86_64/

I installed sysconfig and dhcp-client from this repo and all other updates, but after reboot, there was still

# rcnetwork status
redirecting to systemctl
network.service - LSB: Configure the localfs depending network interfaces
          Loaded: loaded (/etc/init.d/network)
          Active: failed since Wed, 04 Jan 2012 12:11:10 -0500; 53s ago
         Process: 643 ExecStart=/etc/init.d/network start (code=exited, status=7/NOTRUNNING)
          CGroup: name=systemd:/system/network.service
                  ├ 1191 /sbin/dhclient6 -6 -cf /var/lib/dhcp6/dhclient6.eth0.conf -lf /var/lib/dhcp6/d...
                  └ 4406 /sbin/dhcpcd --netconfig -L -E -HHH -c /etc/sysconfig/network/scripts/dhcpcd-h...


which goes away when I deactivate line 200 of /etc/sysconfig/network/if-up.d/21-dhcpcd-hook-samba
Comment 21 Marius Tomaschewski 2012-01-10 13:10:10 UTC
It is a dup of bug 732395 alias bug 725503 alias ...

*** This bug has been marked as a duplicate of bug 732395 ***