Bug 821879

Summary: udev update breaks network
Product: [openSUSE] openSUSE 12.3
Reporter: Bernhard Wiedemann <bwiedemann>
Component: Update Problems
Assignee: Robert Milasan <rmilasan>
Status: RESOLVED FIXED
QA Contact: Jiri Srain <jsrain>
Severity: Critical    
Priority: P2 - High
CC: bwiedemann, dmueller, fcrozat, jbohac, meissner, mrueckert, mt, rmilasan
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 12.3   
See Also: https://bugzilla.novell.com/show_bug.cgi?id=835812
Whiteboard: maint:running:53579:moderate maint:released:sle11-sp2:53588
Found By: Development Services
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: /var/log/messages
journal
journal with systemd debug
snapshot3
mt-snapshot4
[PATCH] Discard /etc/init.d/network start errors under systemd
20130711-mt-test-1_3 output archive
screenshot: network in active (exited) state, even though it reported 0 and dhcpcd is running
network.service
network.service (mt 1)
systemctl show network.service and dump in both states

Description Bernhard Wiedemann 2013-05-27 16:06:04 UTC
Reproducible: Always

Steps to Reproduce:
have a 12.3 x86_64 VM with eth0 configured for dhcp4
ssh to the VM and do
zypper in udev-195-13.25.1

Actual Results:
Installing: udev-195-13.25.1 ...............<50%>====================[\]
in the middle of udev installation, network goes down, no dhcpcd running


This is very bad for servers where you often have no easy way
to do rcnetwork restart (which reports an error after 31s but still brings dhcpcd and the network back)
Comment 1 Bernhard Wiedemann 2013-05-27 16:10:28 UTC
Created attachment 541408 [details]
/var/log/messages

network breakage is at 12:03:33
Comment 3 Bernhard Wiedemann 2013-05-27 16:48:18 UTC
interestingly, this does not happen on all systems,
but at least on those affected systems
it can also be triggered again with
zypper in -f udev
Comment 4 Robert Milasan 2013-05-28 07:17:24 UTC
Does this happen on 12.2? I don't think udev is to blame here, but I'm not sure.
Comment 5 Bernhard Wiedemann 2013-05-28 07:53:34 UTC
12.2 is not affected with libudev0-182-4.33.1

At least it is the udev update that breaks 12.3 networking.
It could still be a problem in systemd or other background magic
that sends the SIGTERM to dhcpcd.
Comment 6 Marius Tomaschewski 2013-05-28 08:01:51 UTC
(In reply to comment #0)
> in the middle of udev installation, network goes down, no dhcpcd running

"systemctl status network.service" shows it is stopped?
Then I guess systemd resolves some "Wants" dependencies and stops
it, or something like that. Please enable systemd debugging so it
is visible what systemd is doing (starting/stopping).

> This is very bad for servers where you often have no easy way
> to do rcnetwork restart (which reports an error after 31s but
> still brings the dhcpcd and network back)

Sure. Which errors does it show exactly?
Comment 7 Robert Milasan 2013-05-28 08:05:12 UTC
You are right, in 12.3 there seems to be an issue, but I believe only when
using ifup/ifdown.

I enabled 'set -x' in ifup/ifdown and tried to restart the network:

/etc/init.d/network restart

2013-05-28T03:53:10.346919-04:00 rmilasan-test-server network[11284]: + exit 7
2013-05-28T03:53:10.347523-04:00 rmilasan-test-server network[11284]: ..failed 
  eth0      interface could not be set up until now
2013-05-28T03:53:10.356923-04:00 rmilasan-test-server network[11284]:
..failedSetting up service network  .  .  .  .  .  .  .  .  .  .  .  . 
...failed
2013-05-28T03:53:10.359944-04:00 rmilasan-test-server systemd[1]:
network.service: control process exited, code=exited status=7
2013-05-28T03:53:10.359955-04:00 rmilasan-test-server systemd[1]: Failed to
start LSB: Configure network interfaces and set up routing.
2013-05-28T03:53:10.359958-04:00 rmilasan-test-server systemd[1]: Unit
network.service entered failed state

This error doesn't happen unless /etc/init.d/network restart is run manually.

It seems the interface is there, but for some reason dhcpcd didn't start
again: it got a SIGTERM, but nothing brought it back up.

BTW, how do I enable debugging in systemd while the system is running? I know how to do it for udev, but that doesn't help; I don't get anything.
Comment 8 Frederic Crozat 2013-05-28 08:11:36 UTC
(In reply to comment #7)

> BTW, how do I enable debugging in systemd while the system is running? I know
> how to do it for udev, but that doesn't help; I don't get anything.

SIGRTMIN+22 to PID 1, which means:

kill -56 1
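For reference, a short sketch of where the number 56 comes from (on Linux with glibc, SIGRTMIN is 34):

```shell
# systemd sets its log level to debug when PID 1 receives SIGRTMIN+22.
# On Linux with glibc, SIGRTMIN is 34, so SIGRTMIN+22 is signal 56:
echo $((34 + 22))
# bash can also resolve the name itself (needs root to signal PID 1):
#   kill -RTMIN+22 1    # equivalent to: kill -56 1
```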
Comment 9 Bernhard Wiedemann 2013-05-28 08:34:04 UTC
# systemctl status network.service 
network.service - LSB: Configure network interfaces and set up routing
          Loaded: loaded (/etc/init.d/network)
          Active: failed (Result: exit-code) since Tue, 2013-05-28 04:30:52 EDT; 2min 14s ago
         Process: 9698 ExecStart=/etc/init.d/network start (code=exited, status=7)
          CGroup: name=systemd:/system/network.service
                  └ 1544 /sbin/dhcpcd --netconfig -L -E -HHH -c /etc/sysconfig/network/scripts/dhcpcd-hook -t 0 -h ...

May 28 04:31:42 bwiedemann-cloud dhcpcd[1544]: eth0: adding IP address 10.122.160.102/21
May 28 04:31:42 bwiedemann-cloud dhcpcd[1544]: eth0: adding default route via 10.122.160.77 metric 0
May 28 04:31:42 bwiedemann-cloud dhcpcd[1544]: eth0: setting hostname to `bwiedemann-cloud'
May 28 04:31:43 bwiedemann-cloud ifup[13413]: eth0      name: RTL-8139/8139C/8139C+
May 28 04:32:37 bwiedemann-cloud dhcpcd[1544]: eth0: renewing lease of 10.122.160.102
May 28 04:32:37 bwiedemann-cloud dhcpcd[1544]: eth0: leased 10.122.160.102 for 120 seconds
May 28 04:32:37 bwiedemann-cloud dhcpcd[1544]: eth0: adding IP address 10.122.160.102/21
May 28 04:32:37 bwiedemann-cloud dhcpcd[1544]: eth0: adding default route via 10.122.160.77 metric 0
May 28 04:32:37 bwiedemann-cloud dhcpcd[1544]: eth0: setting hostname to `bwiedemann-cloud'
May 28 04:32:37 bwiedemann-cloud ifup[13703]: eth0      name: RTL-8139/8139C/8139C+


During network restart, /var/log/messages gets entries like these every second:
2013-05-28T04:26:17.199416-04:00 bwiedemann-cloud systemd[1]: Accepted connection on private bus.
2013-05-28T04:26:17.200410-04:00 bwiedemann-cloud systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.GetAll() on /org/freedesktop/systemd1/unit/network_2eservice
2013-05-28T04:26:17.201079-04:00 bwiedemann-cloud systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
2013-05-28T04:26:17.256321-04:00 bwiedemann-cloud systemd[1]: Received SIGCHLD from PID 8409 (ifstatus-route).
2013-05-28T04:26:17.256333-04:00 bwiedemann-cloud systemd[1]: Got SIGCHLD for process 8409 (ifstatus-route)
2013-05-28T04:26:17.256336-04:00 bwiedemann-cloud systemd[1]: Child 8409 died (code=exited, status=1/FAILURE)


Calling /etc/sysconfig/network/scripts/ifstatus-route eth0
directly in a root shell did not return an error.
Comment 10 Robert Milasan 2013-05-28 08:48:33 UTC
I was thinking that maybe this is also part of the issue: stopping and starting udev. What if we only restart udevd (nothing else) in %post?

I'll try this out now and let you know.
Comment 11 Robert Milasan 2013-05-28 09:07:22 UTC
OK, it looks like my restart-only idea didn't help.
Comment 12 Robert Milasan 2013-05-28 11:39:19 UTC
OK, after some investigation it looks (at least to me) like there is a 'remove' event, but no 'add|change' event from the kernel to udev. The other weird part is that eth0 never really goes away; it still looks active. Usually if you do 'ifconfig eth0 down' and then run 'ifconfig', it doesn't show eth0, but in our case it's always there, though without an IP address of course, as dhcpcd is not running.

Also, this can be reproduced in VirtualBox by installing a normal openSUSE 12.3 machine and using ifup/ifdown instead of NetworkManager, but in VirtualBox you can restart the network without any actual errors.
Comment 13 Robert Milasan 2013-05-28 11:49:12 UTC
Also, as a note (I forgot to write this before), running:

zypper in -f udev
# re-install udev; the network goes down, eth0 doesn't have an IP address
/etc/init.d/network restart
# network restarted, IP address assigned to eth0
zypper in -f udev
# re-install udev; network up and running, IP address still assigned to eth0

Re-installing udev the second time on a running system doesn't kill dhcpcd, and the network stays up and running normally.
Comment 14 Robert Milasan 2013-05-28 11:58:32 UTC
Another note:

If we switch from dhcpcd to dhclient in /etc/sysconfig/network/dhcp the problem can't be reproduced.

opensuse:~ # cat /etc/sysconfig/network/dhcp|grep ^DHCLIENT_BIN
DHCLIENT_BIN="dhclient"

opensuse:~ # cat /etc/sysconfig/network/dhcp|grep ^DHCLIENT_DEBUG
DHCLIENT_DEBUG="yes"
But debug mode doesn't seem to work, at least when using dhclient.
Comment 15 Robert Milasan 2013-05-28 12:06:20 UTC
OK, the debug works (somewhat). I thought the DHCLIENT_DEBUG option enabled debugging in dhclient or dhcpcd; it turns out it is debugging for ifup/ifdown (set -x).
Comment 16 Bernhard Wiedemann 2013-05-28 12:20:28 UTC
found something:
rcnetwork status has
    eth0      is up, but ipv6 duplicate address check failed         dead

but IPv6 is not actually used in that network

yet, when I do
ping6 -Ieth0 fe80::1234
connect: Cannot assign requested address

and dmesg has
[  449.275203] IPv6: eth0: IPv6 duplicate address fe80::f816:3eff:fe3c:f792 detected!

though nothing goes over the network.
Can this be a kernel problem?
Comment 17 Marius Tomaschewski 2013-05-28 12:57:16 UTC
(In reply to comment #14)
> Again another note here:
> 
> If we switch from dhcpcd to dhclient in /etc/sysconfig/network/dhcp the problem
> can't be reproduced.
> 
> opensuse:~ # cat /etc/sysconfig/network/dhcp|grep ^DHCLIENT_BIN
> DHCLIENT_BIN="dhclient"
> 
> opensuse:~ # cat /etc/sysconfig/network/dhcp|grep ^DHCLIENT_DEBUG
> DHCLIENT_DEBUG="yes"
> But debug mode doesn't seem to work, at least when using dhclient.

It causes debug output to be written to /var/log/dhclient-script.$interface.log.

Bernhard,

We have to find out what the problem with the network is before
we continue to check what happens during "zypper in -f udev".

Please stop the network, set DEBUG=EXTRA in /etc/sysconfig/network/config
(bash -vx of all scripts), then start the network and attach the tar
archive created by:
   "tar cvjpf /tmp/network-debug.tar.bz2 /dev/.sysconfig/network"


On 12.3-GA, there is sysconfig-0.80.5-1.2.1. In updates, there is
sysconfig-0.80.5-1.5.1, which provides a fix in this area ... but
perhaps the fix does not work properly:

* Wed Mar 20 2013 mt@suse.de
- Fixed to wait for dhcp/ipv6 under systemd again.
  Fixed regression caused by bnc#785240, bnc#780644 fixes to not
  discard the dhcp/ipv6 dupplicate address detection in progress
  error codes under systemd completely, but wait until dhcp/ipv6
  dad finished or the WAIT_FOR_INTERFACES timeout is reached and
  then discard in the status returned to systemd (bnc#808718).
  It caused failures of other services trying to bind tentative
  IPv6 addresses, e.g. in mixed dhcp4 / static IPv6 setups.
  Thanks to Rolf Eike Beer for the report/tests/debug outputs.
  [0001-Fixed-to-wait-for-dhcp-ipv6-under-systemd-again.patch]

Which sysconfig are you using? If the newer one, please install
the older version and try again. If this solves the rcnetwork
errors, there is a bug in the fix.

Please provide DEBUG=EXTRA output of the version causing it, or
from both, but please make sure to remove all exdeb.* files from
older runs first.

(In reply to comment #16)
> found something:
> rcnetwork status has
>     eth0      is up, but ipv6 duplicate address check failed         dead
> 
> but IPv6 is not actually used in that network

No, but duplicate address detection is always done for IPv6, and you
seem to have another machine in the network using the same MAC address
as yours.
Comment 18 Bernhard Wiedemann 2013-05-29 10:33:07 UTC
I did
echo net.ipv6.conf.eth0.accept_dad=0 >> /etc/sysctl.conf
reboot

and then rcnetwork restart would succeed after 3.5s
instead of failing after 31s, and I could ping6.
But after the next boot I still get duplicate address errors,
maybe because the network is initialized before sysctl.conf is applied?

The sysconfig downgrade did not really help either.



I think I found it: the VM host interface is using the same MAC as the VM:
# ifconfig vnet3
vnet3     Link encap:Ethernet  HWaddr FE:16:3E:23:F6:4D  
          inet6 addr: fe80::fc16:3eff:fe23:f64d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4832 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7310 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:530237 (517.8 Kb)  TX bytes:12089457 (11.5 Mb)

# ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:23:f6:4d brd ff:ff:ff:ff:ff:ff
    inet 10.122.160.103/21 brd 10.122.160.255 scope global eth0
    inet6 fe80::f816:3eff:fe23:f64d/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever

So there really is a MAC address collision.
Normally, the vnet3 interface should not have an IP address at all,
because it is enslaved in the bridge.

On the host, I tried
ip addr del fe80::fc16:3eff:fe23:f64d/64 dev vnet3
but it did not help.
IMHO, the thing to do is to use different MAC addresses for the host-side
and VM-side interfaces, because those are really two interfaces
interconnected by a virtual cross-over cable.
ifconfig vnet3 hw ether FE:16:3E:23:F6:4E
did not help either, though.
Comment 19 Bernhard Wiedemann 2013-05-29 10:45:16 UTC
Err, the host MAC starts with FE but the VM's with FA,
so there is no MAC / IPv6 collision there.

could still be a kernel bug
http://ipv6-or-no-ipv6.blogspot.de/2013/02/ipv6-duplicate-address-in-linux.html


tcpdump (on the VM's host) shows the NDP packets used for duplicate address detection:

10:09:25.650943 00:26:f0:11:00:00 > 01:00:0c:cc:cc:cd SNAP Unnumbered, ui, Flags
 [Command], length 50
10:09:27.604912 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
10:09:27.604936 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
10:09:27.650957 00:26:f0:11:00:00 > 01:00:0c:cc:cc:cd SNAP Unnumbered, ui, Flags [Command], length 50
10:09:27.657902 IP6 :: > ff02::1:ff23:f64d: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe23:f64d, length 24
10:09:27.657911 IP6 :: > ff02::1:ff23:f64d: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe23:f64d, length 24
10:09:27.919212 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:23:f6:4d, length 300
10:09:27.919243 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:23:f6:4d, length 300
10:09:27.919387 IP 10.122.160.97.67 > 10.122.160.103.68: BOOTP/DHCP, Reply, length 300

Interestingly, sent packets are shown twice.
Comment 20 Bernhard Wiedemann 2013-05-29 14:25:26 UTC
http://www.spinics.net/lists/netdev/msg127648.html

is the underlying kernel problem,
and there must be something in the network
(e.g. improperly tagged VLANs) that duplicates those multicasts.
Comment 21 Bernhard Wiedemann 2013-05-29 14:44:01 UTC
It looks as if the duplication happens in the bridge interface

# tcpdump -s 1600 -epni vnet3 icmp6
tcpdump: WARNING: vnet3: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet3, link-type EN10MB (Ethernet), capture size 1600 bytes
14:37:23.392257 fa:16:3e:23:f6:4d > 33:33:ff:23:f6:4d, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff23:f64d: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe23:f64d, length 24
14:37:23.392279 fa:16:3e:23:f6:4d > 33:33:ff:23:f6:4d, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff23:f64d: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe23:f64d, length 24


# tcpdump -s 1600 -epni br563 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br563, link-type EN10MB (Ethernet), capture size 1600 bytes
14:37:23.392257 fa:16:3e:23:f6:4d > 33:33:ff:23:f6:4d, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ff23:f64d: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe23:f64d, length 24


The two packets are only 22 microseconds apart, which is about the time
a ping on 127.0.0.1 takes, but much less than anything over the network.

also
brctl showmacs br563 |grep 33:33:ff:23:f6:4d

does not return anything, so I guess the default bridge behaviour
is to pass the packet to all ports;
but should that really include the port it came from?

Host kernel is 3.0.74-0.6.8-default
Comment 22 Marius Tomaschewski 2013-05-30 10:54:22 UTC
(In reply to comment #18)
> I did
> echo net.ipv6.conf.eth0.accept_dad=0 >> /etc/sysctl.conf
> reboot
> 
> and then, rcnetwork restart would succeed after 3.5s 
> instead of failing after 31s ; and I could ping6
> but after next boot I still get duplicate address errors
> maybe because network is initialized before sysctl.conf is used?

Move it to /etc/sysconfig/network/ifsysctl -- "man 5 ifsysctl";
this one is applied a) by udev, b) by ifup.
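A minimal sketch of such a file, assuming the interface is eth0 (the exact file name and syntax are described in "man 5 ifsysctl"):

```
# /etc/sysconfig/network/ifsysctl-eth0  (or "ifsysctl" for all interfaces)
# Applied by udev when the interface appears and again by ifup,
# so it takes effect earlier than /etc/sysctl.conf:
net.ipv6.conf.eth0.accept_dad = 0
```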

(In reply to comment #19)
> err. host MAC starts with FE but VM's with FA

Yes, they differ, but as they're randomly generated, another VM on
another host may still have the same one.

> could still be a kernel bug
> http://ipv6-or-no-ipv6.blogspot.de/2013/02/ipv6-duplicate-address-in-linux.html

This could be.

You aren't using bonding, are you? Linux bonding AFAIR sends out DAD
probes through each port. The IPv6 code is IMO not aware of this and
detects dups when it gets two answers: it basically does not check
received > sent, but received > 1. See also bug#715430.

The workaround is to set "disable_ipv6 = 1" or at least "accept_dad = 0"
on bridge ports & bond slaves [where no IPv6 addr is assigned anyway].

(In reply to comment #21)
> It looks as if the duplication happens in the bridge interface
[...]
> The two packets are only 22 microseconds apart which is about the time
> a ping on 127.0.0.1 takes but much less than anything over the network.
> 
> also
> brctl showmacs br563 |grep 33:33:ff:23:f6:4d
                             ^^^^^^^^^^^^^^^^^

This (33:33:*) is an IPv6 multicast address mapped to an ethernet MAC.

> does not return anything, so I guess the default bridge behaviour
> is to pass the packet to all ports - 
> but should that really include the port it came from?
> 
> Host kernel is 3.0.74-0.6.8-default

I've not noticed such behavior on my hosts + VMs using the same kernel.

Jiri, can you take a look / do you have an idea what goes wrong here?
Comment 23 Bernhard Wiedemann 2013-05-31 15:06:56 UTC
After reading the should_deliver function in linux/net/bridge/br_forward.c,
I finally managed to track it down to "hairpin_mode"
causing the trouble, and it turns out that other people have also hit this:
https://bugs.launchpad.net/nova/+bug/1011134

so we really have 3 bugs:

1. VM interfaces on compute hosts of SUSE Cloud 1.0 / OpenStack
   have hairpin_mode on, making them receive their own traffic back

2. with the Host's bridge treating multicast like broadcast,
   VM's Linux IPv6 duplicate address detection receives its own sent packet
   and assumes some other host is using the same addr

3. 12.3's systemd assumes this means a failure in the network script
   and on udev upgrade the network is killed


Steps to Reproduce:
have a br0 with tap0 enslaved
echo 1 > /sys/class/net/br0/brif/tap0/hairpin_mode
qemu-kvm -net nic,macaddr=56:00:56:76:54:32 \
  -net tap,ifname=tap0,script=no,downscript=no \
  -m 1000 -hda test.img -monitor stdio -vnc :9

and this still happens with a 12.3 host and a 3.7.10 kernel.
Comment 24 Jiri Bohac 2013-05-31 16:30:55 UTC
(In reply to comment #23)
> https://bugs.launchpad.net/nova/+bug/1011134
> 
> so we really have 3 bugs:
> 
> 1. VM interfaces on compute-hosts of SUSE Cloud 1.0 / OpenStack
>    have hairpin_mode on, making them receive back their own traffic

I think you're correct.
My understanding of VEPA is that hairpin mode is there to allow the bridge to act as an uplink to another VEPA-mode bridge connected to the specific port.
Setting hairpin mode on a VSI is wrong. The above-mentioned Ubuntu bug has a link to an OpenStack "fix" for this: https://review.openstack.org/#/c/14017/
Instead of not setting hairpin mode on the VSIs, they set up a packet filter to work around the problem. Am I missing something?

> 2. with the Host's bridge treating multicast like broadcast,
>    VM's Linux IPv6 duplicate address detection receives its own sent packet
>    and assumes some other host is using the same addr

Multicasts are treated like broadcasts by L2 switches; I don't see what's wrong with that. They just should not be looped back to the port they came from, which is what the hairpin misconfiguration causes.

David Miller has rejected a patch trying to work around such a network misconfiguration: http://www.spinics.net/lists/netdev/msg127696.html
I agree. I think it's a good thing that DAD detects duplicate addresses caused by a duplicate MAC address, something quite likely to happen with virtual machines.

> 3. 12.3's systemd assumes this means a failure in the network script
>    and on udev upgrade the network is killed

Yes, this looks incorrect. For example, one may have more than one IPv6 address configured. One of them failing DAD should not be a reason to deconfigure the network; the other addresses can still be used, and so can IPv4.
Comment 25 Bernhard Wiedemann 2013-05-31 17:07:03 UTC
1. Yes, IMHO upstream did an ugly workaround
(and it still lets all other broadcasts come back).

using git blame on openstack nova code showed that this was introduced for
https://bugs.launchpad.net/nova/+bug/933640
https://github.com/openstack/nova/commit/b61e1ea12cd41ea507b1f6496ec1413c93bd679b

mentioning the need to reach your own floating IP
that is only NATed on the host
but I just tested that and it seems to not be working anyway
so we could just as well disable hairpin_mode
to avoid all associated trouble


2. If there is a duplicate MAC address,
   shouldn't it then also refuse to do IPv4 networking?
   I guess our duplicate address detection code is less picky there.
   (Admittedly, IPv4 does more things in userspace than IPv6.)
Comment 26 Bernhard Wiedemann 2013-06-05 07:19:06 UTC
It seems the original problem with udev/network
is unrelated to the duplicate address detection issue from OpenStack,
as it still happens randomly after working around it.
Thus I split the OpenStack issue into a separate bug, 823345.


When it happens, the network status is not "active (running)"
but "active (exited)".

Output of "systemctl status network.service":
network.service - LSB: Configure network interfaces and set up routing
          Loaded: loaded (/etc/init.d/network)
          Active: active (exited) since Wed, 2013-06-05 09:02:54 EDT; 57s ago
         Process: 384 ExecStart=/etc/init.d/network start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/network.service
                  └ 1500 /sbin/dhcpcd --netconfig -L -E -HHH -c /etc/syscon

Jun 05 09:02:54 bwiedemann-udevtest ifup[1446]: eth0      name: RTL-8139/813
Jun 05 09:02:54 bwiedemann-udevtest dhcpcd[1036]: eth0: exiting
Jun 05 09:02:54 bwiedemann-udevtest network[384]: eth0      Starting DHCP4 cl
Jun 05 09:02:54 bwiedemann-udevtest ifup-dhcp[1512]: 
Jun 05 09:02:54 bwiedemann-udevtest network[384]: eth0      IP address: 10.12
Jun 05 09:02:54 bwiedemann-udevtest ifup-dhcp[1514]: 
Jun 05 09:02:54 bwiedemann-udevtest ifup-dhcp[1517]: eth0      IP address: 10
Jun 05 09:02:54 bwiedemann-udevtest ifup-dhcp[1520]: eth0      IP address: 10
Jun 05 09:02:54 bwiedemann-udevtest network[384]: ..doneSetting up service ne
Jun 05 09:02:54 bwiedemann-udevtest systemd[1]: Started LSB: Configure netwo
Comment 27 Bernhard Wiedemann 2013-06-05 07:41:36 UTC
Created attachment 542777 [details]
journal

It seems that on boot there are two dhcpcd processes running at the same time:

ifup-dhcp[1033]: Starting DHCP4 client on eth0
dhcpcd[1038]: eth0: dhcpcd 3.2.3 starting
dhcpcd[1038]: eth0: hardware address = fa:16:3e:23:e2:65
dhcpcd[1038]: eth0: broadcasting for a lease
dhcpcd[1038]: eth0: offered 10.122.160.107 from 10.122.160.97
dhcpcd[1038]: eth0: checking 10.122.160.107 is available on attached networks
ifup-dhcp[1075]: eth0      Starting DHCP4 client
dhcpcd[1077]: eth0: dhcpcd already running on pid 1038 (/var/run/dhcpcd-eth0.pid)
dhcpcd[1077]: eth0: exiting
ifup-dhcp[1094]: .
ifup-dhcp[1113]: .
dhcpcd[1038]: eth0: leased 10.122.160.107 for 120 seconds
dhcpcd[1038]: eth0: adding IP address 10.122.160.107/21
dhcpcd[1038]: eth0: adding default route via 10.122.160.97 metric 0
ifdown[1354]: eth0      name: RTL-8139/8139C/8139C+
dhcpcd[1038]: eth0: setting hostname to `bwiedemann-udevtest'
ifup[1448]: eth0      name: RTL-8139/8139C/8139C+
dhcpcd[1038]: eth0: exiting
network[380]: eth0      Starting DHCP4 client.
Comment 28 Marius Tomaschewski 2013-06-05 08:14:39 UTC
(In reply to comment #27)
> Created an attachment (id=542777) [details]
> journal
> 
> it seems on boot there are two dhcpcd processes running at the same time

No, there are multiple stages: dhcpcd gets a lease and calls dhcpcd-hook
with "up", "new", "down", "complete" [which results in "ifdown -o dhcp",
"netconfig" and "ifup -o dhcp"]. Then it forks.

Basically the problem is that when /etc/init.d/network reports a non-zero
status to systemd, systemd will stop it again and not start services
requiring it. When it does not report failure on error conditions, other
services will fail.

For IPv6, DAD is ON by default, and when a dup is detected on the
link-local address, IPv6 is basically not usable on this interface
(multicasts, ...; also a DHCPv6 client, which requires it, will fail).
Further, there is no check for which address fails or whether other
services are using (binding) it or not: either everything worked or it
didn't. As the failure happens on a mandatory interface, rcnetwork fails.

You can a) disable DAD via an ifsysctl file, or b) disable IPv6 (per interface).

(In reply to comment #25)
> 2. if there is a duplicate MAC address - 
>    shouldn't it then also refuse to do IPv4 networking?
>    I guess our duplicate address detection code is less picky there.
>    (admitted: IPv4 does more things in userspace than IPv6)

For IPv4 there is no duplicate address detection done by default for
static IPs (see the CHECK_DUPLICATE_IP & SEND_GRATUITOUS_ARP variables in
network/config), but dhcpcd does it (that should be the "checking .. is
available on attached networks" msg).
Comment 29 Frederic Crozat 2013-06-05 08:25:12 UTC
It might be worth shipping a .service for the "ifup" way of networking, which could then tell systemd to accept additional exit codes (SuccessExitStatus=, RestartPreventExitStatus=; see man systemd.service).
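A hypothetical sketch of what such a unit could look like; the option names come from systemd.service(5), but the unit contents and the choice to tolerate exit status 7 (the code seen in comment #7) are assumptions, not the shipped configuration:

```
# /etc/systemd/system/network.service  (hypothetical sketch)
[Unit]
Description=Configure network interfaces (ifup)

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/network start
ExecStop=/etc/init.d/network stop
# Treat exit status 7 ("interface could not be set up until now")
# as success, and never auto-restart because of it:
SuccessExitStatus=7
RestartPreventExitStatus=7
```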
Comment 30 Bernhard Wiedemann 2013-06-06 13:24:15 UTC
(In reply to comment #28)
> For IPv6 it is ON by default and when there is a dup detected on the
> link local address, IPv6 is basically not usable on this interface
> (multicasts, ... also a dhcpv6 client which requires it will fail).
> Further, there is no check which address fails or if other services
> are using (binding it) or not. Either everything worked or not.
> As the failure happens on a mandatory interface, rcnetwork fails.
> 
> You can a) disable dad via ifsysctl file, b) disable IPv6 (per interface).

IPv6 DAD is no longer failing, as I worked around it on the OpenStack side.
So rcnetwork status always shows green, but the udev bug can still occur.
Also, eth0 is set to DHCP4, so no DHCPv6 client should run.


> (In reply to comment #25)
> > 2. if there is a duplicate MAC address - 
> >    shouldn't it then also refuse to do IPv4 networking?
> >    I guess our duplicate address detection code is less picky there.
> >    (admitted: IPv4 does more things in userspace than IPv6)
> 
> For IPv4 there is no duplicate address detection done by default for
> static IPs (see CHECK_DUPLICATE_IP & SEND_GRATUITOUS_ARP variables in
> network/config), but dhcpcd makes it (should be the "checking .. is
> available on attached networks" msg).

Does dhcpcd react to receiving its own gratuitous ARP packet? How?

What is the difference in meaning of "active (running)" and "active (exited)" systemd/network states?
Comment 31 Bernhard Wiedemann 2013-06-06 14:36:26 UTC
Created attachment 543109 [details]
journal with systemd debug

This also happens completely independently of Cloud and hairpin_mode.

Steps to reproduce:
cd /dev/shm
wget http://openqa.opensuse.org/openqa/img/12.3-mini64.qcow2 # 369MB
qemu-kvm -drive if=virtio,file=12.3-mini64.qcow2

root
linux
rcnetwork status # will show "active (exited)"
zypper -n in -f udev
ifconfig # will show eth0 without inet addr


Could it be that multiple ifup eth0 processes are spawned?
When running this with -no-kvm option, bootup was so slow
that rcnetwork status showed 3 different PIDs in lines of
/bin/bash /sbin/ifup eth0 -o rc onboot

and even an ifdown in between?
Comment 32 Marius Tomaschewski 2013-06-07 04:32:19 UTC
(In reply to comment #30)
> (In reply to comment #28)
> > For IPv6 it is ON by default and when there is a dup detected on the
> > link local address, IPv6 is basically not usable on this interface
> > (multicasts, ... also a dhcpv6 client which requires it will fail).
> > Further, there is no check which address fails or if other services
> > are using (binding it) or not. Either everything worked or not.
> > As the failure happens on a mandatory interface, rcnetwork fails.
> > 
> > You can a) disable dad via ifsysctl file, b) disable IPv6 (per interface).
> 
> IPv6 dad is no more failing, as I workarounded this on the openstack side.

OK.

> So rcnetwork status always shows as green, but the udev bug can still occur.

Only when some error occurs, and duplicate addresses are definitely errors.
The status isn't checked per address but per interface, and rcnetwork
reports failure when there is an error on a mandatory interface.
You can use STARTMODE=hotplug or ifplugd for all "nice to have" interfaces.

> Also eth0 is set to DHCP4 so no dhcpv6 client should run.

When BOOTPROTO=dhcp4 is set, dhcp6 will not be started/used, but this does
not mean IPv6 is disabled -- it is enabled by default (static and autoconf).
You can disable a) DAD or b) IPv6 entirely via ifsysctl, or set
IPV6_DAD_WAIT="0" to disable the check (each one per interface or globally).
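For the IPV6_DAD_WAIT variant, a sketch (the global location /etc/sysconfig/network/config matches the other variables mentioned in this report; a per-interface override in ifcfg-eth0 is an assumption):

```
# /etc/sysconfig/network/config  (global)
# Do not wait for IPv6 duplicate address detection in the ifup scripts:
IPV6_DAD_WAIT="0"
```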

> > (In reply to comment #25)
> > > 2. if there is a duplicate MAC address - 
> > >    shouldn't it then also refuse to do IPv4 networking?
> > >    I guess our duplicate address detection code is less picky there.
> > >    (admitted: IPv4 does more things in userspace than IPv6)
> > 
> > For IPv4 there is no duplicate address detection done by default for
> > static IPs (see CHECK_DUPLICATE_IP & SEND_GRATUITOUS_ARP variables in
> > network/config), but dhcpcd makes it (should be the "checking .. is
> > available on attached networks" msg).
> 
> Does dhcpcd react to receiving its own gratuitous ARP packet? How?

I don't know/remember what exactly it does here.

> What is the difference in meaning of "active (running)" and "active (exited)"
> systemd/network states?

/etc/init.d/network uses the "X-Systemd-RemainAfterExit: true" LSB tag:
  RemainAfterExit=
    Takes a boolean value that specifies whether the service shall be
    considered active even when all its processes exited. Defaults to no.

"active (exited)" means the network.service cgroup is empty. You'll see
this in static configurations (systemctl status network.service).
"active (running)" means, there is some process in the cgroup, that is
in e.g. dhcp mode where dhcp client is running.

Are you using dhcp and getting "active (exited)"? Did dhcpcd die?
Comment 33 Marius Tomaschewski 2013-06-07 04:51:37 UTC
(In reply to comment #31)
> Steps to reproduce:
> cd /dev/shm
> wget http://openqa.opensuse.org/openqa/img/12.3-mini64.qcow2 # 369MB
> qemu-kvm -drive if=virtio,file=12.3-mini64.qcow2

Going to fetch it...

> root
> linux
> rcnetwork status # will show "active (exited)"

No dhcpcd in the cgroup?!

> zypper -n in -f udev
> ifconfig # will show eth0 without inet addr

The problem seems to be here:

Jun 06 15:58:36 bwiedemann-12 systemd[1]: Received SIGCHLD from PID 2204 (ifstatus-route).
Jun 06 15:58:36 bwiedemann-12 systemd[1]: Got SIGCHLD for process 2204 (ifstatus-route)
Jun 06 15:58:36 bwiedemann-12 systemd[1]: Child 2204 died (code=exited, status=1/FAILURE)

> Could it be that multiple ifup eth0 processes are spawned?
> When running this with -no-kvm option, bootup was so slow
> that rcnetwork status showed 3 different PIDs in lines of
> /bin/bash /sbin/ifup eth0 -o rc onboot
> 
> and even an ifdown in between?

ifdown is usually from "ifdown eth0 -o dhcp" -- started for dhcp post
processing.
Comment 34 Marius Tomaschewski 2013-06-07 05:46:30 UTC
(In reply to comment #32)
> > What is the difference in meaning of "active (running)" and "active (exited)"
> > systemd/network states?
>
> "active (exited)" means the network.service cgroup is empty. You'll see
> this in static configurations (systemctl status network.service).
> "active (running)" means, there is some process in the cgroup, that is
> in e.g. dhcp mode where dhcp client is running.

Wrong.
"active (exited)" means the /etc/init.d/network (network.service) run is
done; it exited with the code visible in the status.
"active (running)" means the /etc/init.d/network (network.service) is still
running.
Comment 35 Marius Tomaschewski 2013-06-07 06:15:56 UTC
Created attachment 543261 [details]
snapshot3

(In reply to comment #32)
> > IPv6 dad is no more failing, as I workarounded this on the openstack side.

And how?

For me, everything works using your VM on my host / network: the network
stays up and systemd does not trigger a restart during "zypper in -f udev":

08:02:09.641545-04:00 bwiedemann-12 systemd[1]: Stopping udev Control Socket.
08:02:09.641869-04:00 bwiedemann-12 systemd[1]: Closed udev Control Socket.
08:02:09.642369-04:00 bwiedemann-12 systemd[1]: Stopping udev Kernel Socket.
08:02:09.642600-04:00 bwiedemann-12 systemd[1]: Closed udev Kernel Socket.
08:02:09.643468-04:00 bwiedemann-12 systemd[1]: Stopping udev Kernel Device Manager...
08:02:09.644205-04:00 bwiedemann-12 systemd[1]: Stopped udev Kernel Device Manager.
08:02:09.644598-04:00 bwiedemann-12 systemd[1]: Stopped udev Kernel Device Manager.
08:02:09.747782-04:00 bwiedemann-12 systemd[1]: Reloading.
08:02:09.762553-04:00 bwiedemann-12 systemd[1]: Starting udev Kernel Socket.
08:02:09.762880-04:00 bwiedemann-12 systemd[1]: Listening on udev Kernel Socket.
08:02:09.763129-04:00 bwiedemann-12 systemd[1]: Starting udev Control Socket.
08:02:09.763413-04:00 bwiedemann-12 systemd[1]: Listening on udev Control Socket.
08:02:09.763656-04:00 bwiedemann-12 systemd[1]: Starting udev Kernel Device Manager...
08:02:09.771822-04:00 bwiedemann-12 systemd[1]: Started udev Kernel Device Manager.
08:02:10.067863-04:00 bwiedemann-12 kernel: [   55.352623] device-mapper: uevent: version 
08:02:10.067873-04:00 bwiedemann-12 kernel: [   55.352847] device-mapper: ioctl: 4.23.0-io
08:02:12.343079-04:00 bwiedemann-12 cloud-init[1799]: 2013-06-07 08:02:12,342 - util.py[WA
08:02:15.243755-04:00 bwiedemann-12 kernel: [   60.528958] SGI XFS with ACLs, security att
08:02:15.247743-04:00 bwiedemann-12 kernel: [   60.532965] JFS: nTxBlock = 3928, nTxLock =
08:02:15.261746-04:00 bwiedemann-12 kernel: [   60.546700] QNX4 filesystem 0.2.3 registere
08:02:15.274808-04:00 bwiedemann-12 kernel: [   60.559710] Btrfs loaded
08:02:15.277978-04:00 bwiedemann-12 systemd[1]: Mounted FUSE Control File System.
08:02:15.278746-04:00 bwiedemann-12 kernel: [   60.563293] fuse init (API version 7.20)
08:02:15.299619-04:00 bwiedemann-12 os-prober: debug: /dev/vda1: is active swap

Please set DEBUG=EXTRA in /etc/sysconfig/network/config and provide the
output: "tar cvjpf /tmp/network-debug.tar.bz2 /dev/.sysconfig/network"
Comment 36 Marius Tomaschewski 2013-06-07 06:35:29 UTC
(In reply to comment #29)
> it might be worth shipping a .service for the "ifup" way of network,
> which can then tell systemd to accept additional error code
> (SuccessExitStatus,RestartPreventExitStatus,  see man systemd.service)

BTW: Basically the only possibility I see to avoid all this trouble
-- apart from comment 29 -- is to use something like this:

--- a/scripts/network
+++ b/scripts/network
@@ -906,6 +906,8 @@ case "$ACTION" in
 
                reload_firewall
 
+               # do not report any errors to systemd
+               test "$SD_RUNNING" = "yes" && rc_reset
                ;;
 
        stop)

But is this really something we should use?!


Frederic, would it be possible to implement:
  X-Systemd-RestartPreventExitStatus: ...
  X-Systemd-SuccessExitStatus: ...

?
Comment 37 Bernhard Wiedemann 2013-06-07 06:40:49 UTC
Marius: how often did you try? Did you run the disk from tmpfs? On a fast host?

On tmpfs I get the bug >90% of the time,
but on HDD I only got it on 2 of 5 tries.

I think there is a race condition happening
that depends on the speed of the system.

The three test systems on which I reproduced the bug use DDR3 RAM and 4 cores.

If there really was an error in the networking, we still want it reported.
Comment 38 Frederic Crozat 2013-06-07 06:44:55 UTC
(In reply to comment #36)

> Frederic, would it be possible to implement:
>   X-Systemd-RestartPreventExitStatus: ...
>   X-Systemd-SuccessExitStatus: ...
> 
> ?

No, it is a wrong idea. for systemd, restart is stop + start.

If we really need something, you should just create a network.service file in /usr/lib/systemd/system which will call /etc/init.d/network  in ExecStart / ExecStop / ExecReload.
Comment 39 Marius Tomaschewski 2013-06-07 07:06:24 UTC
(In reply to comment #37)
> Marius: how often did you try? did you run the disk from tmpfs? on a fast host?

6 times on an i7 box with the image on SSD.

> I think there is a race condition happening
> and that depends on the speed of the system.

This could be. I'll retry on tmpfs.

Could you provide the DEBUG=EXTRA outputs please?
Comment 40 Marius Tomaschewski 2013-06-07 07:25:44 UTC
Bernhard,
with the img on tmpfs I've got it once - dhcpcd was stopped afterwards --
but there were no EXTRA logs from network.. why?

Further, there are also other services which fail, e.g. boot.local.
Perhaps this is the reason... I'll retry next week.
Comment 41 Marius Tomaschewski 2013-06-07 07:26:45 UTC
(In reply to comment #39)
> Could you provide the DEBUG=EXTRA outputs please?

Together with debug logs from systemd please.
Comment 42 Marius Tomaschewski 2013-06-07 07:34:41 UTC
Created attachment 543274 [details]
mt-snapshot4

Hmm... from tmpfs all sort of things are failing.
Comment 43 Bernhard Wiedemann 2013-06-13 06:17:47 UTC
Marius, any news on this issue?
Can you reproduce it?
Comment 44 Dirk Mueller 2013-07-10 08:51:24 UTC
Hit the issue as well. It is certainly inconvenient having a remote server go down when installing updates.
Comment 45 Marius Tomaschewski 2013-07-10 13:18:00 UTC
Created attachment 547421 [details]
[PATCH] Disard /etc/init.d/network start errors under systemd

(In reply to comment #29)
> (SuccessExitStatus,RestartPreventExitStatus,  see man systemd.service)

It is IMO simpler to catch it in "rcnetwork start", where it is reported.
Comment 46 Marius Tomaschewski 2013-07-10 13:20:42 UTC
Bernhard, Dirk, does it work for you?

Package is in $OBS/home:mtomaschewski:branches:openSUSE:12.3:Update/sysconfig
Comment 47 Bernhard Wiedemann 2013-07-10 16:39:12 UTC
I installed
http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.3:/Update/openSUSE_12.3/x86_64/sysconfig-0.80.5-1.11.1.x86_64.rpm

and still could reproduce the problem as described in comment 31
with network status showing active (exited) and udev install killing network
Comment 48 Marius Tomaschewski 2013-07-11 10:54:54 UTC
Created attachment 547568 [details]
20130711-mt-test-1_3 output archive

Yes, the network sometimes has "active (exited)" and sometimes
"active (running)" state after booting -- although it reports 0
to systemd, dhcpcd is still running [so definitely not really exited]
in both cases and everything with it looks fine.

Everything else seems to depend on it. During a udev update in the exited
state, the migration to the new udev kills the network processes --
the network scripts are AFAIS not involved in this.
Comment 49 Marius Tomaschewski 2013-07-11 11:03:49 UTC
Created attachment 547569 [details]
screenshot: network in active exited state, even it reported 0 and dhcpcd is running
Comment 50 Marius Tomaschewski 2013-07-11 11:07:57 UTC
Frederic,

why does systemd show the network.service in "active (exited)" state,
even though rcnetwork start reported 0 and dhcpcd is still running in the
network.service cgroup? This looks like a systemd bug to me.
Comment 51 Frederic Crozat 2013-07-11 11:23:42 UTC
for me, systemctl (and systemd) are ok:

the main process (i.e. /etc/init.d/network, on the Process line) was executed, finished, and returned an exit code of 0 => Active (because /etc/init.d/network has X-Systemd-RemainAfterExit: true) and Exited because the PID referenced on the Process line is no longer running

there is still one process running, which was started by this process, and it is dhcpcd.
Comment 52 Robert Milasan 2013-07-11 11:27:25 UTC
Re-assigning the bug to Marius; I don't really see this as a udev issue.

Testing now a different way to install udev (%post and %pre part), just to see if it makes a difference.
Comment 53 Robert Milasan 2013-07-11 11:36:10 UTC
OK, the idea didn't make a difference.
Comment 54 Marius Tomaschewski 2013-07-11 12:01:53 UTC
(In reply to comment #51)
> for me, systemctl (and systemd) are ok:
> 
> main process (ie /etc/init.d/network, on the Process line) was executed,
> finished and returned an exit code of 0 => Active (because /etc/init.d/network
> has X-Systemd-RemainAfterExit: true) and Exited because the pid referenced on
> the Process line is no longer running
> 
> there is still one process running, which was started by this process, and it
> is dhcpcd.

And why does it behave like this only in 50% of the cases/boots [when the VM
is on tmpfs of a fast host; VM on HDD in <10% of the cases]?
Usually it reports "active (running)" here -- also after "systemctl restart
network.service" when it initially booted in the "active (exited)" state.

The situation is exactly the same: rcnetwork exited with status 0, but there
is still a dhcpcd running. Just the systemd state differs from time to time.

And why does systemd -- during the udev update -- drop all state and kill
all cgroup members (dhcpcd) in the "active (exited)" state [run1], but not
in "active (running)" [run3], where it does not kill the network.service
cgroup members?

=> See run1 and run3 in the attachment of comment 49.

(In reply to comment #52)
> Re-assigning the bug to Marius, don't really see this as a udev issue.

It is not a network scripts bug. Network works, 0 status is reported.

Re-assigning to Frederic, as systemd is not consistent / behaves strangely.
Comment 55 Frederic Crozat 2013-07-11 14:31:11 UTC
The issue is mostly the network initscript, which can have two opposite
behaviours:
- either be an initscript which has no process left running (static IP)
- or have some process still running (dhcp).

systemd can't easily handle those two opposite constraints (see manpage
systemd.service):
- RemainAfterExit is supposed to tell whether a service should stay "active"
  when no process is left running (the issue here: static vs dhcp).
- GuessMainPID is supposed to search for which PID is the "master" one when
  there is no PIDFile. For sysvinit scripts, systemd was disabling this
  feature (because it is not reliable) and considers all PIDs to be
  potentially the "main PID".
Comment 56 Bernhard Wiedemann 2013-07-11 14:37:49 UTC
I dropped RemainAfterExit in /etc/init.d/network
in my testvm and the status still went to "active (exited)" on boot
with the associated breakage on udev upgrade

How can I enable GuessMainPID for the network script
to test if this makes a difference?
Comment 57 Frederic Crozat 2013-07-11 14:55:55 UTC
you need to create a network.service file which will call the initscript in ExecStart / ExecStop / ExecReload ..
Comment 58 Bernhard Wiedemann 2013-07-11 15:19:04 UTC
Created attachment 547604 [details]
network.service

with the attached service file in /usr/lib/systemd/system/
and "systemctl enable network.service",
network always came up correctly as "active (running)",
but rcnetwork stop would lead to a failed state,
maybe because dhcpcd exits on SIGTERM with return code 1,
which is counted as an error
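If dhcpcd's exit code 1 on SIGTERM really is the only problem, the SuccessExitStatus directive from man systemd.service (mentioned in comment 29) might absorb it -- an untested sketch:

```ini
[Service]
# assumption: 1 is the only "benign" exit code on the stop path
SuccessExitStatus=1
```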
Comment 59 Marius Tomaschewski 2013-07-12 13:20:28 UTC
(In reply to comment #55)

Hmm:
 - the systemd behavior is inconsistent.

   With dhcp: it _usually_ reports correctly "active (running)",
              _sometimes_ "active (exited)"

   You basically need a VM on a tmpfs of a fast host to get the
   "active (exited)".

   It looks to me like the RemainAfterExit logic is not sufficient, or
   there is some kind of race condition.
   Considering all PIDs to be potentially the "main PID" is IMO correct.

   Systemd should set the state consistently with RemainAfterExit:
   always "active (running)" when the main PID (of the started init script)
   exits while the cgroup is not empty, and change to "active (exited)"
   when the cgroup becomes empty.


   When the dhcp client gets an infinite lease, it exits. That is, also
   with dhcp you can get the "static setup" behavior. But it will not
   produce a problem on update because dhcpcd does not deconfigure ifaces.

 - Dropping RemainAfterExit would break static setups -> a no-go.

 - Providing a network.service file has the potential to break the service
   switch in yast2 -- but I didn't test it.
Comment 60 Frederic Crozat 2013-07-12 13:27:32 UTC
(In reply to comment #59)
> (In reply to comment #55)
> 
> Hmm:
>  - the systemd behavior is inconsistent.
> 
>    With dhcp: it _usually_ reports correctly "active (running)",
>               _sometimes_ "active (exited)"
> 
>    You basically need a VM on a tmpfs of a fast host to get the
>    "active (exited)".
> 
>    Looks to me, that the RemainAfterExit logic is not sufficient or
>    there is some kind of race condition.
>    To 'considers all PID to be potentially "main PID"' is IMO correct.
> 
>    Systemd should set the state consistently with RemainAfterExit:
>    always "active (running)" when the main PID (of started init script)
>    process exits and cgroup is not empty and when the cgroup becomes
>    empty, change to "active (exited)".

What is the output of systemctl show network.service in both cases?

>  - Providing a network.service lease has potential to break the service
>    switch in yast2 -- but I didn't tested it.

I don't see how it would break the service switch?
Comment 61 Marius Tomaschewski 2013-07-12 14:17:44 UTC
Created attachment 547772 [details]
network.service (mt 1)

OK, this one seems to work -- at least with dhcp. I've switched between
NM and sysconfig several times in yast2; this seems to work too.

Unlike in attachment 547604 [details], I keep KillMode=control-group, as
this is what's needed, I think. (I'm thinking of starting eth0 as hotplug
[at boot], eth1 as hotplug later, ..., where the remaining pid changes.)
More tests (static, ...) in the next weeks.

[I guess, the state difference results from the GuessMainPID=yes default
in the service file. This seems to be required with RemainAfterExit=yes]
Comment 62 Marius Tomaschewski 2013-07-12 14:28:14 UTC
(In reply to comment #60)
> (In reply to comment #59)
> > (In reply to comment #55)
> What is the output of systemctl show network.service in both case ?

I'll provide it next week -- have to revert the changes first.

> >  - Providing a network.service lease has potential to break the service
> >    switch in yast2 -- but I didn't tested it.
> 
> I don't see how it would break service switch ?

I don't see it now, but the potential is there: I was thinking about the
2nd stage tricks, because it calls "disable NetworkManager.service"
without a following "enable network.service"; but there is AFAIR also an
"insserv network" to ensure the lsb script is active, and that will be
forwarded to systemd now. At the moment it looks fine -- seems to work.
Comment 64 Marius Tomaschewski 2013-07-17 21:30:47 UTC
Created attachment 548391 [details]
systemctl show network.service and dump in both states

(In reply to comment #60)
> What is the output of systemctl show network.service in both case ?

Attached in the archive. No difference except in timestamps and SubState.
Comment 65 Frederic Crozat 2013-07-19 14:38:38 UTC
Just wondering: is it possible to have any "long running" process being run from the udev network rules? This is no longer possible with systemd / udev in 12.3 (RUN+= shouldn't have anything long running in there, a systemd unit should be used instead) and udev will kill any process remaining (like a dhcpcd instance) once the script from RUN+= is finished. Feel free to tell me I'm wrong :)

There is also something I don't understand:
when I look at the journal trace in attachment #3 [details] (with
systemd.log_level=debug), it looks like the dhcpcd PID which is left
running is not the one which was initially started:

Jun 06 15:58:25 bwiedemann-12 dhcpcd[750]: eth0: exiting
vs
Jun 06 16:07:30 bwiedemann-12 dhcpcd[1387]: eth0: received SIGTERM, stopping

It looks like two dhcpcd daemons were started somehow, with one "blocking" the other.

It would be interesting to have, for the same problematic run (active(exited)), journal trace (journalctl -b, with booting with systemd.log_level=debug) and the network debug trace.

Another reminder: network.service (and this kind of initscript) is "abusing" type=forking, since there is no real "daemon" running, nor a PIDFile to help systemd detect which PID is really the main one.

I'm also not 100% sure the issue is caused by active(exited) vs active(running), since upgrading the udev package doesn't cause any change to systemd's own configuration (no --daemon-reload nor --daemon-reexec in %scripts).
Comment 66 Frederic Crozat 2013-07-19 14:43:39 UTC
I've backported an upstream patch which fixes incorrect termination of a service when some child terminates. I don't think it is related to this particular bug, but it might be worth testing anyway ( http://download.opensuse.org/repositories/home:/fcrozat:/branches:/openSUSE:/12.3:/Update/standard/ ). Should be available in one hour.
Comment 67 Frederic Crozat 2013-07-19 15:30:43 UTC
I might have incorrectly understood one specific commit for our case :

commit 1835f23c2a53e632959270e79dbf3143874e6111
Author: Lennart Poettering <lennart@poettering.net>
Date:   Tue Oct 11 20:21:06 2011 +0200

    service: don't try to guess PID for SysV services anymore
    
    As it turns out there are quite a number of SysV services too broken to
    make the guessing work: instead of returning in the parent only after
    the child is fully initialized they return immediately. The effect is
    that the guessing in systemd might happen too early, at a time where the
    final main process doesn't exist yet.
    
    By turning this off we won't try to detect the main pid anymore, with
    the effect that all processes of the service in question are considered
    equally likely to be the main process.

This means systemd will pick one process as the "main" process among the remaining processes, and if this process terminates, it will cause the service to enter active(exited), even if there are other processes remaining in the cgroup. In our case, it might be because there are two dhcpcd processes running: one is picked by systemd, but it terminates because another one is running, and this would cause active(exited).

However, I'm not sure this state is the reason why the network goes down.
Comment 69 Marius Tomaschewski 2013-07-22 12:48:06 UTC
(In reply to comment #65)
> Just wondering: is it possible to have any "long running" process being run
> with the udev network rules ? This is no longer possible with systemd / udev in
> 12.3 (RUN+= shouldn't have anything long running there, systemd unit should be
> used instead) and udev will kill any process remaining (like an dhcpcd
> instance) once the script from RUN+ is finished. Feel free to tell me I'm wrong
> :)

Hmm.. The question is what counts as "long running" -- the complete boot
needs just a few secs in both cases. The rule may start some background
jobs. AFAIR, the udev rule timeout was much higher.

I'll take a closer look here... when udev kills the background processes
that were started via RUN, it could be a reason. But I currently don't
believe it is udev related.

> There is also something I don't understand:
> when I look at the journal trace on attachment #3 [details] (with
> systemd.log_level=debug), it looks like dhcpcd PID and output which is left
> running is not the one which was initially started:
> 
> Jun 06 15:58:25 bwiedemann-12 dhcpcd[750]: eth0: exiting
> vs
> Jun 06 16:07:30 bwiedemann-12 dhcpcd[1387]: eth0: received SIGTERM, stopping
> 
> It looks like two dhcpcd daemons were started somehow, with one "blocking"
> the other.

No, dhcpcd was started; after it got a lease it forked, continuing in the
background, while the parent exited [pid 750 above].

> It would be interesting to have, for the same problematic run (active(exited)),
> journal trace (journalctl -b, with booting with systemd.log_level=debug) and
> the network debug trace.

Yes, and perhaps also udev in debug mode too.

> Another reminder is network.service (and this kind of initscript) is "abusing"
> type=forking, since there is no real "daemon" running, nor a PIDFile to help
> systemd to detect if there is really which PID is the main one. 

Yes, it is a drawback in the systemd / RemainAfterExit logic for proper
SysV script support :-)

> I'm also not 100% the issue is caused by active(exited) vs active(running),
> since upgrading udev package doesn't cause any change at systemd own
> configuration (no --daemon-reload nor --daemon-reexec in %scripts).

AFAIS, the systemd/udev migration hook kills all processes (using SIGTERM)
from the "exited" cgroup. Maybe this happens because it removes background
processes started by the RUN rule...?

I tried to enable network tracing; it did not show any output, that is,
there was no "rcnetwork stop" involved.

(In reply to comment #67)
> However, I'm not sure this state is the reason why network goes down..

IMO it seems something we have to take a closer look at.
Comment 70 Frederic Crozat 2013-07-22 13:02:14 UTC
(In reply to comment #69)
> (In reply to comment #65)
> > Just wondering: is it possible to have any "long running" process being run
> > with the udev network rules ? This is no longer possible with systemd / udev in
> > 12.3 (RUN+= shouldn't have anything long running there, systemd unit should be
> > used instead) and udev will kill any process remaining (like an dhcpcd
> > instance) once the script from RUN+ is finished. Feel free to tell me I'm wrong
> > :)
> 
> Hmm.. The question is what is "long running" -- the complete boot needs just
> few secs in both cases. The rule may start some background jobs. AFAIR, the
> udev rule timeout were much higher.

Here is the exact explanation from upstream NEWS:
        * udev: when udevd is started by systemd, processes which are left
          behind by forking them off of udev rules, are unconditionally cleaned
          up and killed now after the event handling has finished. Services or
          daemons must be started as systemd services. Services can be
          pulled-in by udev to get started, but they can no longer be directly
          forked by udev rules.
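A rule following that NEWS entry would pull in a unit instead of forking, along these lines (a generic sketch, not one of our actual rules; "my-dhcp@.service" is a made-up unit name):

```sh
# sketch: let udev request a systemd unit instead of forking from RUN+=
SUBSYSTEM=="net", ACTION=="add", TAG+="systemd", \
    ENV{SYSTEMD_WANTS}+="my-dhcp@$name.service"
```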

> I'll take a close look here... when udev kills the background processes that
> were started via RUN, it could be a reason. But I currently don't believe it
> is udev related.

The issue is appearing when udev is being upgraded, not systemd, so "something" is going on when udev is being restarted, I think.

> > There is also something I don't understand:
> > when I look at the journal trace on attachment #3 [details] [details] (with
> > systemd.log_level=debug), it looks like dhcpcd PID and output which is left
> > running is not the one which was initially started:
> > 
> > Jun 06 15:58:25 bwiedemann-12 dhcpcd[750]: eth0: exiting
> > vs
> > Jun 06 16:07:30 bwiedemann-12 dhcpcd[1387]: eth0: received SIGTERM, stopping
> > 
> > It looks like two dhcpcd daemons were started somehow, with one "blocking"
> > the other.
> 
> No, dhcpcd were started, after it got a lease it forks continuing in
> background, while the parent exits [pid 750 above].

ok

> > I'm also not 100% the issue is caused by active(exited) vs active(running),
> > since upgrading udev package doesn't cause any change at systemd own
> > configuration (no --daemon-reload nor --daemon-reexec in %scripts).
> 
> AFAIS, systemd/udev/migration hook kills all processes (using SIGTERM)
> from the "exited" cgroup. Maybe this happens because it removes background
> processes started by the RUN rule...?

Hmm, after looking carefully at the udev %pre/%post scripts, "systemctl daemon-reload" is being called. It would be interesting to see whether just calling "systemctl daemon-reload" causes the issue (I doubt it).
Comment 71 Bernhard Wiedemann 2013-07-22 13:39:49 UTC
Just tested: "systemctl daemon-reload" does not kill dhcpcd,
but "zypper in -f udev" does.
Comment 72 Swamp Workflow Management 2013-08-02 14:46:20 UTC
Update released for: openstack-nova, openstack-nova-api, openstack-nova-cert, openstack-nova-compute, openstack-nova-network, openstack-nova-objectstore, openstack-nova-scheduler, openstack-nova-test, openstack-nova-vncproxy, openstack-nova-volume, python-nova
Products:
SUSE-CLOUD 1.0 (x86_64)
Comment 73 Bernhard Wiedemann 2013-09-10 14:00:15 UTC
This is an autogenerated message for OBS integration:
This bug (821879) was mentioned in
https://build.opensuse.org/request/show/198226 Factory / sysconfig
Comment 74 Robert Milasan 2014-06-06 08:09:15 UTC
Setting this to RESOLVED/FIXED.