|
Bugzilla – Full Text Bug Listing |
| Summary: | network is started then stopped right away | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.1 | Reporter: | Forgotten User cI3FYhfYTs <forgotten_cI3FYhfYTs> |
| Component: | Basesystem | Assignee: | Marius Tomaschewski <mt> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | fcrozat, info, mantel, suse-beta |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
systemd debugging output for the ssh/network problem
dmesg output of failure after updating to test version dmesg output of failure after updating to fcrozat's systemd dmesg output not that rpcbind is not started anymore |
||
|
Description
Forgotten User cI3FYhfYTs
2011-12-19 13:08:25 UTC
Created attachment 468065 [details]
systemd debugging output for the ssh/network problem
I got a similiar problem. I have to up eth0 manually by "ifup eth0". It runs until reboot. I saw that postfix was not running but showing "dead". The log (/var/log/message) shows postmap complaining about a missing local interface ::1. I had deactivated IPv6 on setup due to no need. After that problem, I reactivated it and bingo. The rcnetwork or restart does start eth0. Could it be, that there is a wrong dependency in the network start script forcing to shut it off when there is no IPv6? I wish, I have the opposite problem, it's a v6 only network, so I have no need for IPv4. But I doubt there's an option to turn it off and then on again, so this won't work for me. could you test with updated systemd and sysconfig from 12.1 maintenance update ? Issue regarding network service going down just after being up should be fixed. There is no update available. Whats the version number supposed to be? I've got systemd-37-3.6.1 and the bug is still in there. my bad, I thought sysconfig update had been pushed. Please test with sysconfig package from https://build.opensuse.org/package/show?package=sysconfig&project=openSUSE%3A12.1%3AUpdate%3ATest There must be something wrong with the link, I don't seem to be able to add the repo to my zypper repo list. because it wasn't a url for adding a repository. Try http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.1:/Update:/Test/openSUSE_12.1/ Yep, looks like it's fixed. I've rebooted 3 times now and the network stays up as expected. marking closed as fixed, will be handled by pending maintenance update for 12.1 Created attachment 474575 [details]
dmesg output of failure after updating to test version
Spoke too soon. On the 5th reboot it shows the original (erroneous) behaviour again, attached is the dmesg.
Don't know if it's good news or bad news, out of 10 reboots 2 succeeded, 8 failed, and I don't know what the schema behind that is; as I was saying the good news is that it's not always reproducible. *** Bug 741588 has been marked as a duplicate of this bug. *** We've added the "X-Systemd-RemainAfterExit: true" support to systemd and added the flag to the LSB tags in network script to keep services, that do not have any daemon, in "running" state instead to assume that they're gone when there is no process after start. [systemd update is released, sysconfig still in comment 8 repo] But it seems, there is still some check for empty cgroup somewhere in systemd causing systemd to execute the stop action for the service: [ 19.199283] systemd[1]: Accepted connection on private bus. [ 19.200140] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent [ 19.200235] systemd[1]: network.service: cgroup is empty [ 19.200344] systemd[1]: About to execute: /etc/init.d/network stop [ 19.205247] systemd[1]: Forked /etc/init.d/network as 1258 [ 19.205481] systemd[1]: network.service changed running -> stop See also deadlock problem in bug 724777 comment 4. Further, AFAIS systemd never calls the "network status" action but always the cached "start" return code only [that may have reported "in background" when dhcp didn't get any lease yet, ipv6 dad is not finished yet, ...]. And because "status" isn't always correct, another action results may be wrong too (e.g. "try-restart"). I were unable to catch the problem until now, because it works for me as soon as I enable debugging :-/ what is usually a timing/locking problem. (In reply to comment #11) > Created an attachment (id=474575) [details] > dmesg output of failure after updating to test version > > Spoke too soon. On the 5th reboot it shows the original (erroneous) behaviour > again, attached is the dmesg. Look at the wrong start order -- systemd also starts most network depending services incl. remote-fs before network: [ 7.024754] systemd[1]: Installed new job remote-fs.target/start as 65 [ 7.024757] systemd[1]: Installed new job syslog.service/start as 66 [ 7.024760] systemd[1]: Installed new job nfs-secure.service/start as 69 [ 7.024764] systemd[1]: Installed new job var-lib-nfs-rpc_pipefs.mount/start as 70 [ 7.024767] systemd[1]: Installed new job rpcbind.service/start as 71 ... [ 7.024833] systemd[1]: Installed new job network.service/start as 88 [ 7.024836] systemd[1]: Installed new job network.target/start as 89 I don't think I can fix (work around) here something in the network script. See also bug 744293 -- does the systemd dependency fix helps in your case? Perhaps network simply gets stopped because of a try to resolve the deps? Created attachment 474756 [details]
dmesg output of failure after updating to fcrozat's systemd
Not entirely sure what you mean by `systemd dependency fix'. I took it you wanted me to try fcrozat's latest systemd version and I did. Attached is the result.
Yes, I were speaking about systemd with a remote-fs dependency fix from: http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ Hmm... there still seems to be a deps issue. Diverse remotefs related service are started before network: [ 20.803349] systemd[1]: Got SIGCHLD for process 1216 (ifup-route) [ 20.803468] systemd[1]: Child 1216 died (code=exited, status=0/SUCCESS) [ 20.804130] systemd[1]: Received SIGCHLD from PID 613 (network). [ 20.804208] systemd[1]: Got SIGCHLD for process 613 (network) [ 20.804320] systemd[1]: Child 613 died (code=exited, status=0/SUCCESS) [ 20.804325] systemd[1]: Child 613 belongs to network.service [ 20.804373] systemd[1]: network.service: control process exited, code=exited status=0 [ 20.804377] systemd[1]: network.service got final SIGCHLD for state start [ 20.804542] systemd[1]: network.service changed start -> running [ 20.804559] systemd[1]: Job network.service/start finished, result=done [ 20.804708] systemd[1]: network.target changed dead -> active [ 20.804716] systemd[1]: Job network.target/start finished, result=done [ 20.804784] systemd[1]: About to execute: /etc/init.d/nfs start [ 20.809192] systemd[1]: Forked /etc/init.d/nfs as 1219 [ 20.809359] systemd[1]: nfs.service changed dead -> start [ 20.815299] systemd[1]: Received SIGCHLD from PID 1217 (ifup-route). [ 20.815383] systemd[1]: Got SIGCHLD for process 1217 (ifup-route) [ 20.815498] systemd[1]: Child 1217 died (code=exited, status=1/FAILURE) [ 20.824242] systemd[1]: Accepted connection on private bus. [ 20.829008] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent [ 20.829190] systemd[1]: network.service: cgroup is empty [ 20.829269] systemd[1]: About to execute: /etc/init.d/network stop [ 20.832227] systemd[1]: Forked /etc/init.d/network as 1221 [ 20.832473] systemd[1]: network.service changed running -> stop Interesting thing is PID 1217 (ifup-route) after network finished... where is this from ... just logged late? Please make sure you have the most recent systemd >= 37-3.151.1 from: http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ install the sysconfig package (changelog from today, > 0.75.4-2.8) from: http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.1:/Update:/Test/openSUSE_12.1/ ------------------------------------------------------------------ Thu Feb 16 14:03:59 UTC 2012 - mt@suse.com - Test patch: 0010-network-start-fails-under-systemd.test.patch bnc#734676: network bonding does not work when used with systemd bnc#737636: network is started then stopped right away bnc#745859: network reports failure at start under systemd Set DEBUG=EXTRA, reboot and attach tar of /dev/.sysconfig/network. ------------------------------------------------------------------ Then set DEBUG=EXTRA in /etc/sysconfig/network/config, reboot and attach a tar.bz2 of /dev/.sysconfig/network when it occurs again. Ok, the network seems to start reliably now. However, rpcbind is now not started anymore, or to be precise, started too soon (before any networking). Shall I file a new bug report? hmm, since there is a dependency from rpcbind on $network, it would mean network.target has been activated, ie both NM-wait-network.service and network.service are activated. network service might be giving its status too soon. could you test with "systemd.log_level=debug systemd.log_target=kmsg" and attach dmesg output after booting, so see what happened exactly. Created attachment 476707 [details]
dmesg output not that rpcbind is not started anymore
I don't see rpcbind being started at startup. what is the output of systemctl status rpcbind.service ? root@kegel:~ # systemctl status rpcbind.service
rpcbind.service - RPC Bind
Loaded: loaded (/lib/systemd/system/rpcbind.service; disabled)
Active: active (running) since Fri, 17 Feb 2012 14:57:44 +0000; 1h 25min ago
Main PID: 1510 (rpcbind)
CGroup: name=systemd:/system/rpcbind.service
└ 1510 /sbin/rpcbind -w -f
rpcbind.service is not enabled. Try "systemctl enable rpcbind.service" it should enable it for next reboot. but I don't understand, rpcbind doesn't ship with a .service on 12.1. Are you running on Factory ? I think you should open a separate bug report for rpcbind on Factory. Indeed it does. Ok, I'm happy now. No, it's a 12.1 but it has your systemd and mt's sysconfig, so I don't know where the service comes from. check rpm -qi $(rpm -qf /lib/systemd/system/rpcbind.service) I doubt it come from openSUSE:12.1 repository.. Oh indeed, it's a leftover from my autofs bug report. Vendor: obs://build.opensuse.org/home:elvigia It even works better than before now, as the rpcbind process was originally started before the network was up, and there were error messages about not being able to bind a port because the address did not exist. Do you mean network issue is fixed for you ? Yes. OK, then let's resolve it as a duplicate of bnc#744293 where the dependency issue has been fixed. *** This bug has been marked as a duplicate of bug 744293 *** Many thanks to everybody involved for reporting, all the tests and patience! I disagree with the resolution, the bug is fixed as is. There has been no dependency issue at all. Like I said, the dependency problem was there BEFORE, probably because the rpcbind service file didn't `integrate' well. After ENABLING it (it was disabled according to fcrozat) it's started as designed. Well, I've closed it as DUP e.g. because of comment 26 where you write "rpcbind process was originally started before the network was up" => the remote-fs-after-network dependency problem: - Add remote-fs-after-network.patch and update insserv patch: ensure remote-fs-pre.target is enabled and started before network mount points (bnc#744293). But anyway, I'm fine with FIXED as well :-) This is an autogenerated message for OBS integration: This bug (737636) was mentioned in https://build.opensuse.org/request/show/107529 12.1 / sysconfig |