Bug 787228

Summary: systemd: kexec is stuck after network shutdown
Product: [openSUSE] openSUSE 12.2
Reporter: Jiri Slaby <jslaby>
Component: Basesystem
Assignee: Frederic Crozat <fcrozat>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: forgotten_xRcrmyYBVX
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: screen when stuck; debug log of hung reboot

Description Jiri Slaby 2012-10-29 18:41:38 UTC
When I try to kexec into a new kernel, most of the time systemd gets stuck waiting for something indefinitely. It usually happens after the VPN and the network are shut down.

Nothing happens when I use sysrq-e and sysrq-i (send TERM and KILL to all processes). The only thing that works is pressing ctrl-alt-del, but then the system reboots instead of doing the kexec.
Comment 1 Frederic Crozat 2012-11-26 13:08:21 UTC
It could be interesting to boot with systemd.log_level=debug systemd.log_target=console to see what is going on.

I can't reproduce the issue, even after enabling VPN.
Comment 2 Jiri Slaby 2012-12-07 14:03:10 UTC
(In reply to comment #1)
> It could be interesting to boot with systemd.log_level=debug
> systemd.log_target=console to see what is going on.
> 
> I can't reproduce the issue, even after enabling VPN.

It does not seem to be tied to the VPN. systemd reports a dependency problem with some service.

Should
kill -54 1
kill -59 1
do the same thing at runtime? They don't seem to have any effect.
Comment 3 Frederic Crozat 2012-12-07 14:13:56 UTC
hmm, according to documentation and if I still know how to count:
debug => kill -56 1 
console output => kill -61 1
Comment 4 Jiri Slaby 2012-12-07 18:27:48 UTC
(In reply to comment #3)
> hmm, according to documentation and if I still know how to count:
> debug => kill -56 1 

SIGRTMIN+22

/usr/include/asm/signal.h:
...
#define SIGRTMIN        32

32+22=54

> console output => kill -61 1

SIGRTMIN+27

27+32=59

Right?
Comment 5 Jiri Slaby 2012-12-07 18:37:24 UTC
(In reply to comment #4)
> Right?

Nope, got it now.
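
The two calculations in comments 3 and 4 differ only in the base value. The kernel header defines SIGRTMIN as 32, but glibc typically reserves the first two real-time signals for its threading implementation, so user space sees SIGRTMIN as 34, which gives 56 and 61. The sketch below shows that arithmetic in Python. The function name rt_signal_number and its base parameter are illustrative only, the offsets 22 and 27 are the ones quoted in comment 4, and signal.SIGRTMIN is assumed to be available (Linux).

```python
import signal

# Offsets quoted in comment 4:
# SIGRTMIN+22 -> debug log level, SIGRTMIN+27 -> log to console.
DEBUG_LEVEL_OFFSET = 22
CONSOLE_TARGET_OFFSET = 27


def rt_signal_number(offset, base=None):
    """Return the numeric signal for SIGRTMIN+offset.

    base defaults to the SIGRTMIN value that the C library exposes to
    user space, which can be higher than the kernel header value.
    """
    if base is None:
        base = int(signal.SIGRTMIN)  # assumption: running on Linux
    return base + offset


if __name__ == "__main__":
    for name, offset in (("debug", DEBUG_LEVEL_OFFSET),
                         ("console", CONSOLE_TARGET_OFFSET)):
        print(f"{name}: kill -{rt_signal_number(offset)} 1")
```

Passing base=32 reproduces the 54 and 59 from comment 4, while base=34 reproduces the 56 and 61 from comment 3.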
Comment 6 Jiri Slaby 2013-01-09 11:53:08 UTC
This also happens with poweroff. However, I never remember to enable debug+console before kexec/poweroff. Anyway, what I see now is that ntpd cannot be stopped for some reason (Stopping ntpd .. [FAILED]) and systemd says it is a poweroff dependency failure. Then the network is shut down and it blocks. I'll try to remember to enable debug+console next time.
Comment 7 Jiri Slaby 2013-01-12 20:19:50 UTC
Created attachment 520032 [details]
screen when stuck

This is how it looks. ntpd looks suspicious there...
Comment 8 Jiri Slaby 2013-02-04 15:37:26 UTC
(In reply to comment #7)
> This is how it looks. ntpd looks suspicious there...

It's not ntpd. When I disable it, the issue still occurs. It happens only when a network connection is active. Then kexec, poweroff, and reboot all get stuck.
Comment 9 Jiri Slaby 2013-02-04 15:38:26 UTC
Forgot to add that there are plenty of messages when debug is enabled, and nothing in there looks relevant. What should I be looking for?
Comment 10 Frederic Crozat 2013-02-04 16:18:45 UTC
Could you try the procedure described in http://freedesktop.org/wiki/Software/systemd/Debugging#Diagnosing_Shutdown_Problems to get a full trace of what is going on?
Comment 11 Jiri Slaby 2013-02-04 16:22:29 UTC
(In reply to comment #10)
> could you try to do the procedure described in
> http://freedesktop.org/wiki/Software/systemd/Debugging#Diagnosing_Shutdown_Problems
> to get a full trace of what is going on ?

Ok.

Regarding the first step there: CTRL+ALT+DEL forces the reboot to proceed even when it is stuck.
Comment 12 Jiri Slaby 2013-02-04 16:54:34 UTC
Created attachment 523233 [details]
debug log of hung reboot

(In reply to comment #10)
> to get a full trace of what is going on ?

Here you go.
Comment 13 Frederic Crozat 2013-02-04 17:16:03 UTC
In the attached trace, did you cancel the shutdown or anything like that?

There is something suspicious:

[49911.355122] systemd[1]: sys-devices-virtual-net-tun0.device changed plugged -> dead
[49911.362057] systemd[1]: Accepted connection on private bus.
[49911.362625] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.RestartUnit() on /org/freedesktop/systemd1
[49911.362652] systemd[1]: Trying to enqueue job dnsmasq.service/restart/replace
[49911.362999] systemd[1]: Installed new job dnsmasq.service/restart as 1925
[49911.363007] systemd[1]: Job dbus.socket/stop finished, result=canceled
[49911.363016] systemd[1]: Installed new job dbus.socket/start as 1928
[49911.363022] systemd[1]: Job sysinit.target/stop finished, result=canceled
[49911.363031] systemd[1]: Installed new job sysinit.target/start as 1929
[49911.363037] systemd[1]: Job local-fs.target/stop finished, result=canceled
[49911.363043] systemd[1]: Installed new job local-fs.target/start as 1930
[49911.363048] systemd[1]: Job boot-efi.mount/stop finished, result=canceled
[49911.363054] systemd[1]: Installed new job boot-efi.mount/start as 1931
[49911.363060] systemd[1]: Installed new job fsck@dev-disk-by\x2did-ata\x2dINTEL_SSDSA2M080G2GC_CVPO0175040N080JGN\x2dpart1.service/start as 1932
[49911.363066] systemd[1]: Job umount.target/start finished, result=canceled
[49911.363074] systemd[1]: Job reboot.service/start finished, result=dependency
[49911.363390] systemd[1]: Job reboot.target/start finished, result=dependency
[49911.363399] systemd[1]: Job reboot.target/start failed with result 'dependency'.
[49911.363404] systemd[1]: Job reboot.service/start failed with result 'dependency'.

It looks like turning off the VPN restarts dnsmasq (which was already off), which in turn starts a number of units again and cancels their stop jobs, so the reboot job fails with a dependency error.

Could you try replacing "/etc/init.d/dnsmasq restart" with "/etc/init.d/dnsmasq try-restart" in /etc/openvpn/client.down?
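
The reason for the suggestion: by init script convention, "restart" starts the service even if it was stopped, whereas "try-restart" only restarts a service that is already running. The replacement could be applied to the script text along the lines of the Python sketch below. The function name use_try_restart is hypothetical, and it only matches the exact /etc/init.d/dnsmasq invocation mentioned above.

```python
import re

# Matches "/etc/init.d/dnsmasq restart" but not "... try-restart",
# because "try-restart" does not begin with "restart".
_RESTART = re.compile(r"(/etc/init\.d/dnsmasq[ \t]+)restart\b")


def use_try_restart(script_text):
    """Return script_text with unconditional dnsmasq restarts made conditional."""
    return _RESTART.sub(r"\1try-restart", script_text)


if __name__ == "__main__":
    before = "#!/bin/sh\n/etc/init.d/dnsmasq restart\n"
    print(use_try_restart(before))
```

The substitution is idempotent, so running it on a script that already uses try-restart leaves the text unchanged.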
Comment 14 Jiri Slaby 2013-02-04 21:05:24 UTC
(In reply to comment #13)
> in the attached trace, did you cancel the shutdown or anything like that ?

No, no, that is what it does without my intervention.

> It looks like turning off the vpn is restarting dnsmasq (which was already
> off), which is restarting a number of services.
> 
> Could you try replacing "/etc/init.d/dnsmasq restart" by "/etc/init.d/dnsmasq
> try-restart" in  /etc/openvpn/client.down ?

Yeah, that fixed it. Should we update this document:
https://wiki.innerweb.novell.com/index.php/Services_Team/Policies/openVPN/client_setup
?
Comment 15 Frederic Crozat 2013-02-05 08:33:49 UTC
(In reply to comment #14)
> (In reply to comment #13)

> > It looks like turning off the vpn is restarting dnsmasq (which was already
> > off), which is restarting a number of services.
> > 
> > Could you try replacing "/etc/init.d/dnsmasq restart" by "/etc/init.d/dnsmasq
> > try-restart" in  /etc/openvpn/client.down ?
> 
> Yeah, that fixed it. Should we update this document:
> https://wiki.innerweb.novell.com/index.php/Services_Team/Policies/openVPN/client_setup
> ?

Done.

Closing as "fixed".
Comment 16 Jiri Slaby 2013-02-05 08:37:06 UTC
Neat, so I can reboot after half a year :). Thanks.
Comment 17 Frederic Crozat 2013-02-22 15:51:19 UTC
Just for the record, upstream has just fixed this issue properly by creating transactions that can't be cancelled automatically (only with a command) for things like reboot and shutdown: a service being started at shutdown will no longer stop the reboot transaction.
Comment 18 Jiri Slaby 2013-04-02 15:05:48 UTC
*** Bug 812541 has been marked as a duplicate of this bug. ***
Comment 19 Forgotten User xRcrmyYBVX 2013-04-12 14:41:54 UTC
Just a quick question: will there be, or has there been, an update for 12.3 including this fix? I have just had a situation where this bug prevented a reboot on ~30 machines :-(
Comment 20 Frederic Crozat 2013-04-12 14:54:00 UTC
(In reply to comment #19)
> Just a quick question: will there be, or has there been, an update for 12.3
> including this fix? I have just had a situation where this bug prevented a
> reboot on ~30 machines :-(

No, it can't be backported. You have to find which service is being started at shutdown and prevent that.
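
One way to look for such a service is to scan a debug log of the hung shutdown for the pattern seen in comment 13: a job request followed by stop jobs finishing with result=canceled. The Python sketch below is a rough heuristic, not a systemd tool. It assumes the log line format shown in the excerpt in comment 13, attributes each canceled stop job to the most recent "Trying to enqueue job" line, and the function name canceled_stops_by_request is made up for illustration.

```python
import re

# Line formats taken from the log excerpt in comment 13.
ENQUEUE = re.compile(
    r"systemd\[1\]: Trying to enqueue job ([^/\s]+)/([^/\s]+)/([^/\s]+)")
CANCELED = re.compile(
    r"systemd\[1\]: Job ([^/\s]+)/stop finished, result=canceled")


def canceled_stops_by_request(lines):
    """Map each job request to the stop jobs canceled after it.

    Heuristic: a canceled stop job is blamed on the most recent
    enqueue request that precedes it in the log.
    """
    culprits = {}
    last_request = None
    for line in lines:
        match = ENQUEUE.search(line)
        if match:
            last_request = f"{match.group(1)}/{match.group(2)}"
            continue
        match = CANCELED.search(line)
        if match and last_request is not None:
            culprits.setdefault(last_request, []).append(match.group(1))
    return culprits


if __name__ == "__main__":
    import sys
    for request, units in canceled_stops_by_request(sys.stdin).items():
        print(f"{request} canceled stop jobs for: {', '.join(units)}")
```

Fed the excerpt from comment 13 on standard input, this would point at dnsmasq.service/restart as the request that canceled the stop jobs.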