Bug 841544

Summary: Long delay on shutdown stopping User Manager for 0
Product: [openSUSE] openSUSE Tumbleweed Reporter: Andrei Borzenkov <arvidjaar>
Component: BasesystemAssignee: systemd maintainers <systemd-maintainers>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: crrodriguez, fbui, lmuelle, nfbrown, seife, thomas.blume, werner
Version: 13.1 Beta 1   
Target Milestone: ---   
Hardware: Other   
OS: openSUSE 13.1   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Andrei Borzenkov 2013-09-20 15:06:19 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0

In beta 1 every shutdown waits for timeout stopping user manager:

[  OK  ] Stopped LSB: Configure network interfaces and set up routing.
         Stopping SuSEfirewall2 phase 1...
[  OK  ] Stopped SuSEfirewall2 phase 1.
[     *] A stop job is running for User Manager for 0

This is VM in qemu KVM, minimal text only installation.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Comment 1 Andrei Borzenkov 2013-09-21 06:12:00 UTC
Suggested fix in SR#199892 (patch from Olivier Brunel)
Comment 2 Bernhard Wiedemann 2013-09-24 08:00:10 UTC
This is an autogenerated message for OBS integration:
This bug (841544) was mentioned in
https://build.opensuse.org/request/show/200302 Factory / systemd
Comment 3 Frederic Crozat 2013-09-24 11:37:02 UTC
*** Bug 842003 has been marked as a duplicate of this bug. ***
Comment 4 Andrei Borzenkov 2013-09-26 11:10:20 UTC
Still not fixed. Apparently patch just changes relative timings. The real problem is, SIGRTMIN+24 gets either lost or ignored when user instance is requested to stop. Here is log in good case:

Sep 26 08:39:39 linux-1a7f systemd[1507]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Sep 26 08:39:39 linux-1a7f systemd[1507]: Failed to open private bus connection: Failed to connect to socket /run/user/0/dbus/user_bus_socket: No such file or directory
Sep 26 08:39:39 linux-1a7f systemd[1507]: Starting Default.
Sep 26 08:39:39 linux-1a7f systemd[1507]: Reached target Default.
Sep 26 08:39:39 linux-1a7f systemd[1507]: Startup finished in 37ms.
Sep 26 08:39:42 linux-1a7f systemd[1507]: Time has been changed
Sep 26 08:39:50 linux-1a7f systemd[1507]: Stopping Default.
Sep 26 08:39:50 linux-1a7f systemd[1507]: Stopped target Default.
Sep 26 08:39:50 linux-1a7f systemd[1507]: Starting Shutdown.
Sep 26 08:39:50 linux-1a7f systemd[1507]: Reached target Shutdown.
Sep 26 08:39:50 linux-1a7f systemd[1507]: Starting Exit the Session...

and in bad case

Sep 26 08:41:23 linux-1a7f systemd[1508]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Sep 26 08:41:23 linux-1a7f systemd[1508]: Failed to open private bus connection: Failed to connect to socket /run/user/0/dbus/user_bus_socket: No such file or directory
Sep 26 08:41:23 linux-1a7f systemd[1508]: Starting Default.
Sep 26 08:41:23 linux-1a7f systemd[1508]: Reached target Default.
Sep 26 08:41:23 linux-1a7f systemd[1508]: Startup finished in 38ms.
Sep 26 08:41:25 linux-1a7f systemd[1508]: Time has been changed
Sep 26 08:45:27 linux-1a7f systemd[1508]: Stopping Default.
Sep 26 08:45:27 linux-1a7f systemd[1508]: Stopped target Default.
Sep 26 08:45:27 linux-1a7f systemd[1508]: Starting Shutdown.
Sep 26 08:45:27 linux-1a7f systemd[1508]: Reached target Shutdown.
Sep 26 08:45:27 linux-1a7f systemd[1508]: Starting Exit the Session...
Sep 26 08:45:27 linux-1a7f systemd[1508]: Started Exit the Session.
Sep 26 08:45:27 linux-1a7f systemd[1508]: Starting Exit the Session.
Sep 26 08:45:27 linux-1a7f systemd[1508]: Reached target Exit the Session.

Service "Exit the Sessions" sends SIGRTMIN+24 to systemd which should cause it to stop. But in the second case it did not get or react to it, and continue to run.

Any attempt increase debug level makes this problem disappear.
Comment 5 Cristian Rodríguez 2013-10-01 16:19:57 UTC
patch has to be replaced by:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=7400b9d2e99938d17b281d7df43680eade18666e

upstream fix.
Comment 6 Frederic Crozat 2013-10-01 16:36:06 UTC
already in my branch, I'll submit it soon
Comment 7 Andrei Borzenkov 2013-10-03 03:06:36 UTC
(In reply to comment #4)
> The real
> problem is, SIGRTMIN+24 gets either lost or ignored

it is never received by systemd user instance. Confirmed by adding additional logging immediately after receiveing SIGRTMIN+24.
Comment 8 Andrei Borzenkov 2014-01-26 08:20:17 UTC
I managed to reproduce it with user instance debug level. /bin/kill -RTMIN+24 gets killed itself by SIGTERM. 

Jan 26 11:53:58 linux-1a7f systemd[1942]: Received SIGTERM from PID 1 (systemd).
Jan 26 11:53:58 linux-1a7f systemd[1942]: Activating special unit exit.target
Jan 26 11:53:58 linux-1a7f systemd[1942]: Trying to enqueue job exit.target/start/replace
Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job exit.target/start as 3
Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job systemd-exit.service/start as 4
Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job shutdown.target/start as 5
Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job default.target/stop as 7
Jan 26 11:53:58 linux-1a7f systemd[1942]: Enqueued job exit.target/start as 3
Jan 26 11:53:58 linux-1a7f systemd[1942]: Stopping Default.
Jan 26 11:53:58 linux-1a7f systemd[1942]: default.target changed active -> dead
Jan 26 11:53:58 linux-1a7f systemd[1942]: Job default.target/stop finished, result=done
Jan 26 11:53:58 linux-1a7f systemd[1942]: Stopped target Default.
Jan 26 11:53:58 linux-1a7f systemd[1942]: Starting Shutdown.
Jan 26 11:53:58 linux-1a7f systemd[1942]: shutdown.target changed dead -> active
Jan 26 11:53:58 linux-1a7f systemd[1942]: Job shutdown.target/start finished, result=done
Jan 26 11:53:59 linux-1a7f systemd[1942]: Reached target Shutdown.
Jan 26 11:53:59 linux-1a7f systemd[1942]: Starting Exit the Session...
Jan 26 11:53:59 linux-1a7f systemd[1942]: About to execute: /usr/bin/kill -s 58 $MANAGERPID
Jan 26 11:53:59 linux-1a7f systemd[1942]: Forked /usr/bin/kill as 1951
Jan 26 11:53:59 linux-1a7f systemd[1942]: systemd-exit.service changed dead -> start
Jan 26 11:53:59 linux-1a7f systemd[1942]: Set up jobs progress timerfd.
Jan 26 11:53:59 linux-1a7f systemd[1942]: Collecting default.target
Jan 26 11:53:59 linux-1a7f systemd[1942]: Received SIGCHLD from PID 1943 ((sd-pam)).
Jan 26 11:53:59 linux-1a7f systemd[1942]: Got SIGCHLD for process 1943 ((sd-pam))
Jan 26 11:53:59 linux-1a7f systemd[1942]: Child 1943 died (code=exited, status=0/SUCCESS)
Jan 26 11:53:59 linux-1a7f systemd[1942]: Received SIGCHLD from PID 1951 ((kill)).
Jan 26 11:53:59 linux-1a7f systemd[1942]: Got SIGCHLD for process 1951 ((kill))
Jan 26 11:53:59 linux-1a7f systemd[1942]: Child 1951 died (code=killed, status=15/TERM)
Jan 26 11:53:59 linux-1a7f systemd[1942]: Child 1951 belongs to systemd-exit.service
Jan 26 11:53:59 linux-1a7f systemd[1942]: systemd-exit.service: main process exited, code=killed, status=15/TERM
Jan 26 11:53:59 linux-1a7f systemd[1942]: systemd-exit.service changed start -> dead
Jan 26 11:53:59 linux-1a7f systemd[1942]: Job systemd-exit.service/start finished, result=done
Jan 26 11:53:59 linux-1a7f systemd[1942]: Started Exit the Session.
Jan 26 11:53:59 linux-1a7f systemd[1942]: Closed jobs progress timerfd.
Jan 26 11:53:59 linux-1a7f systemd[1942]: Starting Exit the Session.
Jan 26 11:53:59 linux-1a7f systemd[1942]: exit.target changed dead -> active
Jan 26 11:53:59 linux-1a7f systemd[1942]: Job exit.target/start finished, result=done
Jan 26 11:53:59 linux-1a7f systemd[1942]: Reached target Exit the Session.
Comment 9 Andrei Borzenkov 2014-01-26 15:02:46 UTC
The problem is, user@<UID>.service has KillMode=control-group which causes systemd to traverse cgroup and send SIGTERM to every process. So it happens to find /bin/kill in the wrong moment and kills it.

Even ignoring this problem it looks wrong. systemd user instance manages all started services itself. Each service started by user instance may have separate requirements how it should be stopped; sending every process SIGTERM before trying ExecStop for repective service is just wrong.

Unfortunately currently systemd enforces KillMode=control-group for services that register itself with PAM.
Comment 10 Andrei Borzenkov 2014-01-29 18:25:31 UTC
SR#215564 for Base:System with upstream fix (even though I do not quite like it). home:arvidjaar:bnc:841544:13.1 has fix for 13.1.
Comment 11 Dr. Werner Fink 2014-01-30 08:56:51 UTC
thanks a lot ... I've to resolve a conflict in systemd.changes to be able to add the content of your SR
Comment 12 Dr. Werner Fink 2014-01-30 09:02:56 UTC
See SR#215598
Comment 13 Bernhard Wiedemann 2014-01-30 10:00:38 UTC
This is an autogenerated message for OBS integration:
This bug (841544) was mentioned in
https://build.opensuse.org/request/show/215598 Factory / systemd
Comment 15 Franck Bui 2015-07-01 12:57:22 UTC
This still affects opensuse 13.1, should we backport the corresponding fixes ?
Comment 22 Swamp Workflow Management 2016-02-03 14:22:30 UTC
openSUSE-RU-2016:0320-1: An update that has 146 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 737690,742774,750845,818044,838475,841544,849870,852015,852021,852232,853293,854884,856389,856392,856858,857204,858864,859072,859365,860574,860937,861316,861489,863217,864745,864904,865834,866732,866933,867128,867663,867664,867840,868019,868230,868439,868931,869142,869603,872929,873432,873444,874665,875502,876587,876694,877021,877674,878525,880438,880732,881125,881559,881942,882393,882714,883565,884271,884403,885232,885288,886211,886599,886852,888178,888215,888612,889297,889357,890977,892096,892162,892300,893797,895087,896664,897799,897801,897803,898233,898240,898432,900558,901481,902240,902901,903009,903963,904214,904517,904828,905550,906709,906900,907318,907393,908476,909358,910643,911347,912030,912334,913517,916420,918118,919095,920195,921831,921898,921920,926169,927250,927457,928265,931388,932284,933365,933512,933521,933533,934077,934901,937512,937900,938908,939571,940264,941576,944132,944799,945282,947212,948458,948555,948705,949574,949683,949739,950510,951265,951663,953241,954336,954781,955635,961576
CVE References: 
Sources used:
openSUSE 13.1 (src):    systemd-210-40.1, systemd-mini-210-40.1
Comment 23 Franck Bui 2016-02-08 08:51:36 UTC
This issue should be fixed by the latest update release of systemd. Feel free to reopen it if you don't think it's the case.

Closing.