Bug 976764

Summary: GDM does not restart
Product: [openSUSE] openSUSE Tumbleweed Reporter: Howard Guo <hguo>
Component: GNOMEAssignee: E-mail List <gnome-bugs>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P3 - Medium CC: dheidler, dimstar, eich, fezhang, qzhao, tyang, vliaskovitis, yfjiang
Version: CurrentFlags: dimstar: needinfo? (tyang)
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 1018836    
Attachments: system journal

Description Howard Guo 2016-04-22 07:06:09 UTC
Created attachment 674137 [details]
system journal

This is a freshly installed tumbleweed 20160417, running in KVM, choosing gnome as desktop during installation.

Boot up the system, then login via GDM, and run "systemctl restart display-manager" via TTY or terminal application, then observe that GDM repeatedly attempts to restart, causing loss of all keyboard and mouse controls.

System journal is attached.

This issue can always be reproduced.
Comment 1 Felix Zhang 2016-04-22 08:42:35 UTC
Looks like upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=764665
Comment 2 Dominique Leuenberger 2016-07-25 07:52:10 UTC
WE might have a slightly different issue compared to the upstream one, as we do not use gdm.service, but rather display-manager.service, involving our own xdm scripts.

From what I've seen:
systemctl stop display-manager terminates the task /usr/bin/gdm, BUT leaves all children running

this causes problem when GDM is to be restarted, as the VT for gdm is already occupied - and the session actually still active

as a workaround: killproc -G -TERM /usr/lib/gdm/gdm-session-worker (note: this tears down all X sessions, but stopping display-manager should sort of have this effect anyway)
Comment 3 Dominique Leuenberger 2016-07-25 07:53:27 UTC
@CC Egbert - one of the xdm maintainers; maybe you have some great ideas for us?

I already tried chaning /usr/share/X11/display-manager to use -G to kill children of /usr/bin/gdm too, but that did not work in my local tests at least
Comment 4 Dominique Leuenberger 2016-07-25 07:56:33 UTC
alternatively:

loginctl kill-user gdm

to ensure all 'gdm owned' resources are gone - that keeps the user sessions running, which might not be exactly what is expected in all cases
Comment 5 Egbert Eich 2016-07-25 11:44:44 UTC
(In reply to Dominique Leuenberger from comment #3)
> @CC Egbert - one of the xdm maintainers; maybe you have some great ideas for
> us?
> 
> I already tried chaning /usr/share/X11/display-manager to use -G to kill
> children of /usr/bin/gdm too, but that did not work in my local tests at
> least

I assume you mean /usr/lib/X11/display-manager.
Looking at this file, gdm seems to be sent a SIGTERM - at least initially.
I would think that gdm cleans up all children (ie sends them SIGTERM as well) before going away itself. 
It could be that gdm is hung or is too slow shutting down everything - in this case, killproc will send a SIGKILL. 
The default delay is 5 sec. This should be sufficient - on the other hand, who knows?

I do not recommend to add DM specific code to /usr/lib/X11/display-manager.

Could someone who sees this do some more thorough debugging, please?

What happens when gdm is sent a SIGTERM 'by hand'? Do the children die?
Comment 6 Felix Zhang 2016-07-25 12:07:17 UTC
(In reply to Egbert Eich from comment #5)
> (In reply to Dominique Leuenberger from comment #3)
> > @CC Egbert - one of the xdm maintainers; maybe you have some great ideas for
> > us?
> > 
> > I already tried chaning /usr/share/X11/display-manager to use -G to kill
> > children of /usr/bin/gdm too, but that did not work in my local tests at
> > least
> 
> I assume you mean /usr/lib/X11/display-manager.
> Looking at this file, gdm seems to be sent a SIGTERM - at least initially.
> I would think that gdm cleans up all children (ie sends them SIGTERM as
> well) before going away itself. 
> It could be that gdm is hung or is too slow shutting down everything - in
> this case, killproc will send a SIGKILL. 
> The default delay is 5 sec. This should be sufficient - on the other hand,
> who knows?
> 
> I do not recommend to add DM specific code to /usr/lib/X11/display-manager.
> 
> Could someone who sees this do some more thorough debugging, please?
> 
> What happens when gdm is sent a SIGTERM 'by hand'? Do the children die?

Hi Egbert,

I tried with `killall -15 gdm`, the children are still alive. Which I think confirms your assumptions.
Comment 7 Egbert Eich 2016-07-25 12:28:51 UTC
(In reply to Felix Zhang from comment #6)

> I tried with `killall -15 gdm`, the children are still alive. Which I think
> confirms your assumptions.

Felix, IHMO, this used to work. Generally, a process should take care of taking down its children when it dies.
Comment 8 Felix Zhang 2016-07-25 16:33:29 UTC
(In reply to Egbert Eich from comment #7)
> (In reply to Felix Zhang from comment #6)
> 
> > I tried with `killall -15 gdm`, the children are still alive. Which I think
> > confirms your assumptions.
> 
> Felix, IHMO, this used to work. Generally, a process should take care of
> taking down its children when it dies.

Thanks Egbert. I checked gdm git repo and suspect commit 6b85869, introduced for bgo#755291. For gdm-session-worker It replaces calling g_main_loop_quit() with _exit(0). The latter terminates gdm-session-worker immediately but leaves all its children i.e. gdm-x-session and else inherited from pid 1.
Comment 9 Tao Yang 2017-01-22 08:49:01 UTC
(In reply to Felix Zhang from comment #1)
> Looks like upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=764665

I got a solution for this issue and update the upstream bug.

More details should be talked about for this issue.

Thanks
Comment 10 Dominique Leuenberger 2017-07-26 10:44:33 UTC
(In reply to Tao Yang from comment #9)
> (In reply to Felix Zhang from comment #1)
> > Looks like upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=764665
> 
> I got a solution for this issue and update the upstream bug.

If we have a solution (and upstream seems to be quite some time): can you add the solution to our packages already?
Comment 11 Cliff Zhao 2017-08-02 09:04:06 UTC
(In reply to Dominique Leuenberger from comment #10)
> (In reply to Tao Yang from comment #9)
> > (In reply to Felix Zhang from comment #1) 
> If we have a solution (and upstream seems to be quite some time): can you
> add the solution to our packages already?

Hi Dominique:
I feel maybe the modification below can fix this bug, but not sure.
What's your opinion?
https://git.gnome.org/browse/gdm/commit/?id=81f61eb
Comment 12 Dominique Leuenberger 2017-09-18 12:29:42 UTC
Looking at openQA, gdm seems now to be correctly shutting down and it is able to restart afterwards (since the upgrade to 3.26)

The test itself still fails, but that's because it makes 'wrong assumptions' on how GDM will behave (the mouse hover picks 'a user in the middle' and then 'key down' picks the next one instead of 'the 2nd in the list); visually, this seems correct)

As for this, I think the issue can be closed with it now

openQA test run: https://openqa.opensuse.org/tests/486763#step/multi_users_dm/12