Bug 306964

Summary: python-gtk hangs when window.destroy() is called from thread
Product: [openSUSE] openSUSE 10.3 Reporter: Stephen Shaw <stshaw>
Component: GNOME Assignee: JP Rosevear <jpr>
Status: RESOLVED INVALID QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P1 - Urgent CC: aj, coolo, jdouglas, mkraft
Version: Beta 2   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Sample program to reproduce the hang
updated test case
Backtrace of python interpreter

Description Stephen Shaw 2007-08-31 22:05:28 UTC
If you enter a wrong URL for an HTTP install source, vm-install hangs.

debug output:

INFO     Starting VM creation job (key=0)...
INFO     Creating disk '/xenMigration/var/lib/xen/images/rhel5/disk0'
DEBUG    Creating raw file '/xenMigration/var/lib/xen/images/rhel5/disk0', 8589934592 bytes (sparse=1).
INFO     Exception caught in job.run: Invalid URL: [Errno 14] HTTP Error 404: Not Found
INFO     Installation failed, so destroying VM
INFO     Installation failed, so cleaning up disks
INFO     Deleting created disk '/xenMigration/var/lib/xen/images/rhel5/disk0'
ERROR    VM creation job failed: Invalid URL: [Errno 14] HTTP Error 404: Not Found
Comment 1 Charles Coffing 2007-08-31 23:21:12 UTC
This seems to only happen on openSUSE 10.3, and only on large multiprocessor machines (8 proc in this case).

gtk.Window.destroy() is hanging.
Comment 2 Charles Coffing 2007-09-04 22:50:13 UTC
Created attachment 161822 [details]
Sample program to reproduce the hang
Comment 3 Charles Coffing 2007-09-04 22:57:58 UTC
The attached script demonstrates the problem.  window.destroy() seems to deadlock when it is called from another thread.

When running the demo script, press button 1 to do window.destroy() from the main thread.  This works.  The window closes, and you can Ctrl-C to exit the script.  Press button 2 to do window.destroy() from second thread.  The window may or may not get removed from the screen, but control never returns from the destroy() call, and you must kill -9 the script.

Seems to be timing related:
- Usually doesn't occur on 1 or 2 proc machines
- Occurs on 4 or 8 proc machines
- But doesn't occur if I first "ssh -X root@localhost" even on 8 proc

I am not sure if this is a gtk2 bug or a python-gtk bug.

Changing component to "GNOME - Platform".  Hope that's right.
Comment 4 JP Rosevear 2007-09-04 23:36:36 UTC
I'm not a gtk/threading expert, but here are a couple of quick notes based on:
http://www.gtk.org/faq/#AEN482

There is no enter/leave group around the main call, just an init

The thread signal handler is not flushing the commands to the x server before leaving.

Larry and Federico can provide better analysis than I can, but gtk 2.11.x closed up some races in the threading code, which may explain why this is only appearing in 10.3.

Comment 5 Larry Ewing 2007-09-05 19:15:33 UTC
A couple more notes.

I'm having trouble reproducing the deadlock, so I can't tell whether my suggestions will fix the problem, but I see some issues. Your init function is being run before your main function, which to me indicates that a lot of gdk calls are made before threads are initialized; that is a potential source of problems. Also, like JP mentioned, you should wrap main with an enter/leave if you intend to make GUI calls from handlers. I'll attach an updated test case you can try.
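
For illustration only, here is a minimal sketch of the pattern these notes describe, assuming the PyGTK 2 bindings of that era. It is not the attached test case, whose contents are not reproduced in this report; the function name main and the widget layout are hypothetical.

```python
import threading


def main():
    # PyGTK 2 import, kept inside main() so the module can still be loaded
    # on systems where PyGTK is not installed (an assumption of this sketch).
    import gtk

    # Initialise thread support before any other GDK/GTK call is made.
    gtk.gdk.threads_init()

    window = gtk.Window()
    button = gtk.Button("destroy from second thread")
    window.add(button)

    def destroy_in_thread():
        # Code running outside the main loop takes the GDK lock around GTK calls.
        gtk.gdk.threads_enter()
        try:
            window.destroy()
            gtk.gdk.flush()  # push pending requests to the X server
        finally:
            gtk.gdk.threads_leave()

    def on_clicked(widget):
        # Signal handlers run in the main thread; hand the work to a thread.
        threading.Thread(target=destroy_in_thread).start()

    button.connect("clicked", on_clicked)
    window.connect("destroy", lambda w: gtk.main_quit())
    window.show_all()

    # Hold the GDK lock around the main loop (the enter/leave pair).
    gtk.gdk.threads_enter()
    try:
        gtk.main()
    finally:
        gtk.gdk.threads_leave()
```

Calling main() starts the demo. Note that comment 9 reports no change in behavior with the updated test case, so this pattern should be read as recommended practice rather than a confirmed fix for this hang.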
Comment 6 Larry Ewing 2007-09-05 19:17:04 UTC
Created attachment 162104 [details]
updated test case
Comment 7 JP Rosevear 2007-09-05 19:42:48 UTC
Re-assigning back to the XEN team to review and test the new test case.  Please let us know if you are still having problems after this.
Comment 8 Charles Coffing 2007-09-05 20:17:56 UTC
*** Bug 307467 has been marked as a duplicate of this bug. ***
Comment 9 Charles Coffing 2007-09-05 20:27:17 UTC
Tested attachment from comment #6.  No change in behavior.
Comment 10 Charles Coffing 2007-09-05 20:30:04 UTC
Created attachment 162118 [details]
Backtrace of python interpreter

Here is a backtrace of the python process after it locks up.  Python has a global interpreter lock, and GTK has a global lock.  My suspicion is a classical AB/BA deadlock...

One thread seems to be going from GTK trying to get the python lock; another thread has python earlier in the stack and is now polling in GTK.
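
The suspected AB/BA pattern can be sketched without GTK at all. The example below is a simplified model in modern Python 3: two plain threading.Lock objects stand in for the interpreter lock and the GTK lock, and the names gil, gdk_lock and run_demo are hypothetical. It does not reproduce the actual hang; it only shows how opposite lock ordering leaves both threads waiting.

```python
import threading

gil = threading.Lock()       # stand-in for Python's global interpreter lock
gdk_lock = threading.Lock()  # stand-in for the GTK global lock


def run_demo(timeout=0.5):
    """Return the names of the threads that timed out on their second lock."""
    barrier = threading.Barrier(2)
    stuck = []

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            # A real deadlock would block here forever; the timeout lets
            # the demonstration finish and report what happened.
            if second.acquire(timeout=timeout):
                second.release()
            else:
                stuck.append(name)
            barrier.wait()  # keep the first lock until both attempts are over

    threads = [
        threading.Thread(target=worker, args=("gtk-then-python", gdk_lock, gil)),
        threading.Thread(target=worker, args=("python-then-gtk", gil, gdk_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stuck)
```

Both workers end up in the returned list because each holds the lock the other needs. Acquiring the two locks in the same order in every thread removes the cycle.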
Comment 11 Charles Coffing 2007-09-05 22:45:29 UTC
Sending back to JP, after trying the updated test case and still having it fail.

Comment 12 Charles Coffing 2007-09-05 22:55:16 UTC
*** Bug 306963 has been marked as a duplicate of this bug. ***
Comment 13 Stephan Kulow 2007-09-07 08:20:29 UTC
I don't see how this is a blocker
Comment 26 JP Rosevear 2007-09-10 15:20:41 UTC
Charles, can you get a trace with debuginfo packages installed? glib2, gtk2 and python ones at least.
Comment 27 Charles Coffing 2007-09-10 15:25:43 UTC
Yes, will get that today.
Comment 28 Charles Coffing 2007-09-10 23:18:40 UTC
The lab machines have been updated to openSUSE 10.3 beta 3, and now I cannot reproduce.  (The other machines have been busy, so I have only been able to try on the original 8-processor machine.)

I looked at the changelogs for gtk2, glib2, python, and python-gtk and nothing suggested a relevant fix.  Perhaps timing has simply changed in this build.  I'll try to get time on some other machines to test tomorrow.

Dropping severity.
Comment 29 JP Rosevear 2007-09-18 19:06:54 UTC
If it's not repeatable and there are no further updates here, I'm going to close.  Please reopen if you can reproduce the problem in the future.
Comment 30 Gary Ekker 2008-03-26 18:08:05 UTC
Changing to component GNOME. Sorry for the spam.