Bug 318079 (MONO75050) - deadlock in runtime
Summary: deadlock in runtime
Status: RESOLVED MOVED
Alias: MONO75050
Product: Mono: Runtime
Classification: Mono
Component: JIT
Version: 1.1
Hardware: Other Other
: P3 - Medium : Normal
Target Milestone: ---
Assignee: Paolo Molaro
QA Contact: Mono Bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-26 07:23 UTC by James Willcox
Modified: 2007-09-15 21:24 UTC
CC: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
gdb trace (24.41 KB, text/plain)
2005-05-26 07:24 UTC, Thomas Wiest

Description Thomas Wiest 2007-09-15 19:18:30 UTC


---- Reported by james@ximian.com 2005-05-26 00:23:55 MST ----

Our app locked up today in a few cases (like 2% of the tests).  It appears
to be a deadlock in the runtime.

This was on a P4 with hyperthreading and NLD.  Using mono 1.1.7.2.



---- Additional Comments From james@ximian.com 2005-05-26 00:24:21 MST ----

Created an attachment (id=168012)
gdb trace




---- Additional Comments From naresh@novell.com 2005-05-27 15:25:40 MST ----

Tagging as Birdman (ZLM 7) bug in Mono 1.1.7.2.



---- Additional Comments From miguel@ximian.com 2005-06-08 08:05:14 MST ----

   For this bug we do not have a good stack trace. 

   Would it be possible to get information on how to log into these
machines, compile Mono in there (we want to be able to add/remove
debugging statements) and how to reproduce the case on our own?


   In addition, we have been providing you with debugging builds which
will produce a better trace.



---- Additional Comments From naresh@novell.com 2005-06-13 19:11:38 MST ----

Ping Dan Mills (thunder@ximian.com); he's working on the above.




---- Additional Comments From miguel@ximian.com 2005-06-15 15:28:01 MST ----

I've been trying to reproduce mono https://bugzilla.novell.com/show_bug.cgi?id=MONO75050 on a clean box that we  
can lend to the mono team for debugging.  I think I have it.  Here is  
the machine info:

IP: 151.155.170.91
Root pw: fart

There is a screen session running (with escape char ^\) with three  
windows, rug, zmd, and a shell prompt.  The sources for mono and zmd  
are in /root/build.  The mono is 1.1.7.3 from svn.

The rug command I used to reproduce it is:

while true; do rug sa tekkadon.boston.ximian.com; rug sa --type=rce \
  https://nrm-stage.boston.ximian.com/data; rug sd 1; rug sd 1; done

Sometimes it takes a few minutes to happen. If it hasn't happened in
a few minutes, it's best to restart zmd, as sometimes it can go for
hours without occurring.

If there's anything else I can do, let me know.



---- Additional Comments From miguel@ximian.com 2005-06-15 15:30:28 MST ----

Is it normal that the command says:

Successfully added service 'tekkadon.boston.ximian.com'
ERROR: No valid URI was specified.
-bash: https://nrm-stage.boston.ximian.com/data: No such file or directory
Successfully removed service 'https://tekkadon.boston.ximian.com'
ERROR: Can not find service '1'




---- Additional Comments From miguel@ximian.com 2005-06-15 15:52:49 MST ----

Btw, as it turns out the bug is in `zmd', not `rug'.



---- Additional Comments From miguel@ximian.com 2005-06-15 18:30:03 MST ----

To debug, this is the command being run:
/usr/local/bin/mono /usr/local/lib/zmd/zmd.exe




---- Additional Comments From miguel@ximian.com 2005-06-15 18:59:44 MST ----

I have asked Naresh for some help on this issue. I tried debugging
this, but gdb is not even aware that there are threads running in the
mono application (it reports that there are no threads to `info
threads') and any attempt to switch threads fails with `Thread ID XX
not known.'.

Strace is also non-functioning, so this makes things harder to track
down. 

So we need your help to get:
* The upgraded kernel that apparently fixes the gdb/strace issues.
* The upgraded gdb/strace on the machine.

Miguel.



---- Additional Comments From naresh@novell.com 2005-06-16 08:49:05 MST ----

I'm looking into this and asking SUSE for any information. Miguel, I
assume this gdb issue occurs on every SUSE platform and is not
specific to SLES 9.



---- Additional Comments From naresh@novell.com 2005-06-16 10:35:49 MST ----

These have been tracked and are available internally in SLES 9 SP2 RC2.
A separate e-mail with the location (we don't want it in Bugzilla) has
been sent to Miguel and others. Dan Mills is upgrading the current
system; please check with him.



---- Additional Comments From thunder@ximian.com 2005-06-16 11:17:22 MST ----

Ok, I upgraded the kernel, strace, and gdb.

I verified that we can reproduce the hang, and I attached with gdb and it seemed to produce 
more info than before.

Same machine, same password.  I started a screen session again, escape key ^\, with three 
windows (rug, zmd, gdb).  Let me know if you need anything else.

Also, setting this bug as 'ximian employees only', since it contains an ip & password in the 
comments.



---- Additional Comments From bmaurer@users.sf.net 2005-06-17 00:41:28 MST ----

The box is behind the firewall, so it's fine.



---- Additional Comments From bmaurer@users.sf.net 2005-06-17 00:42:46 MST ----

Thread 1:

#14 0x080c8eb2 in mono_loader_lock ()
#15 0x0809a933 in mono_class_from_name ()
#16 0x080f7f87 in mono_method_get_object ()
#17 0x080e4d53 in mono_delegate_ctor ()

Thread 3:
#15 0x080a8dbd in mono_assembly_invoke_search_hook ()
#16 0x080a824b in mono_assembly_names_equal ()
#17 0x080a9922 in mono_assembly_load_from_full ()
#18 0x080a95c1 in mono_assembly_open_full ()
#19 0x080aa401 in mono_assembly_load_with_partial_name ()
#20 0x080aa6c9 in mono_assembly_load_full ()
#21 0x080aa7d5 in mono_assembly_load ()
#22 0x080a8bc0 in mono_assembly_load_reference ()
#23 0x08093370 in mono_class_from_typeref ()
#24 0x080c717a in mono_method_get_signature ()
#25 0x080c7ece in mono_lookup_pinvoke_call ()
#26 0x080c83ce in mono_get_method_full ()

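We are seeing the loader/domain lock order issue. Thread 1 is blocked
in mono_loader_lock () on its way from mono_delegate_ctor (), while
thread 3, which came in through mono_get_method_full (), has gone on
into mono_assembly_invoke_search_hook (); each thread appears to hold
one of the two locks while waiting for the other.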
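A lock order issue like this is the classic ABBA cycle: one thread holds lock A and waits for B, while another holds B and waits for A. A common remedy is to give every lock a fixed rank and only ever acquire locks in increasing rank order. The following is a minimal pthreads sketch of that idea, not Mono's actual code: `ranked_lock_t`, `ranked_lock`, `ranked_unlock`, `loader_lock` and `domain_lock` are hypothetical names, and ranking the loader lock below the domain lock is an arbitrary assumption. Any consistent order breaks the cycle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A mutex tagged with a fixed rank. Every thread must acquire ranked
 * locks in strictly increasing rank order, so two threads can never
 * each hold one lock while waiting for the other (the ABBA cycle). */
typedef struct {
    pthread_mutex_t mutex;
    int rank;
} ranked_lock_t;

/* Hypothetical stand-ins for the runtime's loader and domain locks.
 * Ranking loader below domain is an arbitrary choice for illustration. */
static ranked_lock_t loader_lock = { PTHREAD_MUTEX_INITIALIZER, 1 };
static ranked_lock_t domain_lock = { PTHREAD_MUTEX_INITIALIZER, 2 };

/* Per-thread stack of ranks currently held, innermost last. */
#define MAX_HELD 8
static _Thread_local int held[MAX_HELD];
static _Thread_local int n_held = 0;

/* Acquire l only if doing so respects the rank order. On a violation
 * the lock is NOT taken and false is returned, so the ordering bug
 * shows up immediately instead of as an occasional hang. */
bool ranked_lock(ranked_lock_t *l) {
    if (n_held == MAX_HELD)
        return false;
    if (n_held > 0 && held[n_held - 1] >= l->rank)
        return false;
    pthread_mutex_lock(&l->mutex);
    held[n_held++] = l->rank;
    return true;
}

/* Release l; this sketch assumes strictly nested (LIFO) unlocking. */
void ranked_unlock(ranked_lock_t *l) {
    assert(n_held > 0 && held[n_held - 1] == l->rank);
    n_held--;
    pthread_mutex_unlock(&l->mutex);
}
```

With this ordering, a thread holding `loader_lock` may still take `domain_lock`, but a thread already holding `domain_lock` is refused `loader_lock`, so the two-thread cycle cannot form. A debug build would more typically abort on the violation than return false, so every misordered call site is caught on the first run rather than in a small fraction of test runs.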

*** This bug has been marked as a duplicate of https://bugzilla.novell.com/show_bug.cgi?id=MONO75007 ***

Imported an attachment (id=168012)

Unknown operating system unknown. Setting to default OS "Other".
This bug was marked DUPLICATE in the database it was moved from.
    Changing resolution to "MOVED"