Bugzilla – Bug 318079
deadlock in runtime
Last modified: 2007-09-15 21:24:46 UTC
---- Reported by james@ximian.com 2005-05-26 00:23:55 MST ----
Our app locked up today in a few cases (like 2% of the tests). It appears to be a deadlock in the runtime. This was on a P4 with hyperthreading and NLD. Using mono 1.1.7.2.

---- Additional Comments From james@ximian.com 2005-05-26 00:24:21 MST ----
Created an attachment (id=168012)
gdb trace

---- Additional Comments From naresh@novell.com 2005-05-27 15:25:40 MST ----
Tagging as Birdman (ZLM 7) bug in Mono 1.1.7.2.

---- Additional Comments From miguel@ximian.com 2005-06-08 08:05:14 MST ----
For this bug we do not have a good stack trace. Would it be possible to get information on how to log into these machines, compile Mono in there (we want to be able to add/remove debugging statements) and how to reproduce the case on our own?
In addition, we have been providing you with debugging builds which will produce a better trace.

---- Additional Comments From naresh@novell.com 2005-06-13 19:11:38 MST ----
Ping Dan Mills (thunder@ximian.com) he's working on the above.

---- Additional Comments From miguel@ximian.com 2005-06-15 15:28:01 MST ----
I've been trying to reproduce mono https://bugzilla.novell.com/show_bug.cgi?id=MONO75050 on a clean box that we can lend to the mono team for debugging. I think I have it. Here is the machine info:
IP: 151.155.170.91
Root pw: fart
There is a screen session running (with escape char ^\) with three windows, rug, zmd, and a shell prompt. The sources for mono and zmd are in /root/build. The mono is 1.1.7.3 from svn.
The rug command I used to reproduce it is:
while true; do rug sa tekkadon.boston.ximian.com; rug sa --type=rce https://nrm-stage.boston.ximian.com/data; rug sd 1; rug sd 1; done
Sometimes it takes a few minutes to happen. If it hasn't happened in a few minutes, it's best to restart zmd, as sometimes it can go for hours and not occur.
If there's anything else I can do, let me know.
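
The reproduction loop above has to be watched by hand. A minimal sketch of an unattended version follows, assuming (this is not confirmed anywhere in this report) that a `rug` invocation blocks when zmd deadlocks, so a per-command timeout can stand in for "it hung". The helper name `run_until_hang` and the timeout and round values are hypothetical; only the four `rug` command lines are taken from the comment above.

```python
import subprocess
import sys


def run_until_hang(commands, per_command_timeout, max_rounds):
    """Run each command in order, round after round.

    Returns (round_index, command) for the first command that does not
    finish within per_command_timeout seconds, or None if every round
    completes. A timeout is only treated as a *possible* hang.
    """
    for round_index in range(max_rounds):
        for command in commands:
            try:
                # subprocess.run kills the child and raises TimeoutExpired
                # when the timeout elapses.
                subprocess.run(
                    command,
                    timeout=per_command_timeout,
                    check=False,  # rug errors (e.g. unknown service) are expected
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                )
            except subprocess.TimeoutExpired:
                return round_index, command
    return None


if __name__ == "__main__":
    # Command lines copied from the comment; timeout and round count are guesses.
    rug_commands = [
        ["rug", "sa", "tekkadon.boston.ximian.com"],
        ["rug", "sa", "--type=rce", "https://nrm-stage.boston.ximian.com/data"],
        ["rug", "sd", "1"],
        ["rug", "sd", "1"],
    ]
    stalled = run_until_hang(rug_commands, per_command_timeout=120, max_rounds=500)
    print("possible hang:" if stalled else "no hang seen", stalled or "")
    sys.exit(1 if stalled else 0)
```

If the loop returns without a timeout, the advice in the comment still applies: restart zmd and try again rather than letting it run for hours.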

---- Additional Comments From miguel@ximian.com 2005-06-15 15:30:28 MST ----
Is it normal that the command says:
Successfully added service 'tekkadon.boston.ximian.com'
ERROR: No valid URI was specified.
-bash: https://nrm-stage.boston.ximian.com/data: No such file or directory
Successfully removed service 'https://tekkadon.boston.ximian.com'
ERROR: Can not find service '1'

---- Additional Comments From miguel@ximian.com 2005-06-15 15:52:49 MST ----
Btw, as it turns out the bug is in `zmd', not `rug'.

---- Additional Comments From miguel@ximian.com 2005-06-15 18:30:03 MST ----
To debug, this is the command that is run:
/usr/local/bin/mono /usr/local/lib/zmd/zmd.exe

---- Additional Comments From miguel@ximian.com 2005-06-15 18:59:44 MST ----
I have asked Naresh for some help on this issue. I tried debugging this, but gdb is not even aware that there are threads running on the mono application (it reports that there are no threads to `info threads') and any attempt to switch threads fails with `Thread ID XX not known.'.
Strace is also non-functioning, so this makes things harder to track down.
So we need your help to get:
* The upgraded kernel that apparently fixes the gdb/strace issues.
* The upgraded gdb/strace on the machine.
Miguel.

---- Additional Comments From naresh@novell.com 2005-06-16 08:49:05 MST ----
I'm looking into this and asking SUSE for any information. Miguel I assume this is an issue on every SUSE platform, i.e. gdb working and not specific to SLES 9.

---- Additional Comments From naresh@novell.com 2005-06-16 10:35:49 MST ----
These have been tracked, available internally in SLES 9 SP2 RC2. Separate e-mail has been sent on location (don't want it in Bugzilla) to Miguel and others, Dan Mills is upgrading current system. Please check with him.

---- Additional Comments From thunder@ximian.com 2005-06-16 11:17:22 MST ----
Ok, I upgraded the kernel, strace, and gdb.
I verified that we can reproduce the hang, and I attached with gdb and it seemed to produce more info than before.
Same machine, same password. I started a screen session again, escape key ^\, with three windows (rug, zmd, gdb).
Let me know if you need anything else.
Also, setting this bug as 'ximian employees only', since it contains an ip & password in the comments.

---- Additional Comments From bmaurer@users.sf.net 2005-06-17 00:41:28 MST ----
The box is in the firewall, so it's fine.

---- Additional Comments From bmaurer@users.sf.net 2005-06-17 00:42:46 MST ----
Thread 1:
#14 0x080c8eb2 in mono_loader_lock ()
#15 0x0809a933 in mono_class_from_name ()
#16 0x080f7f87 in mono_method_get_object ()
#17 0x080e4d53 in mono_delegate_ctor ()

Thread 3:
#15 0x080a8dbd in mono_assembly_invoke_search_hook ()
#16 0x080a824b in mono_assembly_names_equal ()
#17 0x080a9922 in mono_assembly_load_from_full ()
#18 0x080a95c1 in mono_assembly_open_full ()
#19 0x080aa401 in mono_assembly_load_with_partial_name ()
#20 0x080aa6c9 in mono_assembly_load_full ()
#21 0x080aa7d5 in mono_assembly_load ()
#22 0x080a8bc0 in mono_assembly_load_reference ()
#23 0x08093370 in mono_class_from_typeref ()
#24 0x080c717a in mono_method_get_signature ()
#25 0x080c7ece in mono_lookup_pinvoke_call ()
#26 0x080c83ce in mono_get_method_full ()

We are seeing the loader/domain lock order issue.

*** This bug has been marked as a duplicate of https://bugzilla.novell.com/show_bug.cgi?id=MONO75007 ***

Imported an attachment (id=168012)
Unknown operating system unknown. Setting to default OS "Other".
This bug was marked DUPLICATE in the database it was moved from. Changing resolution to "MOVED"
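
For readers unfamiliar with the term, the loader/domain lock order issue named in the analysis above is a lock-order inversion: two threads take the same two locks in opposite orders, and each ends up waiting for the lock the other holds. The sketch below is illustrative Python, not Mono code. The lock names only mirror the comment, and the helper `both_threads_finish`, the barrier handshake and the timeout values are assumptions made so that the demo reports the inversion instead of hanging.

```python
import threading


def both_threads_finish(order_a, order_b, timeout=0.2):
    """Run two threads that each take two locks in the given order.

    Returns True if both threads obtained their second lock, False if
    either gave up after `timeout` seconds (the stand-in for a deadlock).
    """
    locks = {"loader": threading.Lock(), "domain": threading.Lock()}
    rendezvous = threading.Barrier(2)
    acquired = []

    def worker(order):
        outer, inner = (locks[name] for name in order)
        with outer:
            try:
                # Try to reach the point where both threads hold their
                # first lock. With a consistent order the second thread is
                # still blocked on `outer`, so this wait times out and the
                # barrier breaks; that is expected and harmless here.
                rendezvous.wait(timeout)
            except threading.BrokenBarrierError:
                pass
            # Real mutexes would block forever here; the timeout lets the
            # sketch report the problem instead.
            ok = inner.acquire(timeout=timeout)
            if ok:
                inner.release()
            acquired.append(ok)

    threads = [
        threading.Thread(target=worker, args=(order,))
        for order in (order_a, order_b)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return all(acquired)


if __name__ == "__main__":
    # Opposite orders: at least one thread times out.
    print(both_threads_finish(("loader", "domain"), ("domain", "loader")))
    # Same order in both threads: both complete.
    print(both_threads_finish(("loader", "domain"), ("loader", "domain")))
```

The usual remedy, shown by the second call, is to pick one global order for the two locks and follow it on every code path. Whether that is how the duplicate bug was resolved is not stated in this report.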