Bug 317622 (MONO74455) - deadlock in mono_method_desc_new?
Summary: deadlock in mono_method_desc_new?
Status: RESOLVED MOVED
Alias: MONO74455
Product: Mono: Runtime
Classification: Mono
Component: misc
Version: 1.1
Hardware: Other Other
Priority: P3 - Medium
Severity: Major
Target Milestone: ---
Assignee: Mono Bugs
QA Contact: Mono Bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-04-06 20:19 UTC by James Willcox
Modified: 2007-09-15 21:24 UTC (History)
CC List: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
list of stack traces (6.15 KB, text/plain)
2005-04-06 20:19 UTC, Thomas Wiest

Description Thomas Wiest 2007-09-15 19:12:36 UTC


---- Reported by james@ximian.com 2005-04-06 13:19:05 MST ----

We are experiencing frequent deadlocks with our application.  All threads
appear to be halted, even ones that are completely autonomous where no
deadlock is possible.  After attaching with gdb, it shows that almost all
threads are waiting in mono_method_desc_new.  I will attach the traces.



---- Additional Comments From james@ximian.com 2005-04-06 13:19:31 MST ----

Created an attachment (id=167702)
list of stack traces




---- Additional Comments From miguel@ximian.com 2005-04-06 15:58:51 MST ----

Please provide us with stack traces that include the output of
mono_print_method_from_ip for the relevant IPs.



---- Additional Comments From vargaz@gmail.com 2005-04-06 21:47:47 MST ----

These traces look corrupt. mono_method_desc_new is a very simple
function that involves no locking or waiting on mutexes. Would it be
possible to run the application with a runtime compiled with debugging
info (i.e. -g instead of -O2)?



---- Additional Comments From miguel@ximian.com 2005-04-09 17:43:04 MST ----

James, ping?

I asked Naresh yesterday for you folks to rebuild Mono with -g and try
out the new packages. 

Can you provide us either a test case, or a better stack trace?

Setting the bug to [NEEDINFO] until we get more data.



---- Additional Comments From naresh@novell.com 2005-04-13 23:48:18 MST ----

David Lewis:  I was able to get the following stack trace from the
hung zmd process in the superlab.  James, if this isn't enough
information, you can ssh to the machine like this:



#0  0xffffe410 in ?? ()
#1  0xbfffe1b8 in ?? ()
#2  0x0002c867 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d1b0c in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#5  0x080d90f1 in _wapi_handle_wait_signal_handle ()
#6  0x080dac53 in WaitForSingleObjectEx ()
#7  0x0808c35c in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#8  0x41b05d7f in ?? ()
#9  0x0828ffc0 in ?? ()
#10 0x00000409 in ?? ()
#11 0xffffffff in ?? ()
#12 0x00000000 in ?? ()
#13 0x08199ec0 in ?? ()
#14 0x08199fc0 in ?? ()
#15 0x089a2bd8 in ?? ()
#16 0x00000409 in ?? ()
#17 0xffffffff in ?? ()
#18 0x0828ffc0 in ?? ()
#19 0xbfffe2a0 in ?? ()
#20 0x41b05d54 in ?? ()
#21 0xbfffe2c8 in ?? ()
#22 0x41b05c8d in ?? ()
#23 0x0828ffc0 in ?? ()
#24 0x00000409 in ?? ()
#25 0xffffffff in ?? ()
#26 0x00000000 in ?? ()
#27 0x00000000 in ?? ()
#28 0x41b05838 in ?? ()
#29 0x08244708 in ?? ()
#30 0x0828ffc0 in ?? ()
#31 0xbfffe2e0 in ?? ()
#32 0x41b05c14 in ?? ()
#33 0x0828ffc0 in ?? ()
#34 0x00000409 in ?? ()
#35 0xffffffff in ?? ()
#36 0x00000000 in ?? ()
#37 0xbfffe338 in ?? ()
#38 0x4049115c in ?? ()

--David




---- Additional Comments From miguel@ximian.com 2005-04-14 01:02:59 MST ----

Hello,

    Please run this command:

(gdb) thread apply all bt

    Then, as indicated before, for each of the various addresses in
the stack, please provide:

(gdb) p mono_print_method_from_ip (address)

    Do this for all the addresses without names.

Miguel.
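
Typing one command per address gets tedious with traces this long, so here is a minimal sketch that turns a pasted backtrace into the gdb commands requested above. The helper name `gdb_commands_for_unnamed_frames` is hypothetical; the only thing taken from this report is the `p mono_print_method_from_ip (address)` form. It assumes frames without symbols look like `#8  0x41b05d7f in ?? ()`, and it skips null and repeated addresses.

```python
import re

# Matches a gdb frame that has no symbol, e.g. "#8  0x41b05d7f in ?? ()".
UNNAMED_FRAME = re.compile(r"^#\d+\s+(0x[0-9a-fA-F]+) in \?\? \(\)\s*$")


def gdb_commands_for_unnamed_frames(backtrace: str) -> list:
    """Return one gdb command per distinct, non-null unnamed address."""
    seen = set()
    commands = []
    for line in backtrace.splitlines():
        match = UNNAMED_FRAME.match(line.strip())
        if match is None:
            continue  # named frame, thread header, or wrapped text
        address = match.group(1)
        if int(address, 16) == 0 or address in seen:
            continue  # a null "frame" cannot be code; skip repeats too
        seen.add(address)
        commands.append(f"p mono_print_method_from_ip ({address})")
    return commands
```

Many of the unnamed "frames" in an unwound -O2 stack are probably just stack data (small constants, 0xffffffff), so expect some of the generated commands to resolve to nothing.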



---- Additional Comments From james@ximian.com 2005-04-14 12:05:48 MST ----

They got another hang in the superlab; the backtrace is below.
mono_print_method_from_ip did not return anything useful (it
returned a decimal number, not the method signature).

Thread 11 (Thread 1109760944 (LWP 29289)):
#0  0xffffe410 in ?? ()
#1  0x42259b0c in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400cfde3 in _L_mutex_lock_2518 () from /lib/tls/libpthread.so.0
#6  0x42259ac4 in ?? ()
#7  0x00000000 in ?? ()
#8  0x00000000 in ?? ()
#9  0x00000000 in ?? ()
#10 0x00000000 in ?? ()
#11 0x00000000 in ?? ()
#12 0x00000000 in ?? ()
#13 0x42259bb0 in ?? ()
#14 0x400d7bc4 in __JCR_LIST__ () from /lib/tls/libpthread.so.0
#15 0x00000000 in ?? ()
#16 0x00000000 in ?? ()
#17 0x42259b0c in ?? ()
#18 0x42259aa4 in ?? ()
#19 0x400cf9f8 in start_thread () from /lib/tls/libpthread.so.0
#20 0x401ba9da in clone () from /lib/tls/libc.so.6

Thread 10 (Thread 1079638960 (LWP 14321)):
#0  0xffffe410 in ?? ()
#1  0x4059f938 in ?? ()
#2  0x0013aab9 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d1b0c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#5  0x080d90f1 in _wapi_handle_wait_signal_handle ()
#6  0x080dac53 in WaitForSingleObjectEx ()
#7  0x080b3b00 in finalizer_thread ()
#8  0x0808b782 in start_wrapper ()
#9  0x080dd694 in timed_thread_start_routine ()
#10 0x080e6d74 in GC_start_routine ()
#11 0x400cfa13 in start_thread () from /lib/tls/libpthread.so.0
#12 0x401ba9da in clone () from /lib/tls/libc.so.6
 

Thread 9 (Thread 1087134640 (LWP 14324)):
#0  0xffffe410 in ?? ()
#1  0x40cc54a0 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400d0c60 in _L_mutex_lock_34 () from /lib/tls/libpthread.so.0
#6  0x00000451 in ?? ()
#7  0x000037ef in ?? ()
#8  0x08187ce0 in __JCR_LIST__ ()
#9  0x08186c9c in thread_hash_once ()
#10 0x00000451 in ?? ()
#11 0x40cc5510 in ?? ()
#12 0x080cdc37 in thread_exit ()
#13 0x080cdc37 in thread_exit ()
#14 0x080dd50f in _wapi_timed_thread_exit ()
#15 0x080ce397 in ExitThread ()
#16 0x0808bab2 in mono_thread_exit ()
#17 0x081180bb in mono_handle_exception ()
#18 0x08118d9b in throw_exception ()
#19 0x40018435 in ?? ()


Thread 8 (Thread 1090534320 (LWP 14325)):
#0  0xffffe410 in ?? ()
#1  0x410038e8 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400d0c60 in _L_mutex_lock_34 () from /lib/tls/libpthread.so.0
#6  0x08623438 in ?? ()
#7  0x410038e0 in ?? ()
#8  0x08187ce0 in __JCR_LIST__ ()
#9  0x00001388 in ?? ()
#10 0x08186c9c in thread_hash_once ()
#11 0x41003938 in ?? ()
#12 0x080ce816 in GetCurrentThread ()
#13 0x080ce816 in GetCurrentThread ()
#14 0x080cec44 in SleepEx ()
#15 0x0808bc91 in ves_icall_System_Threading_Thread_Sleep_internal ()
#16 0x4098a0eb in ?? ()
#17 0x00001388 in ?? ()
#18 0x08621448 in ?? ()
#19 0x086214f8 in ?? ()

Thread 7 (Thread 1100200880 (LWP 14326)):
#0  0xffffe410 in ?? ()
#1  0x4193b864 in ?? ()
#2  0x0013a9db in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d1b0c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#5  0x080d90f1 in _wapi_handle_wait_signal_handle ()
#6  0x080dac53 in WaitForSingleObjectEx ()
#7  0x0811b238 in ves_icall_System_Threading_Monitor_Monitor_wait ()
#8  0x40993a2f in ?? ()
#9  0x0855c848 in ?? ()

Thread 6 (Thread 1103137712 (LWP 14328)):
#0  0xffffe410 in ?? ()
#1  0x41c088d8 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400d0c60 in _L_mutex_lock_34 () from /lib/tls/libpthread.so.0
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0x08187ce0 in __JCR_LIST__ ()
#9  0x00000064 in ?? ()
#10 0x08186c9c in thread_hash_once ()
#11 0x41c08928 in ?? ()
#12 0x080ce816 in GetCurrentThread ()
#13 0x080ce816 in GetCurrentThread ()
#14 0x080cec44 in SleepEx ()
#15 0x0808bc91 in ves_icall_System_Threading_Thread_Sleep_internal ()
#16 0x4098a0eb in ?? ()
#17 0x00000064 in ?? ()

Thread 5 (Thread 1104190384 (LWP 14329)):
#0  0xffffe410 in ?? ()
#1  0x41d098d8 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400d0c60 in _L_mutex_lock_34 () from /lib/tls/libpthread.so.0
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0x08187ce0 in __JCR_LIST__ ()
#9  0x00000064 in ?? ()
#10 0x08186c9c in thread_hash_once ()
#11 0x41d09928 in ?? ()
#12 0x080ce816 in GetCurrentThread ()
#13 0x080ce816 in GetCurrentThread ()
#14 0x080cec44 in SleepEx ()
#15 0x0808bc91 in ves_icall_System_Threading_Thread_Sleep_internal ()
#16 0x4098a0eb in ?? ()
#17 0x00000064 in ?? ()

Thread 4 (Thread 1105243056 (LWP 14330)):
#0  0xffffe410 in ?? ()
#1  0x41e0a604 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400d0c60 in _L_mutex_lock_34 () from /lib/tls/libpthread.so.0
#6  0x00000739 in ?? ()
#7  0x080b2f24 in rescale128 ()
#8  0x080ce03f in CreateThread ()
#9  0x0808b8b5 in mono_thread_create ()
#10 0x08091cc8 in mono_thread_pool_add ()
#11 0x08095485 in mono_delegate_begin_invoke ()
#12 0x4214abb9 in ?? ()
#13 0x08f0d6e0 in ?? ()


Thread 3 (Thread 1107479472 (LWP 14843)):
#0  0xffffe410 in ?? ()
#1  0x4202c88c in ?? ()
#2  0x0013a42f in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d1b0c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#5  0x080d90f1 in _wapi_handle_wait_signal_handle ()
#6  0x080dac53 in WaitForSingleObjectEx ()
#7  0x0808c35c in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#8  0x41b05d7f in ?? ()
#9  0x08437138 in ?? ()


Thread 2 (Thread 1112124336 (LWP 14845)):
#0  0xffffe410 in ?? ()
#1  0x4249a71c in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d3efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x400d0c60 in _L_mutex_lock_34 () from /lib/tls/libpthread.so.0
#6  0x000004f5 in ?? ()
#7  0x00000000 in ?? ()
#8  0x08187ce0 in __JCR_LIST__ ()
#9  0x40a10250 in ?? ()
#10 0x08186c9c in thread_hash_once ()
#11 0x4249a76c in ?? ()
#12 0x080ce816 in GetCurrentThread ()
#13 0x080ce816 in GetCurrentThread ()
#14 0x080da93d in WaitForSingleObjectEx ()
#15 0x0811b2e1 in ves_icall_System_Threading_Monitor_Monitor_wait ()
#16 0x40993a2f in ?? ()
#17 0x08be5558 in ?? ()


Thread 1 (Thread 1075939136 (LWP 14319)):
#0  0xffffe410 in ?? ()
#1  0xbfffe1b8 in ?? ()
#2  0x0013a9bf in ?? ()
#3  0x00000000 in ?? ()
#4  0x400d1b0c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#5  0x080d90f1 in _wapi_handle_wait_signal_handle ()
#6  0x080dac53 in WaitForSingleObjectEx ()
#7  0x0808c35c in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#8  0x41b05d7f in ?? ()
#9  0x0828ffc0 in ?? ()
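
To get a quick overview of a dump like the one above, here is a rough sketch that groups threads by what they appear to be blocked in. The function name `classify_threads` and the three bucket names are hypothetical; it only looks for the two libpthread symbols that show up in these traces (`__lll_mutex_lock_wait` and `pthread_cond_timedwait`) and assumes gdb's `Thread N (...)` header format.

```python
import re
from collections import defaultdict

# gdb prints a header such as "Thread 11 (Thread 1109760944 (LWP 29289)):".
THREAD_HEADER = re.compile(r"^Thread (\d+) \(")


def classify_threads(dump: str) -> dict:
    """Map 'mutex' / 'condvar' / 'other' to sorted gdb thread numbers."""
    frames = defaultdict(list)
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line.strip())
        if header:
            current = int(header.group(1))
            frames[current]  # register the thread even if no frames follow
        elif current is not None:
            frames[current].append(line)

    result = {"mutex": [], "condvar": [], "other": []}
    for thread, lines in frames.items():
        text = "\n".join(lines)
        if "__lll_mutex_lock_wait" in text:
            result["mutex"].append(thread)    # blocked acquiring a mutex
        elif "pthread_cond_timedwait" in text:
            result["condvar"].append(thread)  # timed wait on a condition
        else:
            result["other"].append(thread)
    for group in result.values():
        group.sort()
    return result
```

Run over the dump in this comment, it should put threads 2, 4, 5, 6, 8, 9 and 11 in the mutex bucket and threads 1, 3, 7 and 10 in the condvar bucket. That narrows down which stacks are worth resolving first, but it says nothing about which lock each thread wants.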





---- Additional Comments From miguel@ximian.com 2005-04-14 17:14:36 MST ----

I logged into the machine today to track down what was going on.

There is a suspicious candidate: the DefaultException handler seems to
be recursing. It is not clear that the code has a mechanism for
resuming execution on the main thread, so it would stop
processing responses.





---- Additional Comments From miguel@ximian.com 2005-04-16 12:30:23 MST ----

At this point, I believe the deadlock was in the ZLM code.

Reopen if you have further data.



---- Additional Comments From kodriscoll@novell.com 2005-04-27 17:12:01 MST ----

We are also seeing this same issue in the Superlab in Provo while
testing Birdman.  We are seeing it on multiple boxes, and are doing
nothing more than a rug sl command.  You can ssh into 151.155.187.101
and then ssh to 10.1.1.0 to see it.

Kent O'Driscoll
ext 12802



---- Additional Comments From kodriscoll@novell.com 2005-04-27 17:17:53 MST ----

Sorry... I gave the wrong IP addresses to see this.  ssh to
151.155.187.103 and then 10.3.29.3.  If you issue a #ps -ef|grep rug
command, you will see a rug sd 1 process that has been sitting there
for several hours.



---- Additional Comments From james@ximian.com 2005-04-28 12:54:16 MST ----

They reproduced this again in the superlab today.  I attached with
gdb, and found all threads were either blocking on a mutex or
sleeping.  I tried to use mono_print_method_from_ip to see what was
going on, and that call deadlocked as well.  The trace from that is below.

#0  0xffffe410 in ?? ()
#1  0xbfffe660 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000000 in ?? ()
#4  0x40033efe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#5  0x40030c6c in _L_mutex_lock_88 () from /lib/tls/libpthread.so.0
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0x08865288 in ?? ()
#9  0x0887b140 in ?? ()
#10 0x00000000 in ?? ()
#11 0xbfffe680 in ?? ()
#12 0x080f133f in EnterCriticalSection ()
#13 0x080f133f in EnterCriticalSection ()
#14 0x080db328 in mono_jit_info_table_find ()
#15 0x0813af6c in mono_print_method_from_ip ()
#16 <function called from gdb>




---- Additional Comments From naresh@novell.com 2005-04-28 14:42:09 MST ----

*** https://bugzilla.novell.com/show_bug.cgi?id=MONO74460 has been marked as a duplicate of this bug. ***



---- Additional Comments From miguel@ximian.com 2005-05-03 21:27:49 MST ----

One addition:

James, when you run `p mono_print_method_from_ip' the output will go
to the mono process stdout, which means that the method addresses will
be printed in whatever output file zlm is logging to.

Anyway, if you have another hung machine, please post the IP address;
the ones currently listed on this bug report do not exhibit this problem.



---- Additional Comments From miguel@ximian.com 2005-05-18 13:08:35 MST ----

I believe this was another manifestation of the SMP issues we fixed
last week.

Feel free to reopen if it happens again.



---- Additional Comments From naresh@novell.com 2005-05-27 12:11:42 MST ----

Tagging as a Birdman (ZLM 7) Mono bug.
Re-opening as not fixed in Mono 1.1.7.2 based on the following
information from Kent.

>>>Kent O'Driscoll 05/26/05 4:53 pm >>>
The other issue we are still seeing is Remedy DEFECT000409838, which is
related to mono (bugzilla defects 74455 & 75050).
Kent, Jim, John. 



---- Additional Comments From bmaurer@users.sf.net 2005-06-17 00:33:19 MST ----

This is pretty clearly a dup of the beagle lock issue. In thread 2
(from the first trace given), we are in mono_get_method_full and then
mono_assembly_invoke_search_hook, which takes the loader lock and then
the domain lock. In thread 3 we are in mono_delegate_ctor and then
mono_class_from_name, which takes the domain lock and then the loader
lock.
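
For anyone not familiar with this class of bug, here is a minimal sketch of a lock-order inversion and one common way to avoid it. It is an illustration only and not necessarily how the runtime was changed: `loader_lock`, `domain_lock`, `LOCK_RANK`, `acquire_in_order` and `run_demo` are hypothetical names. Two code paths ask for the same two locks in opposite order, and a helper always takes them in one global rank order, so neither thread can hold one lock while waiting for the other.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-ins for the runtime's loader and domain locks.
loader_lock = threading.Lock()
domain_lock = threading.Lock()

# One global ranking: every code path takes locks in ascending rank.
LOCK_RANK = {id(loader_lock): 0, id(domain_lock): 1}


@contextmanager
def acquire_in_order(*locks):
    """Acquire locks by rank, whatever order the caller listed them in."""
    ordered = sorted(locks, key=lambda lock: LOCK_RANK[id(lock)])
    for lock in ordered:
        lock.acquire()
    try:
        yield
    finally:
        for lock in reversed(ordered):
            lock.release()


def run_demo(iterations=1000):
    """Run two threads that request the locks in opposite order."""
    counter = {"value": 0}

    def path_a():  # asks for loader, then domain
        for _ in range(iterations):
            with acquire_in_order(loader_lock, domain_lock):
                counter["value"] += 1

    def path_b():  # asks for domain, then loader
        for _ in range(iterations):
            with acquire_in_order(domain_lock, loader_lock):
                counter["value"] += 1

    threads = [threading.Thread(target=path_a),
               threading.Thread(target=path_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=10)
    finished = all(not thread.is_alive() for thread in threads)
    return counter["value"], finished
```

If each path instead acquired the locks directly in the order it names them, thread A could hold the loader lock while thread B holds the domain lock, and both would wait forever. That would match the symptom in this report, where most threads sit in `__lll_mutex_lock_wait`.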

*** This bug has been marked as a duplicate of https://bugzilla.novell.com/show_bug.cgi?id=MONO75007 ***

Imported an attachment (id=167702)

Unknown operating system unknown. Setting to default OS "Other".
This bug was marked DUPLICATE in the database it was moved from.
    Changing resolution to "MOVED"