Bug 324531 (MONO81856) - Dead Lock with thread abort/locks in signal handlers
Summary: Dead Lock with thread abort/locks in signal handlers
Status: RESOLVED FIXED
Alias: MONO81856
Product: Mono: Runtime
Classification: Mono
Component: misc (show other bugs)
Version: 1.2
Hardware: Other Linux
: P3 - Medium : Minor
Target Milestone: ---
Assignee: Mono Bugs
QA Contact: Mono Bugs
URL:
Whiteboard:
Keywords: blocking
Depends on:
Blocks:
 
Reported: 2007-06-11 18:25 UTC by James Stevenson
Modified: 2009-01-27 18:44 UTC (History)
3 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Wiest 2007-09-15 20:41:24 UTC


---- Reported by james@stev.org 2007-06-11 11:25:39 MST ----

Please fill in this template when reporting a bug, unless you know what you
are doing.
Description of Problem:


Steps to reproduce the problem:
1. Run The Code Below

Actual Results:
This locks up somewhere in the mysql libs and never returns from strace i
beleave it was happening on either the creation or the destruction of a
thread. It does not occur in MS .NET on windows with the same mysql server
and driver.

Expected Results:
To run Indefinatly


How often does this happen? 
Randomly but within 5 - 10 minutes for me

Additional Information:

Code

using System;
using System.Collections.Generic;
using System.Text;
using MySql.Data.MySqlClient;

namespace Test {
	class Program {
		static void Main(string[] args) {
			try {
				MySqlConnection mysql;
				mysql = new MySqlConnection("host=etc,,,");
				int i = 0;

				mysql.Open();

				while (true) {
					MySqlCommand sql = new MySqlCommand("select id from alias where userid
= ?userid and gametypeid = ?gametypeid and centerid = ?centerid and name =
?name");
					MySqlDataReader rec;

					sql.Parameters.AddWithValue("?userid", 1409);
					sql.Parameters.AddWithValue("?gametypeid", 1);
					sql.Parameters.AddWithValue("?centerid", 1);
					sql.Parameters.AddWithValue("?name", "jcjfdjal");

					sql.Connection = mysql;
					//sql.Prepare();
					rec = sql.ExecuteReader();
					rec.Close();

					Console.WriteLine(" Count = " + i);
					i++;
				}
			} catch (Exception e) {
				Console.WriteLine(e.Message);
				Console.WriteLine(e.StackTrace);
			}
		}
	}
}



---- Additional Comments From lupus@ximian.com 2007-07-04 12:14:08 MST ----

What kind of box are you using? x86 smp?
If you could also include a sample db that can be used to reproduce
with this program it would be great.



---- Additional Comments From james@stev.org 2007-07-05 05:15:00 MST ----


The table i am querying looks like this

CREATE TABLE `alias` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `userid` int(10) unsigned NOT NULL,
  `gametypeid` int(11) NOT NULL,
  `centerid` int(11) NOT NULL,
  `name` varchar(128) NOT NULL,
  `count` int(10) unsigned NOT NULL,
  `lastseen` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `userid` (`userid`,`gametypeid`,`name`,`centerid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

The machine is

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2600+
stepping        : 1
cpu MHz         : 2088.059
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips        : 4179.16

Single Processor
The same bug does not happen when running on MS .Net 2.0 in windows
using the database remotely to the linux machine nor on its local
machine running a database. When running on either a local or a remote
database the problem will occur when running in mono. I cannot get the
bug to occur with tracing enabled on mono which would suggest to me
its going to be hard to reproduce on a different system and is
possibly some kind of race.





---- Additional Comments From james@stev.org 2007-07-16 07:44:56 MST ----


More information

Somebody on irc asked me to append the following
After sending a QUIT signal to mono the following is dumped.

Full thread dump:
"" tid=0x0xb6e99bb0 this=0x0x60d10:
"" tid=0x0xb7cd8aa0 this=0x0x21e00:

I am able to trigger this bug on various version of mono / mysql libs.
Mysql Lib tested with are 5.0.3.0 through 5.0.7.0 All of which seem to
cause the same deadlock.

I have also tested this against mono 1.2.4 and various svn HEAD trees
in the last 2/3 weeks. Last tested against revision 82024

I am pritty sure this is not a mysql lib issue as the mono runtime
always locks up. I have tried to use the info on
http://www.mono-project.com/Debugging to attempt to get a stack trace
to look into this issue further but most of the internal functions
stop on mono_domain_lock

I cannot trigger the deadlock with tracing enabled.
The deadlock does not trigger on windows. Under MS .NET

Where next ?




---- Additional Comments From james@stev.org 2007-07-16 09:11:07 MST ----


Grendel on irc suggested that i turn on the thread debugging and try
to retrigger the bug. This is the last of the output it produced
** Message: _wapi_thread_handle_from_id: Returning 0x127631 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127631 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127631 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127631 for self
thread -1365808208 from TLS
** Message: _wapi_thread_set_termination_details: Thread 0x127631
terminating
** Message: thread_start_routine: started thread id -1236284496
** Message: CreateThread: Started thread handle 0x127635 ID -1236284496
** Message: _wapi_thread_handle_from_id: Returning 0x403 for self
thread -1211114848 from TLS
** Message: _wapi_thread_abandon_mutexes: Thread 0x127631 abandoning
held mutexes
** Message: _wapi_thread_set_termination_details: Recording thread
handle 0x127631 id -1365808208 status as -1
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
 Count = 603918
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127635 for self
thread -1236284496 from TLS
** Message: _wapi_thread_set_termination_details: Thread 0x127635
terminating
** Message: thread_start_routine: started thread id -1365808208
** Message: CreateThread: Started thread handle 0x127639 ID -1365808208
** Message: _wapi_thread_handle_from_id: Returning 0x403 for self
thread -1211114848 from TLS
** Message: _wapi_thread_abandon_mutexes: Thread 0x127635 abandoning
held mutexes
** Message: _wapi_thread_set_termination_details: Recording thread
handle 0x127635 id -1236284496 status as -1
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
 Count = 603919
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_handle_from_id: Returning 0x127639 for self
thread -1365808208 from TLS
** Message: _wapi_thread_set_termination_details: Thread 0x127639
terminating
** Message: thread_start_routine: started thread id -1236284496
** Message: CreateThread: Started thread handle 0x12763d ID -1236284496
** Message: _wapi_thread_handle_from_id: Returning 0x403 for self
thread -1211114848 from TLS
** Message: _wapi_thread_abandon_mutexes: Thread 0x127639 abandoning
held mutexes
** Message: _wapi_thread_set_termination_details: Recording thread
handle 0x127639 id -1365808208 status as -1




---- Additional Comments From lupus@ximian.com 2007-07-16 10:48:13 MST ----

gdb is not very useful for me, because it doesn't produce backtraces.
I decode a couple of stack dumps and one seemd stuck in the
mono_domain_lock() in object_register_finalizer() (seemigly as the
result of SuppressFinalizer in the Timer.cs Dispose method).
Another thread was inside a sigusr1 signal handler and stuck
in the mono_domain_lock call in mono_jit_info_table_find().
Using a lock-free mono_jit_info_table_find() allows the program to
continue for hundreds of thousands of iterations at least
(I just stopped it after a few minutes).



---- Additional Comments From james@stev.org 2007-07-16 11:08:12 MST ----

This too will also cause a deadlock in the same way for me but without
the mysql libs being involved.

	class Temp {
		public void Run() {
			while (true)
				Thread.Sleep(1000);
		}
	}

	class Program {
		static void Main(string[] args) {
			try {
				Thread Thd;
				Temp Tmp = new Temp();
				int i = 0;

				while (true) {
					Thd = new Thread(new ThreadStart(Tmp.Run));
					Thd.Start();
					Thd.Abort();

					Console.WriteLine(" Count = " + i);
					i++;
				}

			} catch (Exception e) {
				Console.WriteLine(e.Message);
			}
			Console.WriteLine("Exiting");
		}



---- Additional Comments From lupus@ximian.com 2007-07-16 12:03:27 MST ----

The new test case just confirms the same pattern: it produces 4
threads with the following manually decoded traces at the time of the
deadlock:

thread1: (main thread)
mono_domain_lock ()
mono_jit_find_compiled_method ()
mono_create_jump_trampoline ()
mono_ldftn()

thread2: (started thread)
mono_domain_lock()
mono_jit_info_table_find ()
sigusr1_signal_handler ()

thread3: (finalizer thread)
timedwait_signal_poll_cond ()
_wapi_handle_timedwait_signal_handle ()
_wapi_handle_wait_signal_handle ()
WaitForSingleObjectEx ()

thread4: (io-layer thread)
start_thread()

There should be no deadlock with just the above calls, since one
thread should block while the other takes the domain lock and proceeeds.
The fact that both blocked may mean that a thread has been aborted
with the domain lock held: which looks like this:
lock = {depth = 0, mutex = {__data = {__lock = 2, __count = 0, __owner
= 0, __kind = 1, 
        __nusers = 0, {__spins = 0, __list = {__next = 0x0}}}, 
      __size = "\002", '\0' <repeats 11 times>,
"\001\000\000\000\000\000\000\000\000\000\000", 
      __align = 2}}





---- Additional Comments From lupus@ximian.com 2007-08-30 13:27:13 MST ----

The issue with this test was that inside the signal handler used to
notify the abort request we took a lock. The same lock was being taken
by some other thread. Indeed, pthread_mutex_lock is not one of the
functions that are allowed in a signal handler.
The fix required quite a bit of work, though luckily Mark did it all:)
We now can lookup the runtime info we need to know about in the signal
handler without taking locks.
There are still a few more cases where we currently take a lock in
signal handlers (for example by way of QueueUserAPC), but the chances
of hitting a deadlock in those case is pretty low, still we'll need to
fix them eventually, so I'm just downgrading the priority of this bug.



Unknown bug field "cf_op_sys_details" encountered while moving bug
   <cf_op_sys_details>Etch with mono 1.2.4</cf_op_sys_details>
Unknown bug field "cf_version_details" encountered while moving bug
   <cf_version_details>svn HEAD</cf_version_details>

Comment 1 Brett Kiefer 2008-02-15 20:37:42 UTC
Since FogBugz uses Mono for its search service, and since it is inconvenient for us to ask our customers to upgrade to Mono 1.2.6, we created a version of the MySql Net/Connector that appears to successfully work around this bug by using socket timeouts rather than timers. The .dll, a tar of the complete code, and a diff file of the changes are available here, if it's useful to anyone else: http://www.fogcreek.com/FogBugz/kb/setup/ModifyingMySQL.Data.html
Comment 2 Forgotten User vxPDddArjq 2009-01-26 08:35:27 UTC
These issues were fixed a long time ago in 2.0.
Comment 3 Forgotten User vxPDddArjq 2009-01-26 08:47:34 UTC
It looks like the last test case still crashes on HEAD.
Comment 4 Forgotten User vxPDddArjq 2009-01-27 18:44:06 UTC
The testcase crash is now fixed.