Bug 337383

Summary: Runtime assertion when pressing C-c on simple app.
Product: [Mono] Mono: Runtime
Reporter: Miguel de Icaza <miguel>
Component: JIT
Assignee: Mark Probst <mprobst>
Status: RESOLVED FIXED
QA Contact: Mono Bugs <mono-bugs>
Severity: Normal
Priority: P5 - None
CC: massi
Version: unspecified
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: race.diff, racefix.diff

Description Miguel de Icaza 2007-10-27 17:58:40 UTC
Compile this app with gmcs and run it, then press Control-C (the failure does not always happen):

$ cat test.cs
using System;
using System.Net;

class abcd {
        public static void Main ()
        {
                HttpListener l = new HttpListener ();
                l.Prefixes.Add ("http://*:4000/foo/");
                l.Start ();
                l.Stop ();
                l.Start ();
                Console.Read ();
        }
}

$ gmcs test.cs
$ mono test.exe
[[ PRESS CONTROL C HERE ]]
** ERROR **: file threads.c: line 3345 (mono_thread_set_state): assertion failed: (ret == 0)
aborting...
Stacktrace:


Native stacktrace:

        mono(mono_handle_native_sigsegv+0xcf) [0x817960f]
        [0xffffe440]
        /lib/libc.so.6(abort+0x103) [0xb7d4dff3]
        /opt/gnome/lib/libglib-2.0.so.0(g_logv+0x46d) [0xb7ed12dd]
        /opt/gnome/lib/libglib-2.0.so.0(g_log+0x35) [0xb7ed1325]
        /opt/gnome/lib/libglib-2.0.so.0(g_assert_warning+0x76) [0xb7ed13a6]
        mono(mono_thread_set_state+0x66) [0x80d4946]
        mono [0x80d5189]
        mono [0x80dbee8]
        mono [0x80d8254]
        mono [0x812502e]
        mono [0x813c055]
        /lib/libpthread.so.0 [0xb7e712ab]
        /lib/libc.so.6(__clone+0x5e) [0xb7de2a4e]
Comment 1 Mark Probst 2007-11-08 13:47:32 UTC
This is a race condition between one thread which deletes a thread's synch_cs critical section and another thread which tries to lock it.

How to reproduce it in gdb:

handle SIG35 noprint
run test.exe
<hit ctrl-c to interrupt it>
signal SIGINT

This is the native stacktrace at the point of the assertion failure:

#0  mono_thread_set_state (thread=0x5fb00, state=ThreadState_Background) at threads.c:3352
#1  0x080ee2fe in ves_icall_System_Threading_Thread_SetState (this=0x5fb00, state=4) at threads.c:1779
#2  0x080f226f in async_invoke_io_thread (data=0x2dea0) at threadpool.c:239
#3  0x080ebf0b in start_wrapper (data=0x83db550) at threads.c:550
#4  0x08152a63 in thread_start_routine (args=0xb7799158) at threads.c:264
#5  0x08172b2a in GC_start_routine ()
#6  0xb7e2831b in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7d8a57e in clone () from /lib/tls/i686/cmov/libc.so.6

The synch_cs is deleted in ves_icall_System_Threading_Thread_Thread_free_internal().  Here's the managed stack trace at the point where the synch_cs is deleted:

  at (wrapper managed-to-native) System.Threading.Thread.Thread_free_internal (intptr) <0x00004>
  at (wrapper managed-to-native) System.Threading.Thread.Thread_free_internal (intptr) <0xffffffff>
  at System.Threading.Thread.Finalize () <0x00041>
  at (wrapper runtime-invoke) System.Collections.Generic.GenericEqualityComparer`1.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>

I haven't yet figured out what causes the race condition, but I can offer a quick hack that makes it much less likely to cause trouble (see attachment).
Comment 2 Mark Probst 2007-11-08 13:49:08 UTC
Created attachment 182611: race.diff
Comment 3 Miguel de Icaza 2007-11-08 16:38:24 UTC
I would actually change this a little bit, because it seems like there could be a race between deleting the critical section and setting it to NULL.

I would do this:

critical_section = this->synch_cs;
this->synch_cs = NULL;
DeleteCriticalSection (critical_section);
g_free (critical_section);
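
Swapping the pointer out before destroying the section narrows the window, but a thread that has already read the pointer can still try to lock a destroyed section. Below is a minimal sketch of one way to close that window, using POSIX mutexes in place of the runtime's critical sections: both the pointer read and the teardown are serialized by a second lock. The names demo_thread, lifetime_lock, and the demo_thread_* functions are hypothetical and are not taken from threads.c or from either attached patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's thread object. */
typedef struct {
    pthread_mutex_t *synch_cs;  /* per-thread lock; NULL once freed */
    int state;                  /* state bits protected by synch_cs */
} demo_thread;

/* Hypothetical lock that serializes lookup and teardown of synch_cs. */
static pthread_mutex_t lifetime_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate and initialize the per-thread lock; returns 0 on success. */
int demo_thread_init(demo_thread *t)
{
    t->state = 0;
    t->synch_cs = malloc(sizeof *t->synch_cs);
    if (t->synch_cs == NULL)
        return -1;
    if (pthread_mutex_init(t->synch_cs, NULL) != 0) {
        free(t->synch_cs);
        t->synch_cs = NULL;
        return -1;
    }
    return 0;
}

/* Set state bits; returns 0 on success, -1 if the lock is already gone. */
int demo_thread_set_state(demo_thread *t, int bits)
{
    int ret;

    pthread_mutex_lock(&lifetime_lock);
    if (t->synch_cs == NULL) {
        /* Teardown already ran: report it instead of locking freed memory. */
        pthread_mutex_unlock(&lifetime_lock);
        return -1;
    }
    ret = pthread_mutex_lock(t->synch_cs);
    assert(ret == 0);
    t->state |= bits;
    pthread_mutex_unlock(t->synch_cs);
    pthread_mutex_unlock(&lifetime_lock);
    return 0;
}

/* Detach the pointer under lifetime_lock, then destroy the detached lock. */
void demo_thread_free(demo_thread *t)
{
    pthread_mutex_t *cs;

    pthread_mutex_lock(&lifetime_lock);
    cs = t->synch_cs;
    t->synch_cs = NULL;
    pthread_mutex_unlock(&lifetime_lock);

    /* No other thread can hold cs here: every holder takes it while
       holding lifetime_lock and releases it before lifetime_lock. */
    if (cs != NULL) {
        pthread_mutex_destroy(cs);
        free(cs);
    }
}
```

With this arrangement, a demo_thread_set_state call that arrives after demo_thread_free returns -1 instead of tripping the assertion. Whether racefix.diff takes this approach is not stated in this report.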
Comment 4 Dick Porter 2007-11-08 17:00:54 UTC
It's possible that this is a duplicate of https://bugzilla.novell.com/show_bug.cgi?id=334740 (or vice-versa)
Comment 5 Mark Probst 2007-11-08 18:22:54 UTC
Miguel: I know, but I didn't bother for this quick hack :-)

Dick: I'm pretty sure they're different races.  This one is triggered by the finalization of Thread objects at shutdown.

Anyway, here's a proper fix, hopefully to be in SVN soon.
Comment 6 Mark Probst 2007-11-08 18:24:15 UTC
Created attachment 182687: racefix.diff
Comment 7 Mark Probst 2007-11-09 10:01:33 UTC
Fixed in SVN.
Comment 8 Miguel de Icaza 2007-11-09 12:25:58 UTC
Should this be backported to the 1-2-6 branch?
Comment 9 Mark Probst 2007-11-09 14:47:19 UTC
I'm pretty sure, yes.
Comment 10 Paolo Molaro 2007-11-12 14:24:11 UTC
I still get other types of errors, though much less often.
Examples include:

** ERROR **: file error.c: line 70 (SetLastError): assertion failed: (ret == 0)
aborting...

** ERROR **: file critical-sections.c: line 97 (DeleteCriticalSection): assertion failed: (ret == 0)
aborting...

** ERROR **: file mono-internal-hash.c: line 46 (mono_internal_hash_table_lookup): assertion failed: (table->table != NULL)
aborting...

This is likely the same pattern: shutdown ordering issues.
Comment 11 Mark Probst 2007-12-17 15:46:44 UTC
Fixed in SVN.