Bugzilla – Bug 317999
[PATCH] very strange NullReference (SEGV probably)
Last modified: 2007-09-15 21:24:46 UTC
---- Reported by malekith@pld-linux.org 2005-05-15 07:56:42 MST ----

Description of Problem:

I have a problem booting the nemerle compiler under mono 1.1.7 on amd64. A very tiny change (adding a debug output, removing a file from the compilation command line, and so on) fixes the problem. It works perfectly under mono 1.1.7 on x86 and under ms.net (it also verifies), so I think this is some ugly JIT bug.

I hate attaching this testcase because it is utterly big, but I have failed to identify where the problem lies. It seems to segv somewhere inside Nemerle.Compiler.Macros.quoted_expr, but I'm not sure whether it goes somewhere deeper. Adding --trace fixes the problem, so it is not helpful.

Steps to reproduce the problem:
1. untar the attached tar.gz file
2. ./run.sh

Actual Results:

...
start typing expr {
done typing expr }
start typing expr {
start typing expr {
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object

Expected Results:

Lots more "start typing {" and "done typing }". No null ref.

How often does this happen?

Always.

Additional Information:

Please contact me if you need more info.

---- Additional Comments From malekith@pld-linux.org 2005-05-15 07:57:51 MST ----

Created an attachment (id=167946)
testcase

---- Additional Comments From vargaz@gmail.com 2005-05-15 10:34:05 MST ----

I can repro this under 1.1.7, but not under HEAD. So it's either a bug which was fixed after 1.1.7, or it's still there but does not show up.

---- Additional Comments From sebastien@ximian.com 2005-05-15 11:55:35 MST ----

Zoltan, could it be related to #74952? That one still happens after 1.1.7, but I can't duplicate it on x86.

---- Additional Comments From malekith@pld-linux.org 2005-05-15 14:29:33 MST ----

I don't think so. I don't even get a stack trace.

If it does not occur with svn head, I'm happy. My problem was solved by just changing the order of files passed to the compiler... I hope it won't appear again.

---- Additional Comments From malekith@pld-linux.org 2005-05-25 15:45:21 MST ----

The problem is still there :/ I'm currently using 1.1.7.99.20050522 and it appears again. To reproduce, please untar the testcase and ./run

It should print a progress bar and finish cleanly (it does on x86), but it fails with a null ref.

A side note: I had a similar issue with 1.1.7 that I tracked down better. It looked as if an object first had a field set and then nulled out (my program couldn't have done this; there was no stfld that could). So maybe this is GC related, and that's why it is so sensitive to small changes. Anyway, that one went away when I moved to 1.1.7.99.20050522.

---- Additional Comments From malekith@pld-linux.org 2005-05-25 15:46:05 MST ----

Created an attachment (id=167947)
testcase, tar.gz

---- Additional Comments From malekith@pld-linux.org 2005-05-25 16:25:53 MST ----

I have found a smaller set of sources passed to the nemerle compiler that triggers it, but I guess what matters to you is the compiler being huge, not the sources. If you want it though, please say so.

---- Additional Comments From vargaz@gmail.com 2005-05-26 09:30:17 MST ----

Could this be a dup of:

https://bugzilla.novell.com/show_bug.cgi?id=MONO75042

Same nemerle stuff, same randomness due to uninitialized memory?

---- Additional Comments From malekith@pld-linux.org 2005-05-26 10:21:42 MST ----

I don't think so. There is no instance of a parameterless struct ctor in the compiler. There is just a single ctor that doesn't initialize a field, but this field is never used.

---- Additional Comments From bmaurer@users.sf.net 2005-07-02 18:00:40 MST ----

Can you please attach gdb and get a stack trace from this?

---- Additional Comments From malekith@pld-linux.org 2005-07-02 18:08:08 MST ----

I've had this issue several times, but it appeared as memory corruption (a field in an object was null after it had been set).
Therefore I don't think a stack trace will be valuable here :/

Recently USE_MMAP was enabled on amd64, which appears to make this bug show up more often (I wasn't able to compile anything with the nemerle compiler that used to work). I disabled it in my local mono tree.

I'll get back to this issue after all that generic stuff works...

---- Additional Comments From malekith@pld-linux.org 2005-08-06 07:03:02 MST ----

I guess I found it. It seems to be mono_arch_nullify_class_init_trampoline. The condition:

  if ((code [0] == 0x49) && (code [1] == 0xff)) {

seems to match by accident (the offset of the callq happens to contain the bytes 0x49 0xff). code [-2] == 0xe8 is also true. I exchanged the conditions, like:

Index: tramp-amd64.c
===================================================================
--- tramp-amd64.c (revision 47928)
+++ tramp-amd64.c (working copy)
@@ -83,7 +83,15 @@
 {
     code -= 3;
 
-    if ((code [0] == 0x49) && (code [1] == 0xff)) {
+    if (code [-2] == 0xe8) {
+        guint8 *buf = code - 2;
+
+        buf [0] = 0x66;
+        buf [1] = 0x66;
+        buf [2] = 0x90;
+        buf [3] = 0x66;
+        buf [4] = 0x90;
+    } else if ((code [0] == 0x49) && (code [1] == 0xff)) {
         /* amd64_set_reg_template is 10 bytes long */
         guint8* buf = code - 10;
 
@@ -102,14 +110,6 @@
         buf [10] = 0x90;
         buf [11] = 0x66;
         buf [12] = 0x90;
-    } else if (code [-2] == 0xe8) {
-        guint8 *buf = code - 2;
-
-        buf [0] = 0x66;
-        buf [1] = 0x66;
-        buf [2] = 0x90;
-        buf [3] = 0x66;
-        buf [4] = 0x90;
     } else if (code [0] == 0x90 || code [0] == 0xeb || code [0] == 0x66)
         /* Already changed by another thread */
         ;

and it seems to cure the problem. However, I'm not sure it won't break anything else. One way or another this code is wrong: it fails whenever the offset happens to contain the same byte pattern. The other code paths in this function should probably be inspected as well.

---- Additional Comments From vargaz@gmail.com 2005-08-07 19:04:37 MST ----

Hopefully fixed in SVN. Thanks for tracking this down.
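
The misidentification described in the 2005-08-06 comment can be illustrated with a small standalone sketch. This is not the Mono source: the names classify_original, classify_patched, emit_rel32_call, emit_reg_call and the enum are hypothetical, and only the byte checks (0x49 0xff at code [0..1], 0xe8 at code [-2]) are taken from the patch. The sketch assumes that `code` starts out as the return address of the call, that a relative call is 0xe8 followed by a 4-byte little-endian displacement, and that the register form is a 10-byte load of r11 followed by the 3-byte sequence 49 ff d3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the patching branch a given check order selects. */
enum call_kind { KIND_REG_CALL, KIND_REL32_CALL, KIND_UNKNOWN };

/* Check order before the patch: register-call pattern first. */
enum call_kind classify_original(const uint8_t *ret_addr)
{
    const uint8_t *code = ret_addr - 3;   /* mirrors "code -= 3" */
    if (code[0] == 0x49 && code[1] == 0xff)
        return KIND_REG_CALL;
    if (code[-2] == 0xe8)
        return KIND_REL32_CALL;
    return KIND_UNKNOWN;
}

/* Check order after the patch: relative-call opcode first. */
enum call_kind classify_patched(const uint8_t *ret_addr)
{
    const uint8_t *code = ret_addr - 3;
    if (code[-2] == 0xe8)
        return KIND_REL32_CALL;
    if (code[0] == 0x49 && code[1] == 0xff)
        return KIND_REG_CALL;
    return KIND_UNKNOWN;
}

/* Writes "call rel32" (5 bytes) at buf and returns the return address. */
const uint8_t *emit_rel32_call(uint8_t *buf, uint32_t disp)
{
    assert(buf != NULL);
    buf[0] = 0xe8;
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(disp >> (8 * i));   /* little-endian */
    return buf + 5;
}

/* Writes "mov r11, imm64; call *r11" (13 bytes) and returns the return address. */
const uint8_t *emit_reg_call(uint8_t *buf, uint64_t target)
{
    assert(buf != NULL);
    buf[0] = 0x49;                                  /* REX.W + REX.B */
    buf[1] = 0xbb;                                  /* mov r11, imm64 */
    for (int i = 0; i < 8; i++)
        buf[2 + i] = (uint8_t)(target >> (8 * i));
    buf[10] = 0x49;                                 /* call *r11 */
    buf[11] = 0xff;
    buf[12] = 0xd3;
    return buf + 13;
}
```

With a displacement such as 0x00ff4912, the middle two displacement bytes are 0x49 0xff, so classify_original reports a register call for what is really a relative call, while classify_patched reports a relative call. The patched order still relies on code [-2] not being 0xe8 in the register form; in this sketch that byte is part of the 64-bit immediate, which is why the reporter's caveat about other code paths still applies.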