Bug 588143

Summary: if LD_LIBRARY_PATH is empty, DllNotFoundException is only thrown if the native library in current directory is specified with a ".so" suffix in the DllImport or in the .config.exe file
Product: [Mono] Mono: Runtime Reporter: Forgotten User oDRaEXi7Ku <forgotten_oDRaEXi7Ku>
Component: interopAssignee: Forgotten User oDRaEXi7Ku <forgotten_oDRaEXi7Ku>
Status: RESOLVED FIXED QA Contact: Mono Bugs <mono-bugs>
Severity: Major    
Priority: P5 - None CC: forgotten_vxPDddArjq
Version: 2.6.x   
Target Milestone: ---   
Hardware: i686   
OS: openSUSE 11.2   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 610009    
Attachments: ZIP that contains all the binaries of the testcase
Zip that contains new version of Banshee.Mirage.dll that has a workaround
zip that contains the sources of the testcase (3 managed projects, 1 unmanaged)
Thanks for your prompt reply Zoltan, here's the log.
MONO_LOG_LEVEL=debug output
MONO_LOG_LEVEL=info output
Zip file that contains the sources of the more simplified testcase
Patch v1
dllmap-fix.diff

Description Forgotten User oDRaEXi7Ku 2010-03-13 03:39:52 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; es-ES; rv:1.9.1.8) Gecko/20100204 SUSE/3.5.8-0.1.1 Firefox/3.5.8

I am using Mono 2.6.1 in 32bits. The testcase is very tricky and I'll attach it right now.

Reproducible: Always

Steps to Reproduce:
1.Compile the testcase.
2.Run mono cons.exe.

Actual Results:  
DllNotFoundException.

Expected Results:  
Should work.

Workaround: if you compile Banshee.Mirage.dll with ENABLE_WORKAROUND defined, it will work.
Comment 1 Forgotten User oDRaEXi7Ku 2010-03-13 03:46:02 UTC
Created attachment 348301 [details]
ZIP that contains all the binaries of the testcase

Just cd into the dir and "mono cons.exe" to reproduce.
Comment 2 Forgotten User oDRaEXi7Ku 2010-03-13 03:49:13 UTC
Created attachment 348302 [details]
Zip that contains new version of Banshee.Mirage.dll that has a workaround

Just replace old dll with this one, and testcase starts working.
Comment 3 Forgotten User oDRaEXi7Ku 2010-03-13 03:53:53 UTC
Created attachment 348303 [details]
zip that contains the sources of the testcase (3 managed projects, 1 unmanaged)
Comment 4 Forgotten User vxPDddArjq 2010-03-13 09:48:27 UTC
I can't reproduce this, could you run mono with MONO_LOG_LEVEL=debug and attach the output ?
Comment 5 Forgotten User oDRaEXi7Ku 2010-03-13 13:07:49 UTC
Created attachment 348317 [details]
Thanks for your prompt reply Zoltan, here's the log.

out.txt (contains normal and err output)
Comment 6 Forgotten User oDRaEXi7Ku 2010-03-13 13:11:53 UTC
Created attachment 348318 [details]
MONO_LOG_LEVEL=debug output

Rather use this one, it's run with LANG=en
Comment 7 Forgotten User vxPDddArjq 2010-03-13 19:55:58 UTC
Are you sure the .so file is in your LD_LIBRARY_PATH ?
Comment 8 Forgotten User oDRaEXi7Ku 2010-03-14 06:14:42 UTC
(In reply to comment #7)
> Are you sure the .so file is in your LD_LIBRARY_PATH ?

No. But:

1. The .so is in the same directory as the managed libraries and I run mono in the same directory.
2. If the above reason is not enough, why it doesn't throw DllNotFoundException with the workaround applied (note that, with the workaround, it's Banshee.Mirage.dll the one which does the first dllimport call, not Mirage.dll).

So, either it shouldn't throw the exception or it should throw it in both cases, for consistency.
Comment 9 Forgotten User vxPDddArjq 2010-03-14 18:33:09 UTC
No idea why it works in that case, but I doubt this is a mono problem, we simply pass the name to dlopen() and doesn't pass any directory name or such.
Comment 10 Forgotten User oDRaEXi7Ku 2010-03-23 23:24:37 UTC
Created attachment 350140 [details]
MONO_LOG_LEVEL=info output

(In reply to comment #9)
> No idea why it works in that case, but I doubt this is a mono problem, we
> simply pass the name to dlopen() and doesn't pass any directory name or such.

It seems you don't pass any directory names, but you *do* try to load with the current "./" directory in some cases (that's why it works with the workaround).

I'm attaching now the output of running with this command:
LANG=en MONO_LOG_LEVEL="info" MONO_LOG_MASK="dll" mono cons.exe  > logerr.log 2>&1

So you can see that the fallback to run the dlopen with the current directory is only done when no .config file is found it seems.
Comment 11 Forgotten User oDRaEXi7Ku 2010-03-24 00:27:54 UTC
Changing summary due to latest findings.

Smaller and more comprensible testcase coming...
Comment 12 Forgotten User oDRaEXi7Ku 2010-03-24 00:43:29 UTC
Created attachment 350145 [details]
Zip file that contains the sources of the more simplified testcase

The testcase contains 2 projects, 1 unmanaged and 1 managed.

To test, compile the unmanaged and copy its .so in the directory where you will place the binaries of the managed one.

Compile the managed one, place the config.xml in the same directory where the binaries are.

Now, place yourself in that directory, and run:

1. "mono DlOpenIssue.exe 1" -> the test that works (without ".so" suffix in the DllImport.
2. "mono DlOpenIssue.exe 2" -> DllNotFoundException (same case as before, but the filename for the DllImport clause contains the ".so" suffix).
3. "mono DlOpenIssue.exe 3" -> DllNotFoundException (same case as 1st case, but a DlOpenIssue.exe.config is present that maps the plain filename of the lib to a one that contains the ".so" suffix)

Be sure to test all cases with an empty LD_LIBRARY_PATH.
Comment 13 Forgotten User oDRaEXi7Ku 2010-03-24 01:36:12 UTC
It seems the culprit is in mono_dl_build_path(...) function. Zoltan, would you accept a patch that fixes this function to meet this? Thanks.
Comment 14 Forgotten User oDRaEXi7Ku 2010-03-24 02:38:41 UTC
Created attachment 350152 [details]
Patch v1

Ok, the problem seems indeed easy to spot by just looking at the function, because the suffix variable is assigned but never used (so it seems a typo), thus I believe it could have been catched by some gcc warning.
Comment 15 Forgotten User vxPDddArjq 2010-03-24 14:26:13 UTC
Looks ok, please commit.
Comment 16 Forgotten User oDRaEXi7Ku 2010-03-24 14:58:09 UTC
Cool thanks, committed in r154147
Comment 17 Forgotten User oDRaEXi7Ku 2010-03-25 02:26:34 UTC
Zoltan, can I backport this to any branch? I confirm it fixed the problem for me.
Comment 18 Forgotten User vxPDddArjq 2010-03-25 02:30:15 UTC
Yes, altough it might break things.
Comment 19 Marek Habersack 2010-05-28 22:02:56 UTC
Created attachment 365647 [details]
dllmap-fix.diff

The above attachment contains the correct fix for this bug. The original one in comment #14 was invalid (it caused bug #610009). Please review and let me know if it's OK to commit.
Comment 20 Marek Habersack 2010-05-28 22:03:34 UTC
Reopening with a new proposed fix.
Comment 21 Forgotten User oDRaEXi7Ku 2010-06-01 21:58:14 UTC
(In reply to comment #19)
> Created an attachment (id=365647) [details]
> dllmap-fix.diff
> 
> The above attachment contains the correct fix for this bug. The original one in
> comment #14 was invalid (it caused bug #610009). Please review and let me know
> if it's OK to commit.

Without looking deeply at the patch and by just having a brief look on the linked bug, let me state my opinion around what I also learnt while debugging and proposing the first patch:

It seems that banshee's .config file is kind of wrong, because the target attribute contains a platform detail while it lacks another one (that is, it includes the prefix "lib", but lacks the ".so" suffix), so mono assumes that, by including a platform detail, it doesn't need to "guess" more platform details. This goes on par with what the original code actually did (but somehow it didn't because of the typo I found).

So, according to that argument above, a more easier and simpler fix would be:

-  <dllmap dll="libbanshee.dll" target="libbanshee" os="!windows"/>
+  <dllmap dll="libbanshee.dll" target="libbanshee.so" os="!windows"/>

Which actually seems prettier and gives more information :)

Other argument against the patch is that it makes code more complex and thus a bit less performant, but I guess this is a tiny factor.

Anyway, I'll see if I can have a deeper look at the patch this weekend to see if I really understand the idea behind it. In the meantime, getting Zoltan's feedback would be great too :)
Comment 22 Forgotten User vxPDddArjq 2010-06-04 14:05:41 UTC
The patch could use some comments around the
if (first_call || ...)
line.
Also, idx is never incremented, because the increment is inside a:
if (first_call)
so it will always stay 0.
Comment 23 Marek Habersack 2010-06-04 19:26:53 UTC
Fixed in r158490