Bug 1022156

Summary: radeon [R9 380X] vulkaninfo ends with segmentation fault
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Petr Cervinka <petr>
Component: X.Org
Assignee: Michal Srb <msrb>
Status: RESOLVED FIXED
QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Normal
Priority: P3 - Medium
CC: antoine.belvire, forgotten_75I7EmJG8s, hwuelpern, jengelh, jlp, jon, linus.kardell, msrb, osukup, thiago, vpelcak
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: vulkaninfo core dump
core dump vulkaninfo version 1.0.39
coredump intel

Description Petr Cervinka 2017-01-26 21:02:18 UTC
I wanted to try Vulkan on the latest Tumbleweed snapshot, but vulkaninfo ends with a segmentation fault. The hardware is a Radeon R9 380X using the standard amdgpu driver.

>vulkaninfo 
===========
VULKAN INFO
===========

Vulkan API Version: 1.0.32

INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_core_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_image.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_object_tracker.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_parameter_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_swapchain.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_threading.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_unique_objects.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_core_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_image.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_object_tracker.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_parameter_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_swapchain.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_threading.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_unique_objects.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /home/petr/.local/share/vulkan/implicit_layer.d/steamoverlay_i386.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /home/petr/.local/share/vulkan/implicit_layer.d/steamoverlay_x86_64.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/icd.d/radeon_icd.x86_64.json, version "1.0.0"
Segmentation fault (core dumped)
Comment 1 Petr Cervinka 2017-01-26 21:03:03 UTC
GNU gdb (GDB; openSUSE Tumbleweed) 7.11.1
This GDB was configured as "x86_64-suse-linux".
Reading symbols from vulkaninfo...Reading symbols from /usr/lib/debug/usr/bin/vulkaninfo.debug...done.
done.
[New LWP 12714]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `vulkaninfo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000001c266 in ?? ()
(gdb) bt
#0  0x000000000001c266 in ?? ()
#1  0x00007f2569f6cfa2 in radv_lookup_entrypoint (name=<optimized out>) at radv_entrypoints.c:857
#2  0x00007f257251da3d in loader_scanned_icd_add (api_version=4194307, filename=0x7ffed7d64720 "/usr/lib64/libvulkan_radeon.so", icd_libs=0xffcd70, inst=0xffcd00)
    at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.32.g28/loader/loader.c:1542
#3  loader_icd_scan (inst=inst@entry=0xffcd00, icds=icds@entry=0xffcd70) at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.32.g28/loader/loader.c:3045
#4  0x00007f25725216db in vkCreateInstance (pCreateInfo=pCreateInfo@entry=0x7ffed7d64cc0, pAllocator=pAllocator@entry=0x0, pInstance=pInstance@entry=0x7ffed7d64d10)
    at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.32.g28/loader/trampoline.c:367
#5  0x0000000000401968 in app_create_instance (inst=0x7ffed7d64d10) at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.32.g28/demos/vulkaninfo.c:678
#6  main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.32.g28/demos/vulkaninfo.c:1428
(gdb)
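
Note on frame #2 of the backtrace: the `api_version=4194307` argument appears to be a packed Vulkan version number. The following is a minimal Python sketch for decoding it, assuming the Vulkan 1.0 packing (major in bits 22 and up, minor in the next 10 bits, patch in the low 12 bits). `decode_vk_version` is a hypothetical helper name, not part of any Vulkan tooling.

```python
def decode_vk_version(packed):
    """Unpack a Vulkan 1.0 style version integer into (major, minor, patch)."""
    major = packed >> 22            # bits 22 and up
    minor = (packed >> 12) & 0x3FF  # next 10 bits
    patch = packed & 0xFFF          # low 12 bits
    return major, minor, patch


# Value taken from the loader_scanned_icd_add frame in the backtrace.
print(decode_vk_version(4194307))  # (1, 0, 3)
```

Under that assumption the value decodes to 1.0.3, which is presumably the API version advertised for the ICD and is distinct from the loader version (1.0.32) shown in the source paths.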
Comment 2 Petr Cervinka 2017-01-26 21:05:18 UTC
Created attachment 711833 [details]
vulkaninfo core dump
Comment 3 Henning W 2017-01-27 23:53:20 UTC
This also happens on a GCN 1.0 card (HD7850) with the experimental amdgpu kernel module and the radeon driver blacklisted in X.Org. So I would assume this bug affects all GCN GPUs.
Comment 4 Jure Repinc 2017-01-30 19:23:23 UTC
This also happens here with a Radeon RX480 and the amdgpu driver. I reported the bug to the Mesa developers (https://bugs.freedesktop.org/show_bug.cgi?id=99591), and they said I should try a newer version of vulkaninfo, which means the vulkan package would need to be updated.
Comment 5 Jan Engelhardt 2017-01-30 19:33:42 UTC
This seems to be related to Mesa-vulkan, not KHR-vulkan.
Comment 6 Jan Engelhardt 2017-01-30 19:36:43 UTC
Well that dump was confusing.
Comment 7 Jan Engelhardt 2017-01-30 19:39:31 UTC
I do not have any AMD GPU hardware around to test, and the Intel driver seems to fly. It has been known before that mixing e.g. system libGL and NVIDIA libGL can lead to problems. Are you by chance using any part of a proprietary driver?
Comment 8 Jure Repinc 2017-01-30 19:57:39 UTC
No, I am using only free drivers; there is no amdgpu-pro here.
Comment 9 Petr Cervinka 2017-01-30 20:37:30 UTC
(In reply to Jan Engelhardt from comment #7)
> I do not have any AMD GPU hardware around to test, and the Intel driver
> seems to fly. It has been known before that mixing e.g. system libGL and
> NVIDIA libGL can lead to problems. Are you by chance using any part of a
> proprietary driver?

No, only the standard (non-proprietary) amdgpu driver shipped with Tumbleweed.
Comment 10 Jan Engelhardt 2017-02-05 00:47:01 UTC
Newer vulkan-1.0.39.1 is available in X11:Wayland if you want to try.
Comment 11 Petr Cervinka 2017-02-05 18:47:07 UTC
(In reply to Jan Engelhardt from comment #10)
> Newer vulkan-1.0.39.1 is available in X11:Wayland if you want to try.

Unfortunately, I get the same core dump after upgrading to 1.0.39.1:
===========
VULKAN INFO
===========

Vulkan API Version: 1.0.39

INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_core_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_image.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_object_tracker.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_parameter_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_swapchain.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_threading.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /etc/vulkan/explicit_layer.d/VkLayer_unique_objects.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_core_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_image.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_object_tracker.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_parameter_validation.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_swapchain.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_threading.json, version "1.0.0"
INFO: [loader] Code 0 : Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_unique_objects.json, version "1.0.0"
INFO: [loader] Code 0 : Found ICD manifest file /usr/share/vulkan/icd.d/radeon_icd.x86_64.json, version "1.0.0"
Segmentation fault (core dumped)

Reading symbols from vulkaninfo...Reading symbols from /usr/lib/debug/usr/bin/vulkaninfo.debug...done.
done.
[New LWP 3616]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `vulkaninfo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000001c266 in ?? ()
(gdb) bt
#0  0x000000000001c266 in ?? ()
#1  0x00007fe83d00cfa2 in radv_lookup_entrypoint (name=<optimized out>) at radv_entrypoints.c:857
#2  0x00007fe8455bf346 in loader_scanned_icd_add (api_version=4194307, filename=0x7ffd38113840 "/usr/lib64/libvulkan_radeon.so", icd_tramp_list=0xf2aa50, inst=0xf2aa10)
    at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.39.1/loader/loader.c:1765
#3  loader_icd_scan (inst=inst@entry=0xf2aa10, icd_tramp_list=icd_tramp_list@entry=0xf2aa50) at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.39.1/loader/loader.c:3430
#4  0x00007fe8455c4e9e in vkCreateInstance (pCreateInfo=pCreateInfo@entry=0x7ffd38113df0, pAllocator=pAllocator@entry=0x0, pInstance=pInstance@entry=0x7ffd38113e40)
    at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.39.1/loader/trampoline.c:368
#5  0x0000000000401797 in AppCreateInstance (inst=0x7ffd38113e40) at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.39.1/demos/vulkaninfo.c:695
#6  main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/Vulkan-LoaderAndValidationLayers-1.0.39.1/demos/vulkaninfo.c:1476
Comment 12 Petr Cervinka 2017-02-05 18:48:26 UTC
Created attachment 712924 [details]
core dump vulkaninfo version 1.0.39
Comment 13 Petr Cervinka 2017-02-05 18:56:47 UTC
I also posted a short update on the upstream Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99591
Comment 14 Forgotten User 75I7EmJG8s 2017-06-10 22:34:57 UTC
It seems that, in my testing using an RX 460, forkbomb's Mesa repo (https://build.opensuse.org/project/show/home:forkbomb:turboAMD-stable) is not affected by this bug.

The question is then, what is the difference between Factory's Mesa/LLVM and Forkbomb's Mesa/LLVM?
Comment 15 Forgotten User 75I7EmJG8s 2017-06-10 22:36:56 UTC
*** Bug 1041684 has been marked as a duplicate of this bug. ***
Comment 16 Stefan Dirsch 2017-06-11 08:38:58 UTC
(In reply to Keith Hizal from comment #14)
> It seems that, in my testing using an RX 460, forkbomb's Mesa repo
> (https://build.opensuse.org/project/show/home:forkbomb:turboAMD-stable) is
> not affected by this bug.
> 
> The question is then, what is the difference between Factory's Mesa/LLVM and
> Forkbomb's Mesa/LLVM?

Completely different Mesa/llvm versions? I'm not maintaining this Forkbomb's repo, so I can't say ...
Comment 17 Forgotten User 75I7EmJG8s 2017-06-11 08:50:21 UTC
(In reply to Stefan Dirsch from comment #16)
> (In reply to Keith Hizal from comment #14)
> > It seems that, in my testing using an RX 460, forkbomb's Mesa repo
> > (https://build.opensuse.org/project/show/home:forkbomb:turboAMD-stable) is
> > not affected by this bug.
> > 
> > The question is then, what is the difference between Factory's Mesa/LLVM and
> > Forkbomb's Mesa/LLVM?
> 
> Completely different Mesa/llvm versions? I'm not maintaining this Forkbomb's
> repo, so I can't say ...

LLVM5 is actually a misnomer as explained here: https://build.opensuse.org/package/show/home:forkbomb:turboAMD-stable/llvm5

Also, even when Forkbomb and Factory were on the same version of Mesa, only Forkbomb's RADV-Vulkan worked. 

Upstream has suggested that it may have to do with LTO: https://bugs.freedesktop.org/show_bug.cgi?id=99591#c6
Comment 18 Michal Srb 2017-08-02 09:08:38 UTC
*** Bug 1051767 has been marked as a duplicate of this bug. ***
Comment 19 Michal Srb 2017-08-02 09:14:50 UTC
I was able to reproduce this. The Vulkan loader loads and unloads /usr/lib64/libvulkan_radeon.so multiple times. The first load succeeds, the second fails, and the third succeeds but leaves the library's global symbols uninitialized, which causes a crash when it attempts to call strcmp.

The /usr/lib64/libvulkan_radeon.so library loads multiple LLVM libraries. I think that they are confusing the dynamic loader the same way they did in bug 981975.
Comment 20 Vit Pelcak 2017-08-02 09:21:01 UTC
Created attachment 734890 [details]
coredump intel

The same problem happens on Intel.

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
Comment 21 Michal Srb 2017-08-02 09:26:25 UTC
(In reply to Vit Pelcak from comment #20)
> The same problem happens on Intel.

It is still crashing in libvulkan_radeon.so. (The Vulkan loader examines all available layers and ICDs, so it is expected that it loads libvulkan_radeon.so even on Intel.)
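
The behaviour described in this comment can be illustrated with a minimal Python sketch. It assumes the ICD manifest layout used by the Vulkan loader (a JSON object with an `ICD` entry that holds a `library_path`). The function name `list_icd_libraries` and the default directory list are illustrative assumptions; the real loader searches more locations than these two.

```python
import glob
import json
import os


def list_icd_libraries(dirs=("/etc/vulkan/icd.d", "/usr/share/vulkan/icd.d")):
    """Return {manifest_path: library_path} for every ICD manifest found."""
    found = {}
    for directory in dirs:
        for manifest in sorted(glob.glob(os.path.join(directory, "*.json"))):
            try:
                with open(manifest) as fh:
                    data = json.load(fh)
            except (OSError, ValueError):
                continue  # skip unreadable or malformed manifests
            icd = data.get("ICD") if isinstance(data, dict) else None
            if isinstance(icd, dict) and icd.get("library_path"):
                found[manifest] = icd["library_path"]
    return found


if __name__ == "__main__":
    # Every listed library is a candidate for loading, whatever GPU is present.
    for manifest, library in list_icd_libraries().items():
        print(manifest, "->", library)
```

If radeon_icd.x86_64.json shows up in that listing, libvulkan_radeon.so is among the libraries the loader will try, which matches the crash appearing on machines without a Radeon card.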
Comment 22 Michal Srb 2017-08-02 11:25:30 UTC
It seems that the cause really is the same as in bug 981975: circular dependencies between LLVM libraries cause trouble during library unloading. Bug 981975 was solved by adding an extra dependency to the driver libraries that keeps libLLVMCodeGen loaded longer. A similar thing can be done for libvulkan_radeon.so.

Following command can be used to test the workaround:
patchelf --add-needed libLLVMCodeGen.so.4 libvulkan_radeon.so

After this the vulkaninfo command doesn't crash anymore.
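
To check whether the extra dependency was actually recorded, one could inspect the library's DT_NEEDED entries. The sketch below parses the text printed by `readelf -d`. The helper name `needed_libraries` and the sample output are assumptions for illustration, not output captured from the affected system.

```python
import re


def needed_libraries(readelf_dynamic_output):
    """Extract DT_NEEDED library names from `readelf -d` text output."""
    pattern = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")
    return pattern.findall(readelf_dynamic_output)


# Hypothetical excerpt in the format printed by `readelf -d`.
sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [libLLVMCodeGen.so.4]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libvulkan_radeon.so]
"""

print("libLLVMCodeGen.so.4" in needed_libraries(sample))  # True
```

In practice the text would come from running `readelf -d libvulkan_radeon.so` after the patchelf step.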
Comment 23 Thiago Macieira 2017-08-02 18:06:57 UTC
From https://bugs.freedesktop.org/show_bug.cgi?id=102010#c3:
> BTW, looks like SUSE is building LLVM with BUILD_SHARED_LIBS=ON, which is a bad
> idea. They should build it with LLVM_BUILD_LLVM_DYLIB=ON instead. Please pass
> this on to them. (This problem might happen regardless though)
Comment 24 Petr Cervinka 2017-08-03 06:30:00 UTC
(In reply to Michal Srb from comment #22)
> Following command can be used to test the workaround:
> patchelf --add-needed libLLVMCodeGen.so.4 libvulkan_radeon.so
> 
> After this the vulkaninfo command doesn't crash anymore.

I can confirm that vulkaninfo doesn't crash (on R9 380X) with this recommendation.
Comment 25 Michal Srb 2017-08-03 08:21:17 UTC
(In reply to Thiago Macieira from comment #23)
> From https://bugs.freedesktop.org/show_bug.cgi?id=102010#c3:
> > BTW, looks like SUSE is building LLVM with BUILD_SHARED_LIBS=ON, which is a bad
> > idea. They should build it with LLVM_BUILD_LLVM_DYLIB=ON instead. Please pass
> > this on to them. (This problem might happen regardless though)

Indeed. I learned about this just yesterday. I am working on it in bug 1049703. It will hopefully fix multiple bugs, including this one.
Comment 26 Forgotten User 75I7EmJG8s 2017-08-04 16:48:56 UTC
Forkbomb's LLVM that I mentioned earlier also builds with LLVM_BUILD_LLVM_DYLIB=ON, so that might be the reason why it has worked for me.
Comment 27 Linus Kardell 2017-08-18 16:29:26 UTC
I too am getting segfaults when trying to use Vulkan on Tumbleweed, both with Intel and with proprietary Nvidia. It worked on Leap though (at least with Nvidia).
Comment 28 Linus Kardell 2017-08-18 16:35:10 UTC
(In reply to Linus Kardell from comment #27)
> I too am getting segfaults when trying to use Vulkan on Tumbleweed, both
> with Intel and with proprietary Nvidia. It worked on Leap though (at least
> with Nvidia).

Never mind. It seems this happens just because the Radeon Vulkan driver is installed, even though there is no Radeon card.
Comment 29 Michal Srb 2017-10-20 08:20:30 UTC
LLVM (4) built with LLVM_BUILD_LLVM_DYLIB instead of BUILD_SHARED_LIBS finally passed all tests and got into Factory.

That change fixes this bug as well.