Bug 1077848

Summary: No GUI with AMDGPU after Kernel Update
Product: [openSUSE] openSUSE Distribution
Reporter: Forgotten User RieEZfasM7 <forgotten_RieEZfasM7>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED DUPLICATE
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: daniel, forgotten_RieEZfasM7, tiwai
Version: Leap 42.3
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 42.3
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Forgotten User RieEZfasM7 2018-01-27 13:06:52 UTC
I run an AMD A8-9600 R7 IGP (Carrizo) using amdgpu. The system had been running smoothly for a long time.
After a kernel upgrade:
- When booting, most of the time, after I see the first few lines of boot screen logging, the display briefly flashes and then turns off, with the monitor going to standby. Then the monitor comes back on, but stays black. Also, switching to other terminals (Ctrl-Alt-F1) does nothing.
- About one out of 10 times, the system comes up to KDE properly, but without accelerated graphics (desktop effects, HW video decoding, etc.).
- I can reliably boot in recovery and then startx, but that obviously does not have accelerated graphics, either.
- Behavior is the same no matter which kernel version I boot. In particular, the kernel that had been working fine before now shows the same issues.

When I run mkinitrd, I get:
[code]
dracut: Possible missing firmware "amdgpu/stoney_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_mec2.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_sdma1.bin" for kernel module "amdgpu.ko"
[/code]

But these files are clearly present:
[code]
# ll /lib/firmware/amdgpu
total 9636
-rw-r--r-- 1 root root   8832 Mai 30  2017 carrizo_ce.bin
-rw-r--r-- 1 root root  17024 Mai 30  2017 carrizo_me.bin
-rw-r--r-- 2 root root 262784 Mai 30  2017 carrizo_mec2.bin
-rw-r--r-- 2 root root 262784 Mai 30  2017 carrizo_mec.bin
-rw-r--r-- 1 root root  17024 Mai 30  2017 carrizo_pfp.bin
-rw-r--r-- 1 root root  18932 Mai 30  2017 carrizo_rlc.bin
-rw-r--r-- 2 root root  10624 Mai 30  2017 carrizo_sdma1.bin
-rw-r--r-- 2 root root  10624 Mai 30  2017 carrizo_sdma.bin
-rw-r--r-- 1 root root 268000 Mai 30  2017 carrizo_uvd.bin
-rw-r--r-- 1 root root 175840 Mai 30  2017 carrizo_vce.bin
-rw-r--r-- 2 root root   8832 Mai 30  2017 fiji_ce.bin
[/code]
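
The directory listing as quoted does not show every file named in the warnings (amdgpu/stoney_ce.bin and amdgpu/topaz_sdma1.bin do not appear in it), so a per-file check is more reliable than eyeballing the listing. A minimal sketch, assuming the firmware root is /lib/firmware; the function name check_reported_firmware is hypothetical and not part of dracut:

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): read dracut warnings on stdin
# and report, for each firmware file they name, whether it exists under
# the firmware root.
#   $1  firmware root (optional; /lib/firmware is an assumed default)
check_reported_firmware() {
    fw_root="${1:-/lib/firmware}"
    # Extract the quoted path that follows "Possible missing firmware".
    sed -n 's/.*Possible missing firmware "\([^"]*\)".*/\1/p' |
    while IFS= read -r fw; do
        if [ -e "$fw_root/$fw" ]; then
            printf 'present  %s\n' "$fw"
        else
            printf 'missing  %s\n' "$fw"
        fi
    done
}
```

It could be fed directly from the command that prints the warnings, for example `mkinitrd 2>&1 | check_reported_firmware`. A file reported as present on disk while dracut still warns about it suggests the problem lies in how the initrd is being built rather than in the firmware package.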

More info:

[code]
# cat /proc/cmdline                                                                
BOOT_IMAGE=/boot/vmlinuz-4.4.104-39-default root=UUID=68f92734-8c00-4cb5-9800-ca7e17df9309 resume=/dev/disk/by-id/ata-Samsung_SSD_840_Series_S19HNEBD343680A-part2 splash=nosplash quiet showopts
[/code]

So no "vga=" or anything.

[code]
# lsmod | grep -i amd
edac_mce_amd           28672  0 
amdkfd                139264  1 
amd_iommu_v2           20480  1 amdkfd
amdgpu                679936  0 
i2c_algo_bit           16384  1 amdgpu
drm_kms_helper        155648  1 amdgpu
ttm                   106496  1 amdgpu
drm                   393216  3 ttm,drm_kms_helper,amdgpu
[/code]

[code]
# lspci -nnk | grep -A3 VGA
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Carrizo [1002:9874] (rev e2)
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1e20]
        Kernel modules: amdgpu
00:01.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio [1002:9840]
[/code]

So it seems the amdgpu driver is loaded and used.

[code]
# glxgears
[/code]

shows gears but runs a bit choppy.

I also described this here: https://forums.opensuse.org/showthread.php/529291-AMDGPU-broken-after-Kernel-Upgrade?p=2852630#post2852630

And it has been pointed out that this might be related to https://bugzilla.opensuse.org/show_bug.cgi?id=1072431

Comment 1 Forgotten User RieEZfasM7 2018-01-27 13:16:07 UTC
Ah, forgot to mention: my system is fully up to date with the opensuse-updates repo, and the kernel is 4.4.104-39-default.

Comment 2 Forgotten User RieEZfasM7 2018-01-28 16:52:10 UTC
I came across this: https://forums.opensuse.org/showthread.php/528958-amd-driver-not-loaded-during-startup

Turns out I also had some conf files in /etc/dracut.conf.d/:
[code]
# ll /etc/dracut.conf.d/
total 20
-rw-r--r-- 1 root root   22 Dez 22 14:13 02-early-microcode.conf
-rw-r--r-- 1 root root  487 Dez 22 14:13 99-debug.conf
-rw-r--r-- 1 root root   96 Nov 12 19:06 amdgpu-4.13.0-2.g7e9e30a-default.conf
-rw-r--r-- 1 root root   88 Sep  9 19:03 amdgpu-pro-4.4.85-22-default.conf
drwxr-xr-x 3 root root 4096 Aug 11 21:44 modules.d
[/code]

After I deleted them, a new run of mkinitrd did not yield any warnings and all problems were fixed...
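
Before deleting anything, it can help to see which snippets in /etc/dracut.conf.d/ look like leftovers. The sketch below is only a heuristic and rests on an assumption: that a snippet whose file name embeds a kernel release (as in amdgpu-pro-4.4.85-22-default.conf) is a candidate for review when no matching directory exists under /lib/modules. The function name list_stale_dracut_confs is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list dracut config snippets whose file name embeds
# a kernel release for which no module directory exists.
# This is a heuristic only; a hit is a hint to inspect the file, not proof
# that the file is harmful.
#   $1  dracut config directory (optional, default /etc/dracut.conf.d)
#   $2  kernel modules root     (optional, default /lib/modules)
list_stale_dracut_confs() {
    conf_dir="${1:-/etc/dracut.conf.d}"
    modules_root="${2:-/lib/modules}"
    for conf in "$conf_dir"/*.conf; do
        [ -f "$conf" ] || continue
        name=$(basename "$conf" .conf)
        # Pull out something shaped like a kernel release: x.y.z-suffix.
        # Names such as 02-early-microcode do not match and are skipped.
        release=$(printf '%s\n' "$name" |
                  grep -oE '[0-9]+\.[0-9]+\.[0-9]+-.+$' || true)
        [ -n "$release" ] || continue
        if [ ! -d "$modules_root/$release" ]; then
            printf '%s\n' "$conf"
        fi
    done
}
```

Running `list_stale_dracut_confs` with no arguments checks the default locations. On an RPM-based system, `rpm -qf` on each reported file shows whether a package still owns it, which is worth checking before removing the file and re-running mkinitrd.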

Not sure what this means for this bug - it could either be closed, or be used to investigate why these harmful conf files appear...

Comment 3 Takashi Iwai 2018-02-14 11:48:32 UTC
It's likely a known issue with the amdgpu-pro package, which provides the bogus dracut config.

*** This bug has been marked as a duplicate of bug 1066682 ***