Bug 1134475

Summary: X11 not starting - Tumbleweed - After Kernel update (5.0.9 +) - radeon
Product: [openSUSE] openSUSE Tumbleweed Reporter: James Roulston <james.roulston.1>
Component: Kernel Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P2 - High CC: anatoli.antonovitch, fabian.baumanis, james.roulston.1, msvec, tiwai, tzimmermann
Version: Current   
Target Milestone: Current   
Hardware: x86-64   
OS: SUSE Other   
Whiteboard:
Found By: Community User Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: mkinitrd output
Patch for the amdgpu-dkms specfile.

Description James Roulston 2019-05-08 15:31:37 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0
Build Identifier: 

After updating Tumbleweed to kernel 5.0.9 (5.0.11 is current), the system froze during boot, although there was still hard drive activity.
I could boot successfully with 'nomodeset' in the kernel parameters.
I could run startx, but only in a slow fallback mode.
I could load the correct driver with 'modprobe -v radeon modeset=1';
this started X correctly. It seems to be loading 'radeonfb' instead of 'radeon'.
A fix for this is in the additional info area.

Video details below

VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Richland [Radeon HD 8470D] [1002:9996]
        Subsystem: Hewlett-Packard Company Device [103c:2b60]
        Kernel driver in use: radeon
        Kernel modules: radeon

Reproducible: Always

Steps to Reproduce:
1. Update the system using zypper dup with the new kernel
2. Reboot

Actual Results:
The screen hangs with no X display. The Ctrl+Alt+Fn keyboard shortcuts don't work, but Ctrl+Alt+Del reboots.

Expected Results:  
X window manager should start.

I fixed the issue by doing the following.
In a terminal I went into /etc/modprobe.d.
I created a file called 'blacklist.radeon.conf'.
In that file I added the line 'blacklist radeon'.
I then ran 'mkinitrd'.
When I rebooted, the issue was still the same, so I changed the line to 'blacklist radeonfb' but did not run mkinitrd.
I rebooted and everything works again.

I have to re-edit the 'blacklist.radeon.conf' file to 'blacklist radeon', run mkinitrd, and then re-edit the file to 'blacklist radeonfb'.

I have to repeat these steps on every kernel update since 5.0.9.
Comment 1 Stefan Dirsch 2019-05-08 16:09:36 UTC
The description is somewhat confusing. I'm not sure which kernel module is supposed to be the culprit here: radeon or radeonfb? AFAIK the radeonfb build is completely disabled ...

Could you explain in more detail, please?
Comment 2 James Roulston 2019-05-08 16:38:17 UTC
When booting with 'Splash=0' I get the terminal message 'fb0: switching to radeondrmfb to VESA VGA.'

I saw in the blacklist file '50-blacklist.conf' that radeonfb is indeed disabled.
I think it's the radeon module not loading by default when it should.
When booting with nomodeset enabled, I do get to a login terminal, and running 'modprobe -v radeon modeset=1' gets things up and running again.

radeondrmfb could be loading by default instead of radeon.
Comment 3 Stefan Dirsch 2019-05-14 09:16:05 UTC
radeondrmfb is the fb (= framebuffer) *implementation* of radeon's DRM (= direct rendering manager/kernel mode setting) driver/kernel module 'radeon'. 'radeonfb' is an ancient, separate framebuffer driver/kernel module for radeon cards. We never built or shipped it, let alone enabled it by default. This should explain things a bit.

So my conclusion would be that the radeon driver is perhaps not yet completely initialized when the desktop is starting.
Comment 4 James Roulston 2019-05-14 15:20:17 UTC
Created attachment 805046 [details]
mkinitrd output
Comment 5 Takashi Iwai 2019-05-14 15:24:04 UTC
Did you install the kernel-firmware package?

The error message from dracut indicates missing firmware files, and that can be the cause of the graphics problem at boot.
Comment 6 James Roulston 2019-05-14 15:44:45 UTC
kernel-firmware is installed, version 20190502-1.1 according to YaST.
Below is a snippet of some of the files under the Dependencies tab in YaST Software management.

firmware(r8a779x_usb3_v1.dlmem)
firmware(r8a779x_usb3_v2.dlmem)
firmware(r8a779x_usb3_v3.dlmem)
firmware(radeon/ARUBA_me.bin)
firmware(radeon/ARUBA_pfp.bin)
firmware(radeon/ARUBA_rlc.bin)
firmware(radeon/BARTS_mc.bin)
firmware(radeon/BARTS_me.bin)
firmware(radeon/BARTS_pfp.bin)
firmware(radeon/BARTS_smc.bin)
firmware(radeon/BONAIRE_ce.bin)
Comment 7 James Roulston 2019-05-14 15:45:47 UTC
My card is Radeon HD 8470D ARUBA
Comment 8 Takashi Iwai 2019-05-14 15:48:24 UTC
And do you see the files /lib/firmware/xxx on your system for the files that are reported by dracut?  For example, is /lib/firmware/radeon/R520_cp.bin present on your system?

If not, check whether you really have kernel-firmware package installed.
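
The check asked for here can be scripted. The following is only a sketch: the function name check_firmware is made up for illustration, and the firmware names to pass in are whichever ones dracut reported as missing.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named firmware files exist
# below a firmware root (normally /lib/firmware).
# Usage: check_firmware ROOT NAME...
# Returns 0 if every file is present, 1 if at least one is missing.
check_firmware() {
    root=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -f "$root/$name" ]; then
            echo "present: $name"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: check_firmware /lib/firmware radeon/R520_cp.bin radeon/ARUBA_me.bin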
Comment 9 James Roulston 2019-05-14 16:02:41 UTC
This is the listing of everything in /lib/firmware/radeon/

ARUBA_me.bin
ARUBA_pfp.bin
ARUBA_rlc.bin
banks_k_2_smc.bin
BARTS_mc.bin
BARTS_me.bin
BARTS_pfp.bin
BARTS_smc.bin
bonaire_ce.bin
BONAIRE_ce.bin
bonaire_k_smc.bin
BONAIRE_mc2.bin
bonaire_mc.bin
BONAIRE_mc.bin
bonaire_me.bin
BONAIRE_me.bin
bonaire_mec.bin
BONAIRE_mec.bin
bonaire_pfp.bin
BONAIRE_pfp.bin
bonaire_rlc.bin
BONAIRE_rlc.bin
bonaire_sdma1.bin
bonaire_sdma.bin
BONAIRE_sdma.bin
bonaire_smc.bin
BONAIRE_smc.bin
bonaire_uvd.bin
BONAIRE_uvd.bin
bonaire_vce.bin
BONAIRE_vce.bin
BTC_rlc.bin
CAICOS_mc.bin
CAICOS_me.bin
CAICOS_pfp.bin
CAICOS_smc.bin
CAYMAN_mc.bin
CAYMAN_me.bin
CAYMAN_pfp.bin
CAYMAN_rlc.bin
CAYMAN_smc.bin
CEDAR_me.bin
CEDAR_pfp.bin
CEDAR_rlc.bin
CEDAR_smc.bin
CYPRESS_me.bin
CYPRESS_pfp.bin
CYPRESS_rlc.bin
CYPRESS_smc.bin
CYPRESS_uvd.bin
hainan_ce.bin
HAINAN_ce.bin
hainan_k_smc.bin
HAINAN_mc2.bin
hainan_mc.bin
HAINAN_mc.bin
hainan_me.bin
HAINAN_me.bin
hainan_pfp.bin
HAINAN_pfp.bin
hainan_rlc.bin
HAINAN_rlc.bin
hainan_smc.bin
HAINAN_smc.bin
hawaii_ce.bin
HAWAII_ce.bin
hawaii_k_smc.bin
HAWAII_mc2.bin
hawaii_mc.bin
HAWAII_mc.bin
hawaii_me.bin
HAWAII_me.bin
hawaii_mec.bin
HAWAII_mec.bin
hawaii_pfp.bin
HAWAII_pfp.bin
hawaii_rlc.bin
HAWAII_rlc.bin
hawaii_sdma1.bin
hawaii_sdma.bin
HAWAII_sdma.bin
hawaii_smc.bin
HAWAII_smc.bin
hawaii_uvd.bin
hawaii_vce.bin
JUNIPER_me.bin
JUNIPER_pfp.bin
JUNIPER_rlc.bin
JUNIPER_smc.bin
kabini_ce.bin
KABINI_ce.bin
kabini_me.bin
KABINI_me.bin
kabini_mec.bin
KABINI_mec.bin
kabini_pfp.bin
KABINI_pfp.bin
kabini_rlc.bin
KABINI_rlc.bin
kabini_sdma1.bin
kabini_sdma.bin
KABINI_sdma.bin
kabini_uvd.bin
kabini_vce.bin
kaveri_ce.bin
KAVERI_ce.bin
kaveri_me.bin
KAVERI_me.bin
kaveri_mec2.bin
kaveri_mec.bin
KAVERI_mec.bin
kaveri_pfp.bin
KAVERI_pfp.bin
kaveri_rlc.bin
KAVERI_rlc.bin
kaveri_sdma1.bin
kaveri_sdma.bin
KAVERI_sdma.bin
kaveri_uvd.bin
kaveri_vce.bin
mullins_ce.bin
MULLINS_ce.bin
mullins_me.bin
MULLINS_me.bin
mullins_mec.bin
MULLINS_mec.bin
mullins_pfp.bin
MULLINS_pfp.bin
mullins_rlc.bin
MULLINS_rlc.bin
mullins_sdma1.bin
mullins_sdma.bin
MULLINS_sdma.bin
mullins_uvd.bin
mullins_vce.bin
oland_ce.bin
OLAND_ce.bin
oland_k_smc.bin
OLAND_mc2.bin
oland_mc.bin
OLAND_mc.bin
oland_me.bin
OLAND_me.bin
oland_pfp.bin
OLAND_pfp.bin
oland_rlc.bin
OLAND_rlc.bin
oland_smc.bin
OLAND_smc.bin
PALM_me.bin
PALM_pfp.bin
pitcairn_ce.bin
PITCAIRN_ce.bin
pitcairn_k_smc.bin
PITCAIRN_mc2.bin
pitcairn_mc.bin
PITCAIRN_mc.bin
pitcairn_me.bin
PITCAIRN_me.bin
pitcairn_pfp.bin
PITCAIRN_pfp.bin
pitcairn_rlc.bin
PITCAIRN_rlc.bin
pitcairn_smc.bin
PITCAIRN_smc.bin
R100_cp.bin
R200_cp.bin
R300_cp.bin
R420_cp.bin
R520_cp.bin
R600_me.bin
R600_pfp.bin
R600_rlc.bin
R600_uvd.bin
R700_rlc.bin
REDWOOD_me.bin
REDWOOD_pfp.bin
REDWOOD_rlc.bin
REDWOOD_smc.bin
RS600_cp.bin
RS690_cp.bin
RS780_me.bin
RS780_pfp.bin
RS780_uvd.bin
RV610_me.bin
RV610_pfp.bin
RV620_me.bin
RV620_pfp.bin
RV630_me.bin
RV630_pfp.bin
RV635_me.bin
RV635_pfp.bin
RV670_me.bin
RV670_pfp.bin
RV710_me.bin
RV710_pfp.bin
RV710_smc.bin
RV710_uvd.bin
RV730_me.bin
RV730_pfp.bin
RV730_smc.bin
RV740_smc.bin
RV770_me.bin
RV770_pfp.bin
RV770_smc.bin
RV770_uvd.bin
si58_mc.bin
SUMO2_me.bin
SUMO2_pfp.bin
SUMO_me.bin
SUMO_pfp.bin
SUMO_rlc.bin
SUMO_uvd.bin
tahiti_ce.bin
TAHITI_ce.bin
tahiti_k_smc.bin
TAHITI_mc2.bin
tahiti_mc.bin
TAHITI_mc.bin
tahiti_me.bin
TAHITI_me.bin
tahiti_pfp.bin
TAHITI_pfp.bin
tahiti_rlc.bin
TAHITI_rlc.bin
tahiti_smc.bin
TAHITI_smc.bin
TAHITI_uvd.bin
TAHITI_vce.bin
TURKS_mc.bin
TURKS_me.bin
TURKS_pfp.bin
TURKS_smc.bin
verde_ce.bin
VERDE_ce.bin
verde_k_smc.bin
VERDE_mc2.bin
verde_mc.bin
VERDE_mc.bin
verde_me.bin
VERDE_me.bin
verde_pfp.bin
VERDE_pfp.bin
verde_rlc.bin
VERDE_rlc.bin
verde_smc.bin
VERDE_smc.bin
Comment 10 Takashi Iwai 2019-05-14 16:06:51 UTC
So the file is present, but dracut still fails to install it?

Wait...  Have you ever installed the amdgpu package?  It's known to break dracut because of its buggy firmware path setup.

If you have it, uninstall it, make sure that everything got cleaned up without stale files left over from the package, and try to recreate the initrd.
Comment 11 James Roulston 2019-05-14 16:30:28 UTC
I tried to install AMDGPU-Pro a while back, but it didn't work, so I got rid of it and its repos. Maybe it broke something. According to YaST amdgpu is installed, so I'll delete those packages and see what happens.  If it's still broken I can do a fresh reinstall to see if things work.
Comment 12 Stefan Dirsch 2019-05-14 16:50:32 UTC
IIRC the amdgpu proprietary driver installed a file below /etc/dracut.conf.d in order to change the firmware file path (the driver comes with its own firmware files). The big issue was that it didn't remove this config file during uninstall. So please make sure there is no such file left afterwards. Otherwise there will be no firmware files in the initrd after running mkinitrd.

BTW, there is a command to list initrd content.

sudo lsinitrd /boot/initrd..

so you can check whether the firmware files are generated to initrd.
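
To narrow the listing down to the radeon firmware, the lsinitrd output can be filtered. This is a sketch only: the function name initrd_radeon_firmware is made up, and it assumes the firmware entries appear in the listing under a path containing lib/firmware/radeon/.

```shell
#!/bin/sh
# Hypothetical helper: read an initrd listing on stdin and print only the
# lines that refer to radeon firmware files.
# Returns non-zero when no such line is found.
initrd_radeon_firmware() {
    grep 'lib/firmware/radeon/'
}
```

Pipe the output of the lsinitrd command above into initrd_radeon_firmware. Empty output with a non-zero exit status means no radeon firmware made it into that initrd.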
Comment 13 James Roulston 2019-05-14 17:53:13 UTC
I just reinstalled and everything seems to be working now.
The amdgpu driver probably broke things.
The contents of /etc/dracut.conf.d are:

-rw-r--r-- 1 root root  22 May  3 21:30 02-early-microcode.conf
-rw-r--r-- 1 root root 487 May  3 21:30 99-debug.conf
-rw-r--r-- 1 root root 821 May  4 02:27 ostree.conf

Thanks for your help and time and if it happens again I know what to look out for.
Comment 14 Stefan Dirsch 2019-05-15 09:42:13 UTC
Ok. Let's assume the culprit was AMD's amdgpu proprietary driver packages. According to AMD the issue has been fixed in a later release of amdgpu driver packages though.
Comment 15 James Roulston 2019-05-15 12:57:14 UTC
I reinstalled the AMDGPU-Pro driver to see if I could recreate the issue, and I could.  I got a blank screen on boot, so I booted with nomodeset, ran YaST in a terminal, manually removed the AMDGPU-Pro packages and ran mkinitrd. I got the same missing-firmware messages, and the system wouldn't boot into X afterwards, just like before. I then checked /etc/dracut.conf.d and found a file called amdgpu-5.0.13-1-default.conf, so I removed it and ran mkinitrd again, this time with no missing-firmware messages. I then rebooted normally and everything is running fine again.

So the AMD driver was the culprit here.

Thank you all for your time and help.

Regards,
James
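
The manual check described in this comment (looking in /etc/dracut.conf.d for a leftover file that redirects the firmware path) might be scripted roughly as follows. The function name find_fw_dir_overrides is made up for illustration. The sketch only looks for lines that assign fw_dir, which is the setting that comment 12 identifies as the problem.

```shell
#!/bin/sh
# Hypothetical helper: print "file:line" for every *.conf file in DIR
# (default /etc/dracut.conf.d) that assigns fw_dir.
# The extra /dev/null argument makes grep print the file name even when
# only one config file exists.
find_fw_dir_overrides() {
    dir=${1:-/etc/dracut.conf.d}
    grep -E '^[[:space:]]*fw_dir\+?=' "$dir"/*.conf /dev/null 2>/dev/null
}
```

Any file reported here that belongs to an uninstalled driver is a candidate for removal before rerunning mkinitrd.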
Comment 16 Stefan Dirsch 2019-05-15 14:34:21 UTC
Hell, AMD claimed they had fixed the issue in the latest available drivers, and that was months ago ...

Which driver version are you using? I believe the latest version is 18.50:

https://www.amd.com/en/support/kb/release-notes/rn-rad-lin-18-50-unified
Comment 17 James Roulston 2019-05-15 14:52:03 UTC
The amdgpu-pro version is 19.10, which I downloaded from the same link you provided.
It's not compatible with my card, so I'm back to using radeon, which works well.
Comment 18 Stefan Dirsch 2019-05-15 15:47:32 UTC
(In reply to James Roulston from comment #17)
> The amdgpu-pro version is 19.10, which I downloaded from the same link you
> provided.
> It's not compatible with my card, so I'm back to using radeon, which works well.

I have downloaded the SLED/SLES 15 RPM packages from

https://www.amd.com/en/support/kb/release-notes/rn-rad-lin-19-10-unified

I could not find a dracut file in any of the packages. Nor is it created by
a %pre/%post RPM script. I have no idea how this file

/etc/dracut.conf.d/amdgpu-5.0.13-1-default.conf

is created.
Comment 19 James Roulston 2019-05-15 19:02:22 UTC
I reinstalled amdgpu-pro to check the YaST software manager for files provided by the driver, but I did not see anything.
This also broke my machine again. After I removed the amdgpu file from dracut.conf.d and restarted, it still wouldn't start X, and when I checked the dracut.conf.d directory again I got the following output:

total 16
-rw-r--r-- 1 root root  22 May  3 21:30 02-early-microcode.conf
-rw-r--r-- 1 root root 487 May  3 21:30 99-debug.conf
-rw-r--r-- 1 root root  87 May 15 18:20 amdgpu-5.0.13-1-default.conf
-rw-r--r-- 1 root root 821 May  4 02:27 ostree.conf

I deleted it and tried loading the amdgpu driver via modprobe. It didn't start, but the amdgpu-5.0.13-1-default.conf file was back again.

I deleted the file again and rebooted, and it appeared again.
Maybe the driver is creating the file itself when it tries to load.

I have again deleted all amdgpu-pro related drivers and files from the AMD repo.

I checked the list of compatible cards for the amdgpu driver and mine is not supported, which is why it never worked.

It was the amdgpu-5.0.13-1-default.conf that was causing the issue.
Comment 20 Stefan Dirsch 2019-05-16 10:08:34 UTC
/usr/src/amdgpu-19.10-785424/pre-build.sh
[...]
FW_DIR="/lib/firmware/$KERNELVER"
mkdir -p $FW_DIR
cp -ar /usr/src/amdgpu-19.10-785424/firmware/amdgpu $FW_DIR
echo "add_drivers+=\" amdgpu\"" >/etc/dracut.conf.d/amdgpu-$KERNELVER.conf
echo "add_drivers+=\" amdkfd\"" >>/etc/dracut.conf.d/amdgpu-$KERNELVER.conf
echo "fw_dir+=\"$FW_DIR\"" >>/etc/dracut.conf.d/amdgpu-$KERNELVER.conf

/usr/src/amdgpu-19.10-785424/post-remove.sh
#!/bin/bash

FW_DIR="/lib/firmware"
rm -rf $FW_DIR/*/amdgpu
[[ ! $(ls -A $FW_DIR) ]] && rm -rf $FW_DIR
rm -f /etc/dracut.conf.d/amdgpu-*.conf

These files belong to package amdgpu-dkms.

rpm --scripts -qp amdgpu-dkms-19.10-785424.noarch.rpm
[...]
preuninstall scriptlet (using /bin/sh):
dkms remove -m amdgpu -v 19.10-785424 --all --rpm_safe_upgrade
exit $?

I guess that this dkms should have removed the dracut file, which for some reason failed.
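
As a cross-check, the three echo lines from pre-build.sh can be replayed to stdout to see what the generated config file contains. This is a sketch: the function name amdgpu_dracut_conf is made up, and the kernel version 5.0.13-1-default is an assumption taken from the file name reported in comment 19.

```shell
#!/bin/sh
# Hypothetical helper: emit the lines that the quoted pre-build.sh writes
# to /etc/dracut.conf.d/amdgpu-$KERNELVER.conf, but to stdout instead.
amdgpu_dracut_conf() {
    KERNELVER=$1
    FW_DIR="/lib/firmware/$KERNELVER"
    echo "add_drivers+=\" amdgpu\""
    echo "add_drivers+=\" amdkfd\""
    echo "fw_dir+=\"$FW_DIR\""
}

# Show the file content for the assumed kernel version.
amdgpu_dracut_conf 5.0.13-1-default
# Count the bytes: 87, the same size as the file listed in comment 19.
amdgpu_dracut_conf 5.0.13-1-default | wc -c
```

The matching size suggests that the file seen in comment 19 was written by this script. Its third line is the fw_dir setting that points dracut at the driver's own firmware directory.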
Comment 21 Stefan Dirsch 2019-05-16 10:09:45 UTC
s/dkms/dkms call/
Comment 22 Stefan Dirsch 2019-05-20 13:00:08 UTC
Fabian, could you try to reproduce the issue by installing the SLE15 packages from

https://www.amd.com/en/support/kb/release-notes/rn-rad-lin-19-10-unified

Then - if reproducible - try to figure out why this happens. One explanation would be that dkms is no longer available when amdgpu-dkms is being uninstalled.
Comment 23 Fabian Baumanis 2019-05-21 12:01:49 UTC
I adjusted the pre-uninstall script from the amdgpu-dkms package.
Now, after the 'dkms remove' call, the amdgpu-*.conf file is removed independently from the dkms call.

See the attached patch.
Comment 24 Fabian Baumanis 2019-05-21 12:02:27 UTC
Created attachment 805594 [details]
Patch for the amdgpu-dkms specfile.
Comment 25 Takashi Iwai 2019-05-21 12:31:59 UTC
Strictly speaking, the postun scriptlet should check the argument $1 to see if it's an update or the actual uninstall.
Comment 26 Stefan Dirsch 2019-05-21 12:56:07 UTC
(In reply to Takashi Iwai from comment #25)
> Strictly speaking, the postun scriptlet should check the argument $1 to see
> if it's an update or the actual uninstall.

Right, we would need something like this:

%preun -p /bin/sh
# not on update!
if [ "$1" -eq 0 ]; then
  # dkms call may fail, so the script, which removes the dracut file, will not be executed
  # so make sure that it gets removed in any case
  rm -f /etc/dracut.conf.d/amdgpu-*.conf
  dkms remove -m amdgpu -v 19.10-785424 --all --rpm_safe_upgrade
fi
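
The $1 test relies on the RPM convention that uninstall scriptlets receive the number of instances of the package that will remain installed afterwards: 0 on a real uninstall, 1 or more on an upgrade. A minimal sketch that mimics the decision outside of RPM (the function name preun_should_clean is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the scriptlet argument says that
# no instance of the package will remain installed, i.e. a real uninstall.
preun_should_clean() {
    [ "$1" -eq 0 ]
}

# Walk through both cases a %preun scriptlet normally sees.
for remaining in 0 1; do
    if preun_should_clean "$remaining"; then
        echo "\$1=$remaining: uninstall, remove the dracut config"
    else
        echo "\$1=$remaining: upgrade, leave the dracut config alone"
    fi
done
```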
Comment 27 Stefan Dirsch 2019-08-29 12:30:14 UTC
*** Bug 1147646 has been marked as a duplicate of this bug. ***
Comment 28 Anatoli Antonovitch 2019-10-07 17:56:27 UTC
The postun scriptlet has been updated in the spec. The fix should be available in release 19.40.
Comment 29 Stefan Dirsch 2019-10-07 18:25:44 UTC
Thanks. Seems 19.40 is not available yet. At least I couldn't find it ...