Bug 1160204

Summary: Failure to boot with latest version of ucode-amd
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Will Bainbridge <microfocus>
Component: Other
Assignee: Thomas Renninger <trenn>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: alynx.zhou, bpetkov, michael.muschner, microfocus, tiwai
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Factory
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: /proc/cpuinfo as requested by Thomas Renninger
Output of cpuid as requested by Thomas Renninger

Description Will Bainbridge 2020-01-07 09:03:09 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0
Build Identifier: 

If the current Tumbleweed version of ucode-amd is installed (20191118-1.1) then the machine resets just after GRUB as the kernel is being loaded. If ucode-amd is downgraded to the version in Leap 15.1 (20190618-lp151.2.6.1) the machine boots successfully.

Reproducible: Always

Steps to Reproduce:
1. Create a live USB image of Tumbleweed (tested 20191228)
2. Boot from this USB stick
3. Hit ENTER on the default selection in GRUB, or wait for it to time out

Actual Results:
GRUB prints "Loading initial ramdisk", then the machine immediately resets.

Expected Results:
GRUB prints "Loading initial ramdisk", then the boot sequence proceeds.

The machine has dual AMD EPYC 7282 16-core processors and a Supermicro H11DSi-NT motherboard.

The failure seems to occur before the kernel is loaded, so loglevel parameters do not result in any useful data being printed before the reset. 

The problem can be reproduced either with a live USB stick or with a fresh install of Tumbleweed with default settings. Deselecting "ucode-amd" in the installer resolves the issue. Installing "ucode-amd" from the Leap 15.1 update repository also results in a functional system. It is just the version of "ucode-amd" in Tumbleweed (and Leap 15.2) that appears to be the issue.
Comment 1 Thomas Renninger 2020-01-08 10:10:30 UTC
Can you please attach /proc/cpuinfo and the output of the cpuid command?
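
For readers triaging a similar report, the fields of /proc/cpuinfo that matter for a microcode problem are the CPU family, model, stepping and the loaded microcode revision. The following is a minimal sketch, assuming the usual x86 field names ("cpu family", "model", "stepping", "microcode"). The function name and the sample text are hypothetical, and the sample values are illustrative only, not taken from the attachment.

```python
def summarize_cpuinfo(text):
    """Return the distinct (family, model, stepping, microcode) tuples
    found in /proc/cpuinfo-style text, one entry per unique combination."""
    seen = set()
    # Each logical CPU is described by a block separated by a blank line.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "cpu family" not in fields:
            continue  # not an x86-style block, skip it
        seen.add((
            int(fields["cpu family"]),
            int(fields.get("model", -1)),
            int(fields.get("stepping", -1)),
            fields.get("microcode", "unknown"),
        ))
    return sorted(seen)


# Illustrative input only; the microcode value is made up.
SAMPLE = """processor\t: 0
cpu family\t: 23
model\t\t: 49
stepping\t: 0
microcode\t: 0x1234

processor\t: 1
cpu family\t: 23
model\t\t: 49
stepping\t: 0
microcode\t: 0x1234
"""

for family, model, stepping, microcode in summarize_cpuinfo(SAMPLE):
    print(f"family {family:#x} model {model:#x} stepping {stepping} microcode {microcode}")
```

On a real system the text would come from reading /proc/cpuinfo. Family 23 is 0x17, which is why the relevant firmware file is named microcode_amd_fam17h.bin.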
Comment 2 Borislav Petkov 2020-01-08 11:06:25 UTC
See if the newest release fixes your issue:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/microcode_amd_fam17h.bin

You simply need to copy it over

/lib/firmware/amd-ucode/microcode_amd_fam17h.bin

then regenerate the initrd and reboot.

Thx.
Comment 3 Will Bainbridge 2020-01-08 12:46:56 UTC
Created attachment 827150 [details]
/proc/cpuinfo as requested by Thomas Renninger
Comment 4 Will Bainbridge 2020-01-08 12:47:26 UTC
Created attachment 827151 [details]
Output of cpuid as requested by Thomas Renninger
Comment 5 Will Bainbridge 2020-01-09 10:34:05 UTC
(In reply to Borislav Petkov from comment #2)
> See if the newest release fixes your issue:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/microcode_amd_fam17h.bin

I can confirm that copying this file to /lib/firmware/amd-ucode/microcode_amd_fam17h.bin, running mkinitrd, and rebooting resolves the issue.

Many thanks. I guess I now just need to wait until this update becomes part of the main Tumbleweed release.
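
The workaround confirmed in this comment (copy the new file over the installed one, run mkinitrd, reboot) can be sketched as below. This is only an illustration: the function name install_microcode is hypothetical, keeping a ".bak" copy of the old file is an added precaution that is not part of the original instructions, and the script assumes it is run as root with the new microcode_amd_fam17h.bin already downloaded.

```python
import shutil
import subprocess
from pathlib import Path

# Destination path as given in comment 2.
TARGET = Path("/lib/firmware/amd-ucode/microcode_amd_fam17h.bin")


def install_microcode(new_file, target=TARGET, regenerate=True):
    """Copy new_file over target, keeping a backup of any existing file.

    Returns the backup path, or None if there was nothing to back up.
    """
    new_file = Path(new_file)
    target = Path(target)
    backup = None
    if target.exists():
        # Assumption: keep the old blob so the change can be reverted.
        backup = target.with_name(target.name + ".bak")
        shutil.copy2(target, backup)
    shutil.copy2(new_file, target)
    if regenerate:
        # Rebuild the initrd so the new microcode is picked up at early boot.
        subprocess.run(["mkinitrd"], check=True)
    return backup
```

A call such as install_microcode("microcode_amd_fam17h.bin") followed by a reboot corresponds to the steps described above. Passing regenerate=False skips the mkinitrd step, for example when trying the copy logic on a scratch directory.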
Comment 6 Borislav Petkov 2020-01-09 11:36:46 UTC
Yes, thanks for testing.
Comment 7 Thomas Renninger 2020-01-10 11:59:12 UTC
AMD firmware is part of kernel-firmware which is maintained by Takashi.

As this looks like "should be fixed asap", I set the needinfo flag instead of CC'ing.

@tiwai: It would be great if this could be fixed up by a kind of general kernel-firmware update. Let me know if I should/have to pick up the AMD firmware and add it manually.
Comment 8 Thomas Renninger 2020-01-10 12:00:10 UTC
Hmm, we might want to make sure the buggy firmware does not end up in SLE 15 SP2 Beta2...
Comment 9 Takashi Iwai 2020-01-10 13:03:52 UTC
So we need the update of the commit
c4586ffaac0ca0d7045e06140b6426f2e79e96e6
Author: John Allen <john.allen@amd.com>
Date:   Wed Dec 18 08:27:40 2019 -0600

    linux-firmware: Update AMD cpu microcode
    
??

The update for the TW kernel-firmware package is already on its way, containing the very latest one (20200107). But SLE15-SP1:Update contains 20191118, so it seems that we need the update again. Will trigger it later.
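
The snapshot dates mentioned in this report (20190618 boots, 20191118 fails, 20200107 carries the fix) can be turned into a quick check of a package version string. This is a rough sketch: the function name classify is hypothetical, and treating every snapshot from 20200107 onward as fixed is an assumption, not something stated in the report. Dates not mentioned here are reported as "unknown" rather than guessed.

```python
import re

KNOWN_BAD = {20191118}    # fails to boot per this report
KNOWN_GOOD = {20190618}   # Leap 15.1 version that boots
FIXED_FROM = 20200107     # first snapshot reported to contain the fix


def classify(version):
    """Classify a ucode-amd / kernel-firmware version string by its
    8-digit snapshot date: 'affected', 'ok' or 'unknown'."""
    match = re.search(r"(?<!\d)(\d{8})(?!\d)", version)
    if not match:
        return "unknown"
    date = int(match.group(1))
    if date in KNOWN_BAD:
        return "affected"
    # Assumption: snapshots at or after FIXED_FROM keep the fix.
    if date in KNOWN_GOOD or date >= FIXED_FROM:
        return "ok"
    return "unknown"


for v in ("20191118-1.1", "20190618-lp151.2.6.1",
          "kernel-firmware-20200107-3.12.1"):
    print(v, "->", classify(v))
```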
Comment 11 Thomas Renninger 2020-01-15 10:42:02 UTC
I am closing this as fixed.
-> The bug is against Tumbleweed

@Takashi: In case you still need a reference bug open for whatever SLE/Leap submission, just re-open or adjust.

afaik all necessary submissions have been done by Takashi already, also for SLE.

Thanks for taking care, it's very much appreciated!
Comment 13 Swamp Workflow Management 2020-02-07 14:16:46 UTC
SUSE-RU-2020:0361-1: An update that has two recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1143331,1160204
CVE References: 
Sources used:
SUSE Linux Enterprise Module for Basesystem 15-SP1 (src):    kernel-firmware-20200107-3.12.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 14 Swamp Workflow Management 2020-02-12 20:12:13 UTC
openSUSE-RU-2020:0212-1: An update that has two recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1143331,1160204
CVE References: 
Sources used:
openSUSE Leap 15.1 (src):    kernel-firmware-20200107-lp151.2.12.1