Bugzilla – Full Text Bug Listing
| Summary: | Rebooting KVM virtual machine gives a black screen | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Neil Rickert <nwr10cst-oslnx> |
| Component: | KVM | Assignee: | Joey Lee <jlee> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Minor | | |
| Priority: | P2 - High | CC: | dfaggioli, jcheung, jlee, jose.ziviani, mchang, nwr10cst-oslnx, predivan |
| Version: | Leap 15.3 | | |
| Target Milestone: | --- | | |
| Hardware: | x86-64 | | |
| OS: | Other | | |
| See Also: | https://bugzilla.suse.com/show_bug.cgi?id=1192126 | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Output from "virsh dumpxml ubuntu19"; OVMF debug log; VM domain XML; libvirt domain configuration; OVMF debug log ('systemctl reboot'); dmesg.log; ovmf-bsc1192126-OvmfPkg-PlatformPei-Always-reserve-the-SEV-ES-work-a.patch | | |
Description
Neil Rickert
2021-06-11 20:49:57 UTC
A couple of additional comments:

(1) I am pretty sure that this is not a grub problem. It looks like an ovmf firmware problem. On a normal reboot, I should see a prompt to hit ESC if I want the boot options menu. But when I see this issue, it does not even get that far.

(2) There was an "ovmf" update a few weeks ago. After that update, the problem went away on virtual machines that are using "/usr/share/qemu/ovmf-x86_64-smm-ms-code.bin". But the problem does still show up on a virtual machine using "/usr/share/qemu/ovmf-x86_64-ms-4m-code.bin" (but not on every reboot).

(In reply to Neil Rickert from comment #2)

Hi Neil,

Are you using libvirt? If so, could you please attach the "virsh dumpxml ..." output from your guest? You can also grab the log from the OVMF via:

    <qemu:commandline>
      <qemu:arg value='-chardev'/>
      <qemu:arg value='file,id=ovmf,path=/tmp/myvm-ovmf-debug.log'/>
      <qemu:arg value='-device'/>
      <qemu:arg value='isa-debugcon,iobase=0x402,chardev=ovmf'/>
    </qemu:commandline>

Please refer to https://libvirt.org/kbase/qemu-passthrough-security.html for setting up the qemu command-line passthrough for libvirt.

If you are using qemu directly, just attach your qemu command here and try the command line above to grab the OVMF log when the problem occurs. Thanks.

Created attachment 852631 [details]
Output from "virsh dumpxml ubuntu19"
I am attaching the dumpxml output.
I'm not sure how to get those ovmf logs. I configured virt-manager to allow editing xml. Then I copied the lines you suggested to just above "</domain>" and saved those changes. But the resulting xml file still did not have those changes. Maybe I am doing something wrong. Or perhaps I have to directly edit the file in "/etc/libvirt/qemu".
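As an aside: for a guest launched with plain qemu rather than libvirt, the OVMF-logging arguments Michael quoted above can be passed directly on the command line. A sketch only; the firmware path and the other machine options here are placeholders for your own VM setup, not taken from this report:

```sh
# Sketch: the -chardev/-device pair is from Michael's comment; the firmware
# path and remaining options are placeholders for your own VM setup.
qemu-system-x86_64 \
  -machine q35,accel=kvm \
  -drive if=pflash,format=raw,readonly=on,file=/usr/share/qemu/ovmf-x86_64-ms-4m-code.bin \
  -chardev file,id=ovmf,path=/tmp/myvm-ovmf-debug.log \
  -device isa-debugcon,iobase=0x402,chardev=ovmf
```

The isa-debugcon device at I/O port 0x402 is where OVMF writes its debug output, so the log ends up in the file named by -chardev.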
I've seen the same thing for a while now, and across different versions of the OVMF package (I am using the Virtualization repo for qemu/libvirt), currently 202108-195.1.

Two VMs I recently saw behaving like that both use ovmf-x86_64-ms-4m-code.bin, and I can't test with the smm ones, the host CPU is way too old :)

Attached are the XML for the 15.3 VM domain and the OVMF debug log from when I upgraded the VM to 15.4 alpha and the freeze happened. 'virsh destroy $domain --graceful' is how I dealt with that so far, with, AFAICT, no side-effects to the VM running afterwards.

Created attachment 854502 [details]
OVMF debug log
Created attachment 854503 [details]
VM domain XML
Ok, it still seems a potential OVMF issue to me, so I'm trying to assign to Joey, to see what he thinks about it. That said, Michael, I think we now have the OVMF logs you wanted to see (albeit from a different user)... Is that the case?

(In reply to Dario Faggioli from comment #9)

The OVMF was trapped in this loop, likely being reset over and over again, as SecCoreStartupWithStack() is the entry point for the C code. This is certainly way over my head. Yes, Joey would have a better idea of the OVMF internals.

> SecCoreStartupWithStack(0xFFFCC000, 0x820000)
> Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
> Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
> Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
> The 0th FV start address is 0x00000820000, size is 0x000E0000, handle is 0x820000
> Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
> Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
> Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
> Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
> DiscoverPeimsAndOrderWithApriori(): Found 0xB PEI FFS files in the 0th FV
> Loading PEIM 9B3ADA4F-AE56-4C24-8DEA-F03B7558AE50
> Loading PEIM at 0x0000082C140 EntryPoint=0x0000082F58A PcdPeim.efi
> Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
> Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
> Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
> Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
> Register PPI Notify: 605EA650-C65C-42E1-BA80-91A52AB618C6
> Loading PEIM A3610442-E69F-4DF3-82CA-2360C4031A23
> Loading PEIM at 0x000008313C0 EntryPoint=0x00000832814 ReportStatusCodeRouterPei.efi
> Install PPI: 0065D394-9951-4144-82A3-0AFC8579C251
> Install PPI: 229832D3-7A30-4B36-B827-F40CB7D45436
> Loading PEIM 9D225237-FA01-464C-A949-BAABC02D31D0
> Loading PEIM at 0x00000833440 EntryPoint=0x00000834704 StatusCodeHandlerPei.efi
> Loading PEIM 222C386D-5ABC-4FB4-B124-FBB82488ACF4
> Loading PEIM at 0x00000835440 EntryPoint=0x0000083AAE0 PlatformPei.efi
> Select Item: 0x0
> FW CFG Signature: 0x554D4551
> Select Item: 0x1
> FW CFG Revision: 0x3

(In reply to Neil Rickert from comment #5)

Sorry, somehow this fell through the cracks. :(

You could try `virsh edit ...` to change the domain XML if virt-manager somehow discarded your changes. Please also remember to add the custom namespace, or the added qemu elements will be rejected:

    <domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">

I am confused. There seems to be no file associated with your cdrom device in the VM domain XML (attachment 854503 [details]):

    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <target dev="sda" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

But the OVMF debug log (attachment 854502 [details]) has shown that a cdrom was booting. Could you please help to check why they don't match?

Btw, I tried several times to reproduce this on my Leap 15.3 with ovmf-x86_64-ms-4m-code.bin fully updated. This is also the default VM setup I am running daily, but still I can't reproduce or see the problem ever...
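Putting Michael's two hints together, the `qemu:commandline` block from the earlier comment plus the namespace declaration, a domain XML edited via `virsh edit` would be shaped roughly as follows. This is a sketch; the domain name is a placeholder, and the rest of the domain definition is elided:

```xml
<!-- Sketch: namespace declaration plus OVMF debug-log passthrough. -->
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
  <name>myvm</name>
  <!-- ... existing os, memory, and devices elements ... -->
  <qemu:commandline>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='file,id=ovmf,path=/tmp/myvm-ovmf-debug.log'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='isa-debugcon,iobase=0x402,chardev=ovmf'/>
  </qemu:commandline>
</domain>
```

Without the `xmlns:qemu` attribute on the root element, libvirt drops the `qemu:commandline` element on save, which matches the behavior Neil described.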
[Bds]=============Begin Load Options Dumping ...=============
  Driver Options:
  SysPrep Options:
  Boot Options:
    Boot0004: UEFI QEMU DVD-ROM QM00001 0x0001
    Boot0001: opensuse-secureboot 0x0001
    Boot0002: UEFI Misc Device 0x0001
    Boot0000: UiApp 0x0109
    Boot0003: EFI Internal Shell 0x0001
  PlatformRecovery Options:
    PlatformRecovery0000: Default PlatformRecovery 0x0001
[Bds]=============End Load Options Dumping=============
[Bds]Booting UEFI QEMU DVD-ROM QM00001
BlockSize : 2048
LastBlock : 1ED1FF
PartitionDxe: El Torito standard found on handle 0x7E5B0C18.
BlockSize : 2048
LastBlock : 3
FatDiskIo: Cache Page OutBound occurred!
FSOpen: Open '\EFI\BOOT\BOOTX64.EFI' Success

Created attachment 854681 [details]
libvirt domain configuration
(In reply to Michael Chang from comment #12)
> I am confused. There seems to be no file associated with your cdrom device
> in the VM domain XML (attachment 854503 [details]) ... But the OVMF debug
> log (attachment 854502 [details]) has shown that a cdrom was booting.

Apologies for that, I attached the proper file now (hopefully).

> Btw, I tried several times to reproduce on my leap15.3 with
> ovmf-x86_64-ms-4m-code.bin fully updated.

I haven't figured out how to reliably reproduce it yet, but, based on past experience, and the fact I've been seeing it occasionally since 15.3 beta-ish, I *think* that it is triggered either
1. by a system update that requires an initrd rebuild, or
2. by a distribution upgrade (which I was doing in this case, 15.3 with XFCE to 15.4-alpha).

No hard evidence to back that up, I am afraid.

Created attachment 854709 [details]
OVMF debug log('systemctl reboot')
> I haven't figured out how to reliably reproduce it yet,
> but, based on past experience, and the fact I've been seeing it occasionally
> since 15.3 beta-ish, I *think* that it is triggered either
> 1. by system update that requires the initrd rebuild, or
> 2.distribution upgrade (which I was doing in this case, 15.3 with XFCE to
> 15.4-alpha).
3. Heavy-ish VM disk use?

The VM disk was getting low on free space, so I removed some packages and old snapshots, ~4 GB worth, then ran 'btrfs scrub', which completed without errors.

When I tried 'systemctl reboot', the screen went blank, with the cursor indicating some activity.

OVMF debug log attached.
(In reply to Neil Rickert from comment #0)
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
> Build Identifier:
>
> I am using several virtual machines. The host system is running Leap 15.3.
> The virtual machines run a variety (15.3, Tumbleweed, 15.2, Solus).
>
> On rebooting the VM, I often finish up with a black screen. Based on what I
> see on the screen, it looks as if the VM is rebooting correctly, but is
> failing to connect to the firmware on reboot. It seems to just loop.

I just fought with bsc#1193315 and the symptom is an unlimited reboot.

Could you (or anyone) try this OVMF in my home branch?

https://build.opensuse.org/package/show/home:joeyli:branches:SUSE:SLE-15-SP3:Update/ovmf

My workaround patch can fix bsc#1193315. Maybe it also works here.

(In reply to Joey Lee from comment #17)
> I just fought with the bsc#1193315 and the symptom is unlimited reboot.
               ^^^^^^^^^^^^ bsc#1192126, sorry for my typo!

Responding to Joey Lee c#18

I downloaded ovmf-202008-10.13.1.x86_64.rpm and installed that. But the same misbehavior occurs.

I'm not sure how all of this works. I would have thought that "qemu-ovmf-x86_64" needed to be updated, but I did not see an update for that.

(In reply to Neil Rickert from comment #19)

The ovmf-202008 package only includes some efi tools, not the ovmf binary.

Please download qemu-ovmf-x86_64-202008-10.13.1.noarch.rpm from here:

https://build.opensuse.org/package/binaries/home:joeyli:branches:SUSE:SLE-15-SP3:Update/ovmf/pool-leap-15.3

Then install and test. Or, if you want, add my branch to the zypper repo list:

https://download.opensuse.org/repositories/home:/joeyli:/branches:/SUSE:/SLE-15-SP3:/Update/pool-leap-15.3/

Thanks!

Responding to c#20

I added your repo, and updated "qemu-ovmf-x86_64" to 202008-10.13.1. But still the same misbehavior.

From what I see, it does look like OVMF crashing/resetting in a loop, as in bug 1187245. Perhaps I need to rebuild that virtual machine:

(1) delete the VM but retain the disk image
(2) create a new VM importing the disk image
(3) configure that VM to use ovmf-x86_64-ms-4m-code.bin (the one currently being used)

(In reply to Neil Rickert from comment #21)

You do not need to rebuild the VM image. Could you please attach the guest's dmesg log on bugzilla after booting with the new OVMF?
Please add the following kernel parameter in /boot/grub2/grub.cfg:

efi=debug

Then boot to console, run "dmesg > dmesg.log", and attach dmesg.log. If you run with the _right_ OVMF image, then we should see the following region in the EFI memory map:

[ 0.000000] efi: mem06: [ACPI Mem NVS| | | | | | | | | | |WB|WT|WC|UC] range=[0x000000000080b000-0x000000000080bfff] (0MB)

The 0x80b000 is the start address of PcdSevEsWorkArea.

(In reply to Joey Lee from comment #22)

On the other hand, please add the "-d cpu_reset" qemu parameter when reproducing the issue. It prints the cpu registers when the system reboots.

Created attachment 854744 [details]
Attaching "dmesg.log" as requested.
I am not seeing the line that you expected in that "dmesg" output.
I tried rebuilding the VM (I first cloned, then rebuilt the clone). It did not help.
Perhaps the "ovmf" image that I am using was not patched.
I should perhaps mention that when I first reported this, I was seeing the issue with both "ovmf-x86_64-ms-4m-code.bin" and "ovmf-x86_64-smm-ms-code.bin". Then at some time there was an "ovmf" update, and after that update I only saw the issue with "ovmf-x86_64-ms-4m-code.bin".
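The verification steps Joey asked for in comments #22 and #23 can be condensed into the following sketch. The grub2-mkconfig route is one way to persist the kernel parameter, an assumption on my part rather than what the comment literally said:

```sh
# 1. In the guest, add efi=debug to the kernel command line
#    (e.g. via GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub), then:
grub2-mkconfig -o /boot/grub2/grub.cfg

# 2. Reboot the guest, then capture and inspect the kernel log.
#    With a correctly patched OVMF, the SEV-ES work area at 0x80b000
#    should show up as an "ACPI Mem NVS" region in the EFI memory map:
dmesg > dmesg.log
grep 'ACPI Mem NVS' dmesg.log

# 3. On the host, reproduce the hang with CPU-reset tracing enabled, so
#    qemu dumps register state on each reset (append to the qemu args):
#    -d cpu_reset
```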
(In reply to Neil Rickert from comment #24)
> I am not seeing the line that you expected in that "dmesg" output.

Thanks! The dmesg log is useful. I know what the problem is now.

When S3 is disabled in the VM, the PcdSevEsWorkArea is reserved as an EfiBootServicesData region. That region is marked as usable after booting to the OS, so it can still be written by the kernel. When the system reboots, the kernel writes data to the region, and the unlimited system reset is triggered.

I am producing a new workaround patch against this problem.

Created attachment 854757 [details]
ovmf-bsc1192126-OvmfPkg-PlatformPei-Always-reserve-the-SEV-ES-work-a.patch
Updated workaround patch: always reserve the SEV-ES work area as an ACPI NVS region.
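For context, reserving a region in OvmfPkg/PlatformPei amounts to building a memory-allocation HOB for it. A simplified sketch of the idea behind the patch, using EDK II's HobLib; this is not the literal patch, and the PCD names are assumptions based on how OvmfPkg names the SEV-ES work area:

```c
//
// Sketch of the workaround's core idea: always mark the SEV-ES work area
// as ACPI NVS so the OS never hands it out as usable RAM. (Previously it
// could end up as EfiBootServicesData, letting the kernel write to it
// and trigger the reboot loop described in comment #25.)
//
BuildMemoryAllocationHob (
  (EFI_PHYSICAL_ADDRESS) PcdGet32 (PcdSevEsWorkAreaBase),
  (UINT64) PcdGet32 (PcdSevEsWorkAreaSize),
  EfiACPIMemoryNVS
  );
```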
Hi Neil,

(In reply to Joey Lee from comment #26)

I just built the new workaround patch into ovmf in my home branch:

https://build.opensuse.org/package/binaries/home:joeyli:branches:SUSE:SLE-15-SP3:Update/ovmf/pool-leap-15.3

Could you please help to test qemu-ovmf-x86_64-202008-10.14.1.noarch.rpm again?

I updated both "ovmf" and "qemu-ovmf-x86_64" to 202008-10.14.1 using your repo. It now seems to behave the way it should (rebooting normally). Thanks.

(In reply to Neil Rickert from comment #28)

Thanks for your testing! The workaround patch will be pushed to SLE15-SP3 in IBS, then duplicated to Leap 15.3.

(In reply to Joey Lee from comment #29)

The patch has been merged to SLE15-SP3/ovmf. Waiting for the change to be duplicated to Leap 15.3 in OBS.

The OVMF update for 15.3 showed up today. And everything seems to be working as it should. I'll close this as fixed. Thanks.