Bug 927719

Summary: Tumbleweed Snapshot blocked: no multipath support in 20150416 (likely dracut issue)
Product: [openSUSE] openSUSE Tumbleweed Reporter: Dominique Leuenberger <dimstar>
Component: Basesystem Assignee: Thomas Renninger <trenn>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P5 - None CC: crrodriguez, dimstar, fcrozat, forgotten_lNYeazqpWh, hare, mlin, mpluskal, rmilasan, trenn
Version: 201503*   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: 90multipath-etc-multipath.conf-is-not-required.patch

Description Dominique Leuenberger 2015-04-18 16:00:23 UTC
Since Snapshot 20150416, the two multipath tests (32-bit and 64-bit) no longer pass...

See the reference tests:
https://openqa.opensuse.org/tests/57243
https://openqa.opensuse.org/tests/57244

A very likely candidate for the breakage is dracut, which was checked in at that time (after 0415).
Comment 1 Dominique Leuenberger 2015-05-06 07:15:39 UTC
dracut had been reverted in openSUSE:Factory in order to get snapshots out.

Since then, dracut has been hanging in a staging area, awaiting a fix.

https://build.opensuse.org/project/staging_projects/openSUSE:Factory/F gives an overview of the current state of things.
Comment 2 Thomas Renninger 2015-05-07 11:36:07 UTC
Dominique: Thanks for the report.
It took some time to set up a multipath system etc. to track it down.
The culprit has been found and it should be easy to fix.
There is another bug, which I will mark as a duplicate of this one.
Comment 3 Thomas Renninger 2015-05-07 14:07:07 UTC
*** Bug 930019 has been marked as a duplicate of this bug. ***
Comment 4 Hannes Reinecke 2015-05-07 14:29:36 UTC
Created attachment 633614 [details]
90multipath-etc-multipath.conf-is-not-required.patch

90multipath: /etc/multipath.conf is not required.
Comment 5 Hannes Reinecke 2015-05-07 14:30:12 UTC
Fixed with the above patch.
Thomas, can you update the dracut rpm?
Comment 6 Thomas Renninger 2015-05-08 09:02:56 UTC
Done: https://build.opensuse.org/request/show/305875
Comment 7 Thomas Renninger 2015-05-20 15:41:15 UTC
Fixed and already in Factory.
Comment 8 Dominique Leuenberger 2015-05-20 15:59:17 UTC
I disagree with the statement that this is fixed: dracut was accepted to overcome other issues, but the multipath issue is not fixed.

See the latest test run, for example:
https://openqa.opensuse.org/tests/63432
Comment 9 Thomas Renninger 2015-05-22 11:30:40 UTC
Ok, I re-installed on a virtual machine similar to the test platform, and it works for me with the latest Factory distribution (including dracut version 041):

multipath -l
35001405134ccdf0a dm-0 QEMU,QEMU HARDDISK
size=10G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:1 sdb 8:16 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:0:0 sda 8:0  active undef running

rpm -qi dracut
Name        : dracut
Version     : 041

cat /etc/os-release 
NAME=openSUSE
VERSION="20150519 (Tumbleweed)"
...

rpm -qi multipath-tools
Name        : multipath-tools
Version     : 0.5.0
...

I had a quick look at the video from comment #8.
Unfortunately there are not many logs in there.
Everything looks rather similar to what I did.
Any ideas how to reproduce this, or how to gather more logs from the failing system? Dracut should drop into a rescue shell after a further timeout of some tens of seconds. Is it possible to access such a virtual system somehow?
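For reference, the `multipath -l` output in this comment is what a working setup looks like: one map (identified by its WWID) with two paths, `sdb` and `sda`, both in dm state `active`. The following is a minimal Python sketch for checking that programmatically. It assumes the exact text layout shown above (map header line first, path lines of the form `H:C:T:L dev major:minor dm-state checker-state dev-state`); the function `parse_multipath_paths` is hypothetical and not part of multipath-tools.

```python
import re

# Sample output copied from the `multipath -l` listing above.
SAMPLE = """\
35001405134ccdf0a dm-0 QEMU,QEMU HARDDISK
size=10G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:1 sdb 8:16 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:0:0 sda 8:0  active undef running
"""

# Assumed path line layout: "`- H:C:T:L <dev> <major:minor> <dm state> <checker state> <dev state>".
# Path-group lines ("`-+- policy=...") do not match because "`-" is followed by "+", not whitespace.
PATH_RE = re.compile(
    r"`-\s+(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def parse_multipath_paths(text):
    """Return (wwid, [(hctl, dev, dm_state), ...]) for a single-map listing."""
    lines = text.splitlines()
    wwid = lines[0].split()[0]  # first token of the map header line
    paths = []
    for line in lines[1:]:
        m = PATH_RE.search(line)
        if m:
            hctl, dev, _majmin, dm_state, _checker, _devstate = m.groups()
            paths.append((hctl, dev, dm_state))
    return wwid, paths


if __name__ == "__main__":
    wwid, paths = parse_multipath_paths(SAMPLE)
    active = [dev for _, dev, state in paths if state == "active"]
    print(wwid, active)
```

A healthy two-path setup like the one above should report two active devices for a single WWID; a listing with only one path (or none) would point at the path-setup problem rather than at dracut itself.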
Comment 10 Dominique Leuenberger 2015-05-22 18:06:58 UTC
(In reply to Thomas Renninger from comment #9)
 
> I had a quick look at the video from comment #8.
> Unfortunately there are not many logs in there.
> Everything looks rather similar than what I did.
> Any ideas how to reproduce this or how to gain more logs from the failing
> system. Dracut should fall back into a rescue shell after some more ten
> seconds of timeout. Is it possible to access such virtual system somehow?

It should be possible to get you access to an openQA worker in interactive mode. I will be back 'on duty' next Wednesday (May 27); simply ping me in #opensuse-factory and we'll have that set up in no time.
Comment 11 Thomas Renninger 2015-06-09 14:19:13 UTC
Ok, I could finally reproduce it (I guess/hope this is what happens in openQA):
An internal KVM error happens when the system tries to boot from the multipath setup. All you see in vncviewer is:

Booting from local disk...

and I expect that the kernel has not been executed yet, but that KVM failed while loading it from the multipath device (console= is specified on the kernel command line, but nothing is shown in the serial console output):

/usr/bin/eatmydata /usr/bin/qemu-kvm -chardev socket,id=serial0,path=console.sock,server,nowait -serial chardev:serial0 -m 1024 -cpu qemu64 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -device virtio-scsi-pci,id=scsi0 -device virtio-scsi-pci,id=scsi1 -drive file=bighd.qcow2,cache=unsafe,if=none,id=hd1a,serial=mpath1 -device scsi-hd,drive=hd1a,bus=scsi1.0 -drive file=bighd.qcow2,cache=unsafe,if=none,id=hd1b,serial=mpath1 -device scsi-hd,drive=hd1b,bus=scsi0.0 -cdrom /var/lib/openqa/share/factory/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20150606-Media.iso -boot once=d,menu=on,splash-time=5000 -device usb-ehci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :98,share=force-shared -qmp unix:qmp_socket,server,nowait -monitor unix:hmp_socket,server,nowait -S -monitor telnet:127.0.0.1:20088,server,nowait
KVM internal error. Suberror: 2
extra data[0]: 80000b0d
extra data[1]: 80000b0e
EAX=00000000 EBX=00009eba ECX=00000000 EDX=00000000
ESI=0000000d EDI=00000000 EBP=00000000 ESP=000bfff4
EIP=0000a877 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0020 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 00000000 ffffffff 00c00000
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0000 00000000 ffffffff 00c00000
TR =0008 00000580 00000067 00008b00 DPL=0 TSS32-busy
GDT=     0000abb0 0000002f
IDT=     000030b8 000007ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=d8 8e d0 b0 08 0f 00 d8 8b 25 f4 af 00 00 89 e8 ff e3 fa fc <89> 25 f4 af 00 00 ea 85 9e 00 00 10 00 60 0f b6 74 24 20 bb ba 9e 00 00 eb e4 61 83 c4 04
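In the qemu command line above, the multipath disk is emulated by attaching the same image file (`bighd.qcow2`) twice: two `-drive` entries with the same `serial=mpath1`, each exposed as a `scsi-hd` device on its own `virtio-scsi-pci` controller (`scsi1.0` and `scsi0.0`), so the guest sees two SCSI paths to what looks like one disk. The following Python sketch only rebuilds that part of the command line as an argument list to make the structure explicit; the function `multipath_drive_args` is hypothetical, and every other option from the openQA invocation is left out.

```python
def multipath_drive_args(image, serial, controllers=("scsi1", "scsi0")):
    """Build qemu args exposing one image file through several virtio-scsi controllers.

    Hypothetical helper that mirrors the drive/device layout of the openQA command above.
    """
    args = []
    # One virtio-scsi-pci controller per path (ids scsi0 and scsi1 in the original command).
    for ctrl in sorted(controllers):
        args += ["-device", f"virtio-scsi-pci,id={ctrl}"]
    # Same image file and same serial on every path; only the drive id and the bus differ.
    for i, ctrl in enumerate(controllers):
        drive_id = f"hd1{chr(ord('a') + i)}"  # hd1a, hd1b, ...
        args += ["-drive", f"file={image},cache=unsafe,if=none,id={drive_id},serial={serial}"]
        args += ["-device", f"scsi-hd,drive={drive_id},bus={ctrl}.0"]
    return args


if __name__ == "__main__":
    import shlex

    print(" ".join(shlex.quote(a) for a in multipath_drive_args("bighd.qcow2", "mpath1")))
```

With the defaults, the printed arguments match the controller, `-drive` and `scsi-hd` portion of the command that triggered the KVM internal error, which may be handy when trying to reproduce the setup outside openQA.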
Comment 12 Thomas Renninger 2015-06-09 14:23:49 UTC
Summary:
- There was a dracut multipath issue, which has been fixed.
- There is a qemu issue, probably specific to multipath, which does not seem to
  be triggered every time (always via the openQA setup? No idea; I could only run
  into it once out of maybe 5 KVM multipath installations, and when I did, the
  system had been set up by Max Lin the "openqa" way).

I will close this bug and open another one, assigned to the virtualization/qemu/kvm maintainers. Before that, I will have a quick chat with Alex Graf; maybe he already has an idea...

Thanks a lot, Dominique and Max! You helped me a lot in getting used to the openQA stuff, which I would like to know a bit better. This is a cool tool.