Bugzilla – Full Text Bug Listing

| Summary: | openSuse 42.2: Boot on broken RAID 1 with missing disk fails. Looks linked to dracut premount hook script (start job running with nolimit timeout) | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Cedric Simon <cedric> |
| Component: | Basesystem | Assignee: | Daniel Molkentin <daniel> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P5 - None | CC: | 9b3e05a5, arvidjaar, comes, forgotten_9PBoJLuOmu, harald, hare, mchang, meissner, nfbrown, rolf.schmidt, trenn |
| Version: | Leap 42.2 | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | openSUSE 42.2 | ||
| Whiteboard: | |||
| Found By: | Community User | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Boot screen; Full boot screen; GRUB2 command line specifying RAID devices; Boot screen after specifying RAID devices in GRUB2 line | ||
Created attachment 708147 [details]
Full boot screen

Other possibility: the dracut script is fine, but waitdev defaults to 0 (no limit) in openSUSE 42.2 instead of 90 (as in previous versions).

I am going to cry... It looks like neither the 2nd (empty) disk nor the waitdev parameter solves it. I think I somehow got the array broken and ran my tests with an (already broken) RAID disk... So the issue is still present: I am not able to fully boot openSUSE 42.2 with the system on RAID1 with a missing, new, or damaged disk.

I also tested specifying the RAID devices on the command line: linux .... root=/dev/md1 md=1 /dev/sda2,/dev/sdb2 (see picture), but it did not help.

Created attachment 708155 [details]
GRUB2 command line specifying RAID devices

Created attachment 708156 [details]
Boot screen after specifying RAID devices in GRUB2 line

Posted on Unix & Linux Stack Exchange too: http://unix.stackexchange.com/questions/334197/opensuse-42-2-unable-to-boot-when-root-is-on-raid1-broken-with-one-missing-new

The problem seems to be this patch:

    Fri Oct 14 14:20:51 CEST 2016 - hare@suse.de
    - 90mdraid: Use stock MD rules to assemble RAID arrays (bsc#998860)
      * add 0313-90mdraid-Use-stock-MD-rules-to-assemble-RAID-arrays.patch

Stock MD rules require /usr/lib/systemd/system/mdadm-last-resort@.{service,timer} to trigger the start of a degraded array. These are missing in the initrd. The workaround is to add /etc/dracut.conf.d/mdadm.conf with the content

    install_optional_items+=" /usr/lib/systemd/system/mdadm-last-resort@.service /usr/lib/systemd/system/mdadm-last-resort@.timer "

and rebuild the initrd.

If the system is already stuck, boot with rd.break=pre-mount and manually run "mdadm --manage --run /dev/mdX" for all mdX (root and resume). You may need to do it once more after boot for all other arrays after entering emergency mode.

This is actually an upstream bug. The files are missing upstream as well.

@Andrei Borzenkov: you are my hero! I confirm that the workaround (adding /etc/dracut.conf.d/mdadm.conf as stated) fixes the problem. Many thanks for pointing it out to me. I lost almost a week trying to figure out what exactly was wrong...

So this is actually a dracut issue. Thomas, can you handle it?

Daniel Molkentin handles dracut issues nowadays.

https://build.opensuse.org/project/show/home:dmolkentin:dracut:1017695 contains a proposed fix. Please let me know if it works for you.

Since I already manually applied the fix proposed by @Andrei Borzenkov and it solved the issue, I can't test your fix. But if it implements what was proposed, it must work ;)

I have tested the rpm from https://build.opensuse.org/project/show/home:dmolkentin:dracut:1017695 and it fixes the problem. A PC with a RAID1 root disk boots successfully even with a missing disk.

Verified & pushed towards Leap. Backport to SLE/openSUSE in progress. Closing.

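For anyone applying the workaround from this thread by hand, here is a minimal sketch of the drop-in step. The function name `write_mdadm_conf` and its optional directory argument are hypothetical (the argument only exists so the function can be tried outside /etc); the file name and its content are taken from the comment above. The rebuild and check commands are shown as comments because they need root and a real system.

```shell
#!/bin/sh
# Sketch only: write the dracut drop-in described in the workaround.
# The optional first argument (target directory) is an assumption added
# for dry runs; by default the file goes to /etc/dracut.conf.d.
write_mdadm_conf() {
    conf_dir="${1:-/etc/dracut.conf.d}"
    mkdir -p "$conf_dir" || return 1
    # Quoted heredoc delimiter: the content is written verbatim.
    cat > "$conf_dir/mdadm.conf" <<'EOF'
install_optional_items+=" /usr/lib/systemd/system/mdadm-last-resort@.service /usr/lib/systemd/system/mdadm-last-resort@.timer "
EOF
}

# On the affected system, as root:
#   write_mdadm_conf                      # creates /etc/dracut.conf.d/mdadm.conf
#   dracut -f                             # rebuild the initrd for the running kernel
#   lsinitrd | grep mdadm-last-resort     # both units should now be listed
```

Systems that already carry the fixed dracut package from the updates listed in this bug should not need this drop-in.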
SUSE-SU-2017:0641-1: An update that solves one vulnerability and has 6 fixes is now available.
Category: security (moderate)
Bug References: 1005410,1006118,1007925,1008340,1017695,986734,986838
CVE References: CVE-2016-8637
Sources used:
SUSE Linux Enterprise Server 12-SP1 (src): dracut-037-91.1
SUSE Linux Enterprise Desktop 12-SP1 (src): dracut-037-91.1

openSUSE-SU-2017:0708-1: An update that solves one vulnerability and has 6 fixes is now available.
Category: security (moderate)
Bug References: 1005410,1006118,1007925,1008340,1017695,986734,986838
CVE References: CVE-2016-8637
Sources used:
openSUSE Leap 42.1 (src): dracut-037-80.1

SUSE-SU-2017:0951-1: An update that solves one vulnerability and has 10 fixes is now available.
Category: security (moderate)
Bug References: 1005410,1006118,1007925,1008340,1008648,1017141,1017695,1019938,1020063,1021687,902375
CVE References: CVE-2016-8637
Sources used:
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src): dracut-044-108.1
SUSE Linux Enterprise Server 12-SP2 (src): dracut-044-108.1
SUSE Linux Enterprise Desktop 12-SP2 (src): dracut-044-108.1
OpenStack Cloud Magnum Orchestration 7 (src): dracut-044-108.1

*** Bug 1025726 has been marked as a duplicate of this bug. ***

*** Bug 1033223 has been marked as a duplicate of this bug. ***

SUSE-SU-2017:2696-1: An update that solves one vulnerability and has 11 fixes is now available.
Category: security (moderate)
Bug References: 1005410,1006118,1007925,1008340,1008648,1017695,1032576,1035743,935320,959803,986734,986838
CVE References: CVE-2016-8637
Sources used:
SUSE Linux Enterprise Server 12-LTSS (src): dracut-037-51.31.1

Created attachment 708146 [details]
Boot screen

Here is the case: I have 2 disks set up in RAID 1 as follows:

/dev/md0 (/dev/sda1 + /dev/sdb1) --> /boot
/dev/md1 (/dev/sda2 + /dev/sdb2) --> /

My scenario:
1) Start the server with RAID1 and the array fully synchronized
2) Shut down the server normally
3) Remove one of the disks (/dev/sdb)
4) Try booting again with just 1 disk

This scenario works well on openSUSE 11.3 --> 42.1.

On openSUSE 42.2, when I try to boot with only 1 disk, GRUB loads fine (even though it is also located on an array), but later on the boot waits forever for the root (/dev/md1) device. I would expect it to break the array and go on.

I tested it with a new install of 42.2 from DVD (with and without zypper update), as well as with an online distribution upgrade from 42.1. Same issue in all scenarios.

If the array was already broken before shutdown, it does boot well (with the broken array).

After several days of testing different possible solutions I found a workaround: add "waitdev=30" to the linux line of the GRUB boot command. I also noticed that adding a new (empty) disk also solves the problem (it starts with a broken array).

The problem seems to be located in the dracut-pre-mount script.

This is critical for me since I have already had several incidents where a "hard shutdown" (electrical (battery) power failure) damaged a disk, leading to a scenario similar to the one described above.

For the moment I will use the "waitdev=n" parameter, but I would prefer to have this bug fixed.

Well, at least I leave it documented, hoping it can save someone else time...
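
The report distinguishes between an array that was already degraded before shutdown (which boots) and one that loses a disk while powered off (which hangs). Before a test like the one described, it can help to check which arrays the kernel currently reports as degraded. The following is a rough sketch, assuming the usual /proc/mdstat layout in which a status field such as `[U_]` marks a missing member with `_`; the function name `degraded_arrays` is hypothetical and not part of mdadm or dracut.

```shell
#!/bin/sh
# Sketch only: print the names of md arrays whose member status field
# (for example [U_]) contains "_", meaning a member is missing or failed.
# Reads /proc/mdstat by default; another file can be passed for testing.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md1 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status field looks like [UU] (healthy) or [U_] (degraded)
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    ' "${1:-/proc/mdstat}"
}

# Example use on a running system:
#   degraded_arrays        # prints one array name per line, nothing if all are healthy
```

For authoritative per-array details, `mdadm --detail /dev/mdX` remains the better source; this sketch only scans the summary file.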