Bugzilla – Full Text Bug Listing
| Summary: | boot on raid device is not started if degraded; fix provided | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Forgotten User VB1HhTwhLY <forgotten_VB1HhTwhLY> |
| Component: | Other | Assignee: | Neil Brown <nfbrown> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | | |
| Priority: | P5 - None | CC: | arvidjaar, hpj, jjletho67-esus, mchang, mmarek, ohering, trenn |
| Version: | 13.2 | | |
| Target Milestone: | 13.2 RC 1 | | |
| Hardware: | x86-64 | | |
| OS: | openSUSE 13.2 | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | patch for /var/mkinitrd/scripts/, maybe not src files; partition layout; mdadm RPM for testing; rdosreport.txt (after a failed boot, taken from the emergency shell, initramfs phase); disk layout; patch that helps | | |
Oh, and by the way: my fix might also fix this bug: https://bugzilla.novell.com/show_bug.cgi?id=823125. Also, for unknown reasons, if in the rescue shell you run "mdadm --run /dev/md0" and then "systemctl default", then reboot, it boots normally again, even though the array is still degraded.

Thanks for the report. I'm trying to reproduce and understand the issue.

I cannot reproduce it. I installed a VM with two disks and the layout described above. After install, the VM was shut down and one disk removed, and the system starts OK with just that one disk. I ran mkinitrd and compared the old and new initrd: there are no differences. mdadm.conf exists and did not change. In case you want to poke around, hammer175 is the VM; the password is root.

Can I look at the VM? How do I connect to the machine?

(In reply to comment #0)
> If your /boot is on a separate raid device from your /, mkinitrd does not add
> any information in the initrd to start the raid device, so boot will fail.

Hi Peter, I'm a bit confused by this part of the problem description. As I understand it, the initrd does not need to access the /boot filesystem at all. The boot loader (e.g. grub) does, of course, so that it can load the kernel and the initrd, but all the initrd needs access to is the root filesystem and the swap partition. Once it mounts root, the scripts in there take over to mount /boot and anything else. Clearly you are having a problem, and it does seem to be related to the md device containing /boot, but I think it needs to be fixed in the regular boot scripts, not in the initrd.

Handling freshly degraded arrays at boot is somewhat tricky with the dependency-driven boot sequence that systemd uses. As devices are discovered, udev runs "mdadm -I $DEVICE" and mdadm incrementally assembles the arrays. Once all components are there, the array is started. But if all components never arrive, the array will never be started by that mechanism alone.
To address this you can run "mdadm -IRs", which essentially says "all devices have arrived; time to start any remaining md arrays which are degraded". systemd needs to do this when it times out waiting for a device, but I don't know how to tell it to (not that I have really looked recently). The initrd does have a call to "mdadm -IRs" halfway through timing out for the root device; this is why your boot works if you tell the initrd to assemble the boot device. But that isn't really the right fix. I'll do some reading about systemd and see if I can figure out how to give it an action to perform on timeout.

Hi, I think I'm facing an almost identical problem. The only main difference is that on my system the emergency shell never appears. I'm attaching a file in which you can find (in this order): partition layout, mdstat, mdadm --detail on all raid devices, vgdisplay, lvdisplay, fstab, and the active mount points. When I start the system with /dev/sdb pulled out, systemd complains about the missing /boot with these error messages on the console:

"Timed out waiting for device dev-disk-by\x2dlabel-bootFS.device
Dependency failed for /boot
Dependency failed for Local File Systems
Welcome to emergency mode! After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" to try again to boot into default mode"

The last line is repeated on the screen about every 60 seconds, but no keyboard input is accepted and no login prompt appears. I can only switch between the alt-F1 and alt-F7 consoles.

Created attachment 569482 [details]
partition layout
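The incremental-assembly mechanism Neil describes above can be sketched as the commands involved. This is an illustrative fragment (the udev rule is simplified, not the exact openSUSE rule set), not something to run verbatim:

```shell
# udev rule (simplified): as each member device appears, attempt incremental assembly.
#   ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -I $env{DEVNAME}"

mdadm -I /dev/sda1   # the array stays inactive until all expected members have arrived

# On timeout: declare that no more devices are coming, and start any
# remaining arrays that can run degraded.
mdadm -IRs
```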
Yes, that looks like the same problem. I have a fix nearly worked out; I should have something for you to test early next week.

I don't know why you don't get a login prompt, but it might be worth trying to boot without plymouth - that might be confusing things. I think if you press 'e' at the grub menu, it puts you in a simple editor. Find the kernel command line and add plymouth.enable=0 to the end. See if that provides a password prompt in emergency mode.

(In reply to comment #9)
> I don't know why you don't get a login prompt

See bnc#852021
> I don't know why you don't get a login prompt, but it might be worth trying to
> boot with plymouth - that might be confusing things.
>
> I think you try 'e' to the grub menu and it puts you in a simple editor. Find
> the kernel command line and add
> plymouth.enable=0
> to the end. See if that provides a password prompt in emergency mode.
I added plymouth.enable=0 but the emergency shell is still not working
(In reply to comment #10)
> (In reply to comment #9)
> > > I don't know why you don't get a login prompt
> >
> > See bnc#852021

It looks very similar! I'm going to try the suggested patch as soon as possible and I'll let you know the result. Thank you.

(In reply to comment #12)
> It looks very similar! I'm going to try the suggested patch as soon as possible
> and I'll let you know the result, thank you.

OK, the emergency-shell problem is the same one described in bnc#852021, and the proposed patch worked for me (it solved the login-prompt problem; of course I still have the main problem we are facing here). I'm of course available to test a patch.

Created attachment 569764 [details]
mdadm RPM for testing
Please test this rpm and confirm that it fixes the problem.
If you boot with not all expected devices present, there will be a 30-second delay waiting for the missing devices to appear; after that, any md arrays which can be started degraded will be. On subsequent boots the 30-second delay will not be needed, as the missing devices are no longer expected.
(In reply to comment #14)
> Created an attachment (id=569764) [details]
> mdadm RPM for testing
>
> Please test this rpm and confirm that it fixes the problem.

I installed the rpm with this command: rpm -ivh --replacepkgs --force
--force was necessary because otherwise rpm complains that the already-installed package is newer than the one I was installing. I added the nofail option to all mounted standard (non-RAID) partitions on the disk that I was about to remove (the absence of a partition which is listed in fstab as automounted triggers the emergency shell). I pulled out the sdb disk and the system booted as expected! So the patch is working fine for me!

Hi Peter, I've been approaching this as an openSUSE 13.1 problem. However, I just noticed that the affected "Product" at the top says openSUSE 12.3. 12.3 works quite differently in this area and I think that it works correctly. Could you please confirm whether you were seeing this in 12.3 or in a 13.1 beta? Thanks.

This is an autogenerated message for OBS integration: this bug (832501) was mentioned in https://build.opensuse.org/request/show/209450 Factory / mdadm

> Could you please confirm whether you were seeing this in 12.3 or in a 13.1 beta?
@Neil
It was 12.3; I have not tested 13.1 at all yet.
I confirm the bug is present in 13.1 and NOT present in 12.3. I tested the patch in 13.1.

Hi Peter, I've tried to reproduce this on 12.3 and I cannot; it always boots with /boot found and mounted happily. I can only assume that there is some detail in your configuration which differs from the way the installer sets things up. I would definitely advise having all the arrays listed in /etc/mdadm.conf, though I find it works even without that. Also, it is best if /etc/fstab gives /dev/disk/by-id/md-uuid-..... devices rather than e.g. /dev/md127; the latter could possibly confuse things. So I'm going to focus on getting a good fix for this into 13.1 and leave it at that.

Update released for openSUSE 13.1. Resolved fixed. openSUSE-RU-2013:1883-1: An update that has two recommended fixes can now be installed. Category: recommended (low). Bug references: 832501, 851993. CVE references: (none). Sources used: openSUSE 13.1 (src): mdadm-3.3-4.4.1

I think the bug is present again in 13.2 RC1. I was unable to boot with a degraded raid1 array (both boot and root were on raid); I'm attaching rdosreport.txt. The strange thing is that, from the very limited emergency shell that comes up in the initramfs phase, I can correctly see the degraded array and the LVM logical volume on which the root filesystem is.

Created attachment 610152 [details]
rdosreport.txt: after a failed boot, taken from the emergency shell (initramfs phase)
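Neil's fstab advice earlier in the thread (stable md-uuid device paths instead of assembly-time names) would look roughly like this; the uuid is left elided as in his comment, and the mount options are illustrative:

```
# /etc/fstab - refer to md arrays by stable id, not by assembly-time name:
# fragile: md numbering can change across boots
#/dev/md127                      /boot  ext4  defaults  0 2
# stable: tied to the array's uuid
/dev/disk/by-id/md-uuid-.....    /boot  ext4  defaults  0 2
```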
(In reply to Marco M. from comment #23)
> I was unable to boot with a degraded raid 1 array (both boot and root were
> on raid)

I cannot reproduce this using boot on MD RAID with current Factory. Your log shows:

[ 143.781288] linux-m61d dracut-initqueue[270]: Warning: Cancelling resume operation. Device not found.
[ 144.094926] linux-m61d systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-1f3d0926\x2da660\x2d45a6\x2da2b0\x2dbfdf165d64b5.device.
[ 144.096300] linux-m61d systemd[1]: Dependency failed for /sysroot.
[ 144.098235] linux-m61d systemd[1]: Dependency failed for Initrd Root File System.
[ 144.099596] linux-m61d systemd[1]: Dependency failed for Reload Configuration from the Real Root.
[ 144.100150] linux-m61d systemd[1]: Dependency failed for File System Check on /dev/disk/by-uuid/1f3d0926-a660-45a6-a2b0-bfdf165d64b5.
[ 144.553192] linux-m61d dracut-initqueue[270]: Scanning devices md2 for LVM logical volumes SystemVG/rootLV

So it is actually a problem of LVM on RAID, not RAID itself:

/dev/mapper/SystemVG-rootLV: LABEL="rootFS" UUID="1f3d0926-a660-45a6-a2b0-bfdf165d64b5" TYPE="ext4"

Please provide the initrd that fails, as already requested.

Created attachment 610281 [details]
disk layout
Just to clarify my partition layout, I attach a text file with:
disk partition tables
mdadm --detail of all raid arrays
pvdisplay, vgdisplay and lvdisplay
fstab
I can't directly attach the initrd file because it exceeds the maximum size allowed; I'm going to share it via another file-sharing service.
You can download the initrd file from this link: https://copy.com/zDQdfUM0L4lJz4fo

Strange... The rdosreport.txt log reports:

[ 144.094926] linux-m61d systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-1f3d0926\x2da660\x2d45a6\x2da2b0\x2dbfdf165d64b5.device.

But it also shows:

/dev/disk/by-uuid:
...
lrwxrwxrwx 1 root 0 10 Oct 15 12:12 1f3d0926-a660-45a6-a2b0-bfdf165d64b5 -> ../../dm-0

So it clearly finds the device, but maybe not soon enough for systemd. This dm-0 is an LVM partition in md2, which is a degraded raid1. As it is degraded, it isn't started immediately - not until /sbin/mdraid_start is called. That is scheduled to run at a timeout by a udev rules file:

etc/udev/rules.d/65-md-incremental-imsm.rules:RUN+="/sbin/initqueue --timeout --name 50-mdraid_start --onetime --unique /sbin/mdraid_start"

This will be after the normal lvm scan, but there is a secondary lvm scan similarly scheduled by a timeout:

etc/udev/rules.d/64-lvm.rules:RUN+="/sbin/initqueue --timeout --name 51-lvm_scan --onetime --unique /sbin/lvm_scan --partial"

So the md device should be assembled by "50-mdraid_start" and the lvm device by "51-lvm_scan", and as "51" comes after "50", this should happen in the right order. The log confirms this:

[ 143.256336] linux-m61d kernel: md/raid1:md2: active with 1 out of 2 mirrors
...
[ 144.553192] linux-m61d dracut-initqueue[270]: Scanning devices md2 for LVM logical volumes SystemVG/rootLV

The problem is that systemd complains between these two:

[ 144.094926] linux-m61d systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-1f3d0926\x2da660\x2d45a6\x2da2b0\x2dbfdf165d64b5.device.

I guess the systemd timeout must be only a tiny bit longer than the initqueue timeout. The initqueue timeout is 2*$RDRETRY/3, where RDRETRY is 180 (by default), so 120 seconds. The default timeout for devices seems to be 90 seconds - set by the DefaultTimeoutStartSec value. But that doesn't really seem to line up, so I must be missing something.
Can you try booting with rd.timeout=180 added to the command-line args? If that works, it will at least hint that I'm on the right track. I might try to set something up myself to test ... if I find time.

I can now reproduce this bug. It appears to be a problem in dracut, though I still don't fully understand it. There seem to be two timeouts. After one, dracut gives up waiting for enough devices for non-degraded arrays to appear and accepts degraded arrays; this defaults to 120 seconds (2/3 of RDRETRY). If I boot with rd.retry=80, then boot with a removed device is successful. The other timeout controls when systemd will give up waiting for the root device to appear. This seems to default to 90 seconds, but I guess it starts from a different moment than the other timeout. It seems to fire immediately *after* the md raid devices have been assembled degraded, but just *before* an lvm device is found in the md raid device (which happens about 1 second later). If I edit /etc/systemd/system.conf, uncomment

#DefaultTimeoutStartSec=90s

change it to 300s, and run "mkinitrd", then again the boot succeeds, but without the need to set rd.retry=80. Trying a smaller number, like 120s, should be sufficient - but it isn't, and I don't understand that. I didn't binary-search to find where the cut-off is .... it takes too long to run a test. So you can make your system work by making the above change to /etc/systemd/system.conf.

Thomas: are you the maintainer for dracut? Do you know anything about the timeout for the root device to appear and how it is supposed to compare with RDRETRY?
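The timeout arithmetic from the analysis above can be checked directly; the values are the ones quoted in the comments (rd.retry defaulting to 180, DefaultTimeoutStartSec to 90s):

```shell
# dracut's initqueue accepts degraded arrays at 2/3 of rd.retry.
RDRETRY=180
echo "default initqueue timeout: $(( 2 * RDRETRY / 3 ))s"   # 120s, past systemd's 90s

# With rd.retry=80, degraded assembly happens well before systemd's 90s
# device timeout, which matches the observation that rd.retry=80 boots.
echo "with rd.retry=80: $(( 2 * 80 / 3 ))s"
```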
>
> If I edit /etc/systemd/system.conf, uncomment
> #DefaultTimeoutStartSec=90s
>
> and change it to 300s, and run "mkinitrd" then again the boot succeeds, but
> without the need to set rd.retry=80
>
This workaround has also worked fine for me. I ran these two tests:
1) degraded raid1 device with root on it, plus one missing data filesystem (a simple partition with a filesystem on it, no raid involved): the root filesystem is mounted and the emergency shell appears.
2) degraded raid1 device with root on it and no other missing filesystem (there was a missing filesystem, but I marked it with the "nofail" tag in /etc/fstab): the system boots correctly into runlevel 5 (well, into the systemd equivalent of runlevel 5...).
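The workaround quoted above can be scripted. This is a sketch that edits a throwaway copy; on a real system the target file is /etc/systemd/system.conf, you need root, and you must re-run mkinitrd afterwards:

```shell
# Demonstrate the DefaultTimeoutStartSec change on a scratch copy of system.conf.
conf=$(mktemp)
printf '#DefaultTimeoutStartSec=90s\n' > "$conf"

# Uncomment the setting (if commented) and raise it to 300s.
sed -i 's/^#\{0,1\}DefaultTimeoutStartSec=.*/DefaultTimeoutStartSec=300s/' "$conf"

cat "$conf"    # DefaultTimeoutStartSec=300s
rm -f "$conf"
```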
*** Bug 811830 has been marked as a duplicate of this bug. ***

Created attachment 626448 [details]
patch that helps.

I think this is a more robust fix for the problem than fiddling with timeouts. If you

cd /usr/lib/dracut
patch -p1 < /path/to/dracut.diff
mkinitrd

then you should have better luck booting with a missing device. I've sent an email discussing the problem to the initramfs mailing list: http://comments.gmane.org/gmane.linux.kernel.initramfs/4075 but have not received a reply yet.

Hi, I have just removed the needinfo flag for my user because I think I have provided the information requested. If you need something else I'm of course available, so please let me know.

@Neil: on the initramfs mailing list they've answered you and are waiting for more info.

Thanks... I did see that reply, sent the patches properly (in a different thread), they were eventually applied, and an updated 'dracut' was released for openSUSE about a week ago. Can you install dracut-037-17.12.1 and confirm that it fixes the problem? Thanks.

Hi, I installed the latest dracut version, removed this line from systemd.conf:

DefaultTimeoutStartSec=300s

and re-ran mkinitrd. I then rebooted and repeated the same test I described in my post on 2014-11-10 (the VM I used today was exactly the same). All worked as expected; the only thing I noticed is that the boot (in both cases) takes a very long time (some minutes more than normal). The only open question now is why the yast installer is not able to install grub on both raid members while installing (see the duplicate bug here: https://bugzilla.novell.com/show_bug.cgi?id=811830). Thank you all!

> some minutes more than normal
When booting with a newly degraded array I would expect an extra delay of about 2 minutes: there is a default timeout of 180 seconds and a magic factor of 2/3 applied.
If you are seeing a longer delay, or a delay when the array was not newly degraded, then that might be a bug. Otherwise it is acting as expected.
The yast installer/grub install issue is quite separate. Your best bet would be to open a new bug focusing on just that issue. Feel free to add me to 'cc', but I'm not likely to be the one to push it to resolution (I hope).
Thanks for the positive report - I'll close this bug now on the assumption that the delays you see match expectations. If they don't and you want to pursue that issue, please re-open.
> If you are seeing a longer delay, or a delay when the array was not newly
> degraded, then that might be a bug. Otherwise it is acting as expected.

It is working as expected.

> The yast installer/grub install issue is quite separate. Your best bet
> would be to open a new bug focusing on just that issue. Feel free to add me
> to 'cc', but I'm not likely to be the one to push it to resolution (I hope).

I'll build a new clean test environment and open a new bug as you suggested. Thank you very much.
Created attachment 550422 [details]
patch for /var/mkinitrd/scripts/, maybe not src files

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0

If your /boot is on a separate raid device from your /, mkinitrd does not add any information in the initrd to start the raid device, so boot will fail. I don't know why booting works if the RAID is clean; perhaps systemd is starting it in that case. Ubuntu 12.04 (grub 1.99) can boot with a degraded raid as long as you manually fix the metadata version of the device (change it to 0.90, possibly 1.0, but not 1.2, which is the default on the CLI and in the Ubuntu installer), so I was sad to see that the latest openSUSE does not work (even though previous versions did). But I was happy to see that openSUSE will work with my fix and without changing the metadata, because openSUSE uses grub 2.00 and the installer uses metadata 1.0 instead of 1.2.

I have fixed the problem on my machine by editing the mkinitrd scripts. I don't know if I did a nice clean job that will work on other systems, so please validate it. I have also added some extra output in verbose mode. In my solution, I check whether mdadm.conf exists and, if not, generate one. This is because the openSUSE installer did not generate one for me in my most hackish of tests; this seems like a good way to prevent some problems, even if they are the user's fault. I am not sure if there is a problem when you have no mdadm.conf, or when your mdadm.conf has entries for things you don't want to be required for boot, in which case the initrd will try to start them too. I check /sys/devices/virtual/block/ to see if there are md devices before trying to handle them, and if there are devices but no mdadm.conf, I use <(mdadm -D --scan) to read that output instead of the file.
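The mdadm.conf fallback the report describes could look roughly like this; choose_md_config and its arguments are hypothetical names for illustration, not the actual patch:

```shell
#!/bin/bash
# Sketch of the config-selection logic described above (names hypothetical).
# Prefer an existing mdadm.conf; if md devices exist but no conf does, fall
# back to scanning; if there are no md devices at all, do nothing.
choose_md_config() {
    local conf="$1" sysblock="$2"
    if [ -f "$conf" ]; then
        echo "use-file"          # read ARRAY lines from mdadm.conf
    elif ls "$sysblock"/md* >/dev/null 2>&1; then
        echo "use-scan"          # read from <(mdadm -D --scan) instead
    else
        echo "skip"              # no arrays to assemble
    fi
}
```

On a real system the arguments would be /etc/mdadm.conf and /sys/devices/virtual/block.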
Reproducible: Always

Steps to Reproduce:
Set up a test machine:
2 x 16 GB virtual disks
md0 is raid1, sda1 and sdb1, and mounted on /boot as ext4
md1 is raid1, sda2 and sdb2, and is a LVM PV
/dev/suse is the LVM VG containing PV /dev/md1
/dev/suse/root is from VG /dev/suse, and mounted on / as ext4
/dev/suse/swap is from VG /dev/suse, and is swap

On the command line, you could create the devices like this:
mdadm --create /dev/md0 -n 2 -x 0 -l 1 -e 1.0 missing /dev/sdb1
mdadm --create /dev/md1 -n 2 -x 0 -l 1 -e 1.0 missing /dev/sdb2
mkfs.ext4 -L boot /dev/md0
pvcreate /dev/md1
vgcreate suse /dev/md1
lvcreate -L 4GB -n swap suse
lvcreate -l 100%FREE -n root suse
mkfs.ext4 -L root /dev/suse/root
mkswap /dev/suse/swap

After the machine is up, run this to ensure the machine should be ready to boot with either disk missing:
grub2-install /dev/sda
grub2-install /dev/sdb
mkinitrd
grub2-mkconfig -o /boot/grub2/grub.cfg

Then shut it down and remove a disk (I removed the 2nd for most of my tests, because virtualbox snapshots mess up if you boot from the one you add afterwards). Then boot it up.

Actual Results:
You get a very long wait (at least 60 seconds) and then you get emergency mode. Normal startup was blocked because fsck could not open /dev/md0; it could not open it because /dev/md0 is started and exists, but is not running (as if --run was not used when assembling).

Expected Results:
You get a successful boot with degraded arrays.

The systemd log shows you something like this:
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job dev-disk-by\x2duuid-a16b10b0\x2dd038\x2d4946\x2dad88\x2d97c0617bbf8c.device/start timed out.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-a16b10b0\x2dd038\x2d4946\x2dad88\x2d97c0617bbf8c.device.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for /boot.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for Local File Systems.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for Remote File Systems (Pre).
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job remote-fs-pre.target/start failed with result 'dependency'.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job local-fs.target/start failed with result 'dependency'.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job boot.mount/start failed with result 'dependency'.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for File System Check on /dev/disk/by-uuid/a16b10b0-d038-4946-ad88-97c0617bbf8c.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job systemd-fsck@dev-disk-by\x2duuid-a16b10b0\x2dd038\x2d4946\x2dad88\x2d97c0617bbf8c.service/start failed with result 'dependency'.