Bugzilla – Full Text Bug Listing
| Summary: | mdadm --incremental | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.2 | Reporter: | Petr Matula <petr.m> |
| Component: | Basesystem | Assignee: | Neil Brown <nfbrown> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | | |
| Priority: | P3 - Medium | CC: | adebex, fcrozat, herbert, martin |
| Version: | Final | | |
| Target Milestone: | --- | | |
| Hardware: | x86-64 | | |
| OS: | openSUSE 12.2 | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | systemd logging output; systemd logging output (systemd from fcrozat repo); /etc/fstab; screenshot | | |
Description
Petr Matula
2011-11-29 11:20:36 UTC
Same on my fresh installation: /sbin/mdadm is missing from the initrd. Maybe this is caused by the fact that there are some MD RAID partitions on this system, but they are not mounted by this 12.1 installation.

I can confirm this bug; it's quite annoying because it makes the OS unable to boot without manual intervention.

First of all, a screenshot: http://tinypic.com/view.php?pic=t6a5w0&s=5

And now a short description. I have a PC with 3 disks: [sda] and [sdb] are 1 TB drives serving as a RAID1 volume; [sdc] is an SSD system disk. When using systemd, boot fails with the previously mentioned errors and systemd waits for user input; after pressing ^D the system starts properly and the aforementioned RAID1 volumes are mounted properly. However, if I comment out the relevant /etc/fstab lines, openSUSE starts properly. When using SysV-style boot, the system starts without any manual intervention, but the RAID1 volumes aren't mounted. I would consider this a serious regression, because exactly the same configuration worked fine in openSUSE 11.4 and 11.3.

Reproducible: Always.

adrian@adrian-pc:~> cat /etc/fstab (filtered)

```
# root on /dev/sdc
UUID=b181c57c-a5f1-414e-aa8a-4814fdb59301 / ext4 discard,noatime,acl,user_xattr 1 1
# boot on /dev/sdc
UUID=3bd1bce9-4d52-4fef-a58d-9101b5db6a61 /boot ext4 discard,noatime,acl,user_xattr 1 2
# data on /dev/md126p3
UUID=027394e0-ddf1-48b9-9503-4c662fdcd061 /media/dane ext4 noatime,user,exec,acl,user_xattr 0 0
# windows on /dev/md126p2
/dev/disk/by-id/md-uuid-02503092:fe8e8b7c:06366497:fecfe6ae-part2 /media/windows ntfs-3g noatime,users,uid=adrian,gid=users,fmask=133,dmask=022,locale=pl_PL.UTF-8 0 0
```

Thanks for the report. I'm not sure how to proceed with this; I probably need to find someone with a good understanding of systemd. We only include mdadm in the initrd if it will be needed to mount '/', '/usr' or swap. As far as I can tell, that is not the case for any of the systems described here, so not including mdadm is correct.
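Neil's point can be illustrated with a hypothetical sketch of the decision mkinitrd makes: include mdadm only when '/', '/usr' or swap lives on an md device. The file names and logic below are illustrative, not the actual mkinitrd code; /tmp/fstab.demo stands in for the real /etc/fstab, using two entries from this report.

```shell
# Illustrative only: decide whether mdadm belongs in the initrd by checking
# whether /, /usr or swap is backed by an md device.
cat > /tmp/fstab.demo <<'EOF'
UUID=b181c57c-a5f1-414e-aa8a-4814fdb59301 / ext4 defaults 1 1
UUID=027394e0-ddf1-48b9-9503-4c662fdcd061 /media/dane ext4 defaults 0 0
EOF

needed=no
while read -r dev mnt rest; do
    case "$mnt" in
        /|/usr|swap)
            # md-backed devices show up as /dev/mdN or as by-id md-uuid links
            case "$dev" in /dev/md*|*md-uuid*) needed=yes ;; esac
            ;;
    esac
done < /tmp/fstab.demo
echo "mdadm needed in initrd: $needed"   # here: no, / is on the SSD
```

For the configuration in this report, the answer is "no", which is exactly why mdadm is absent from the initrd and why the data arrays must be assembled after '/' is mounted.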
The intention is that the initrd should mount just enough to get the root filesystem mounted, and then the scripts etc. on the root filesystem get everything else mounted. Once '/' is mounted, "/sbin/udevadm trigger" is run (by /etc/init.d/boot.udev) to cause all devices to be 'discovered' again. This time mdadm will exist and the arrays will be created instead of the boot reporting that mdadm does not exist. Maybe systemd has the wrong idea about the ordering between mounting everything else in /etc/fstab and running "udevadm trigger", or something. Frederic: can you shed any light on the systemd behaviour here? Thanks.

First, everybody should test the package from home:fcrozat:systemd / systemd; it contains fixes which will go into a maintenance update, and some of those fixes might have restarted the lvm service when not needed. Anyway, systemd doesn't run boot.udev; it has its own udev service (which does udevadm trigger). And fstab mounting will only happen after the udev trigger has been done and the device node has appeared in udev (but by default, waiting for udev settle is not enabled unless lvm / dmraid / md have been enabled). To better debug this problem, please boot with systemd.log_level=debug systemd.log_target=kmsg and attach the dmesg output.

Created attachment 467052 [details]
systemd logging output
Created attachment 467057 [details]
systemd logging output (systemd from fcrozat repo)
I'm not very versed in how we handle mdadm in the initrd, but it looks like we are doing the work twice: once in boot-md.sh and again in the udev rules which are also present in the initrd (/lib/udev/rules.d/64-md-raid.rules). That could explain why udev is complaining during the initrd. But once the initrd is done and udev is restarted, it "thinks" it has already handled md devices during the initrd (using its own rules) and doesn't redo its work (which would explain why systemd tried to mount the RAID devices directly, without waiting for udev to assemble them).

Is it the same problem? https://bugzilla.novell.com/show_bug.cgi?GoAheadAndLogIn=1&id=733299

If bug 733299 occurs during the first phase of the installation (i.e. partitioning, choosing packages, installing packages, etc.), then no, it is not.

I think this might be the same as bug #738588. Please test the fix mentioned in that bug and re-open this bug if the problem persists. Thanks.

*** This bug has been marked as a duplicate of bug 738588 ***

Negative, this is a different bug:
1. I've tried the updated systemd from Frederic's repo and it didn't help.
2. The partition layout mentioned in the description is also different: I have my / partition mounted on a normal disk (not an md array) and only my data disks are placed on an md array; the other bug is about the root partition being placed on an md array.
3. Also, as I mentioned in my first comment: "When using systemV style boot (selected in grub menu) system starts without any manual intervention, but RAID1 volumes aren't mounted."

Just for the record, you can switch back to the systemd from the update repository (no need to use the package from my own repository anymore); it contains the fixes from bnc#738588.

Any hope of fixing this? It's mighty annoying, so I'm considering reverting to 11.4 or something like that (just my rant, no offence). I offer to be a beta-tester for any kind of solution you may think of, or at least please point me at things I should check to help fix this.
I've been a bit swamped lately, but I'll try looking again early next week.

Just pinging the topic. Has anyone tried looking at this? I'd also like to add that this bug isn't 100% repeatable, but more like 99%; I think a few times the system was able to boot properly.

(sorry... things are slowly getting back to normal here).
So just to make sure I understand:
1/ There are md arrays in the system, but they are not used to store '/'
Is that correct?
2/ You get error messages during boot like those at the top
... failed to execute .. mdadm ...
Correct?
3/ It still fails to boot (mostly) ??? I wonder why.
What are the last messages?
4/ What are the contents of /etc/mdadm.conf ??
I guess systemd is trying to mount /media/dane and /media/windows and is failing because the md arrays haven't been assembled yet.
Sounds a lot like bug 738588, but that has been fixed.
Confused... maybe the answer to one of the above questions might help me.
Thanks.
Hmm, I might have a clue as to why the fix from bug 738588 does not work. Adrian, could you try changing, in /etc/fstab, the last parameter for /media/dane from 0 to 2? This will ensure fsck is run on this partition. The fix for bug 738588 added a dependency from fsck on md.service, so if we don't put an fsck dependency on this partition, it might not wait for md.service to complete. Of course, it would be better if udev was not announcing the partition as "available", but that is another story ;)

@Frederic: You were right, changing fstab did help! Booted twice to be sure :-)

@Neil:
1. Yes.
2. Yes.
3. Without Frederic's workaround, yes, it still fails. The messages are the same as in the screenshot I posted in my first comment: https://bugzilla.novell.com/show_bug.cgi?id=733283#c2
4. adrian@adrian-pc:~> cat /etc/mdadm.conf

```
DEVICE containers partitions
ARRAY metadata=imsm UUID=ec37a38d:5f1dccc0:cbc99ca5:5cf87e22
ARRAY /dev/md/Volume0 container=ec37a38d:5f1dccc0:cbc99ca5:5cf87e22 member=0 UUID=02503092:fe8e8b7c:06366497:fecfe6ae
```

Thanks! I've pushed a fix for this bug. Please test the package (systemd >= 37-3.146) from http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ (of course, you need to change /etc/fstab back to use 0 as the last field ;)

Yes, it did help. Now the system boots just fine. Do you have time to explain to an illiterate user what the bug and the fix were? (just my curiosity...) Thank you very much for the help.

We just ensure md.service is complete before starting the mount part (when fsck was enabled, this was ensured before fsck, and therefore before mount, but not in your case).

This is an autogenerated message for OBS integration: This bug (733283) was mentioned in https://build.opensuse.org/request/show/104941 Factory / systemd

Well, this seems to be fixed (thanks Frederic), so I'll mark it as Resolved.
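The mechanics of Frederic's workaround can be sketched as follows. The sixth fstab field (fs_passno) controls whether an fsck dependency is generated for the mount; flipping it from 0 to 2 makes the mount wait for fsck, which in turn (after the bug 738588 fix) waits for md.service. A minimal shell sketch, using the /media/dane entry from this report (the /tmp paths are illustrative):

```shell
# Sample fstab entry from this report. fs_passno, the 6th field, is 0,
# so no fsck unit is generated and the mount can race md array assembly.
cat > /tmp/fstab.sample <<'EOF'
UUID=027394e0-ddf1-48b9-9503-4c662fdcd061 /media/dane ext4 noatime,user,exec,acl,user_xattr 0 0
EOF

# The workaround: flip fs_passno from 0 to 2 so fsck runs on the partition.
awk '{ if ($6 == 0) $6 = 2; print }' /tmp/fstab.sample > /tmp/fstab.fixed
cat /tmp/fstab.fixed
# prints: UUID=027394e0-... /media/dane ext4 noatime,user,exec,acl,user_xattr 0 2
```

As the follow-up comments explain, this was only a workaround exposing the missing ordering; the proper fix (systemd >= 37-3.146) made the mount wait for md.service even when fsck is disabled.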
This is an autogenerated message for OBS integration: This bug (733283) was mentioned in https://build.opensuse.org/request/show/106032 12.1 / systemd

I'm reopening this against 12.2, as I was just greeted by the message in the description (for the second time; the system isn't rebooted that often). Logging in as root and exiting with ^D led to a normal boot. The system has recently been upgraded from 11.4 to 12.2 via zypper dup; the problem started in 12.2. I've attached my fstab.

Created attachment 517093 [details]
/etc/fstab
This seems to be some race condition in systemd: it worked on the second reboot. I'm attaching a screenshot of a failed boot.

Created attachment 517251 [details]
screenshot
(It's usually best to open a new bug if the old one is closed, because if the old one was fixed, then presumably this is a different bug. Referencing the old bug that you suspect might be related would be a good idea, of course.)

Can you please describe your configuration? I'm guessing there is some LVM in there as well as filesystems and RAID. Maybe it is LVM that is complicating the situation for you.

The bug was definitely not fixed for 12.2. On my system, I have RAID partitions and mkinitrd fails to add mdadm to the initrd. LVM is not used.
md126 : active raid1 sdb3[1] sda3[0]
14659200 blocks [2/2] [UU]
During boot, udev detects the raid partitions and calls (as specified in /lib/udev/rules.d/64-md-raid.rules) "/sbin/mdadm --incremental $tempnode --offroot", which fails, as mdadm is not within the initrd.
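For context, the rule Herbert refers to matches raid member partitions and runs mdadm incrementally on each one as it appears. An approximate excerpt, reconstructed from the mdadm udev rules of that era (the RUN command is as quoted above; the exact match keys may differ in the shipped file):

```
# /lib/udev/rules.d/64-md-raid.rules (approximate excerpt)
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm --incremental $tempnode --offroot"
```

Since the rules file is copied into the initrd but the mdadm binary is not, every matching partition event in early boot produces the "failed to execute" error.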
The fstab entry for the raid is:
LABEL=RAID1 /mounts/RAID1 ext4 defaults 1 2
When installing 12.2, the raid partitions were already there and I created the fstab entry manually (not with Yast or during installation).
Herbert: apart from the error messages, are you actually experiencing a failure?

Martin: thanks for the screenshot. It looks like systemd has some dependency wrong. It complains that it cannot mount some filesystems because there is a dependency on devices that don't exist, and then those devices are created by LVM. Could you please attach your /etc/fstab?

Frederic: could you please look at this and see if you can suggest anything?

The /etc/fstab is already attached. :-) The system uses LVM for everything but /boot; /raid* is an LVM volume on RAID5.

(In reply to comment #31)
> Herbert: apart from the error messages, are you actually experiencing a failure?

Not having mdadm in the initrd is clearly not a systemd bug ;)

> Martin: thanks for the screen shot. It looks like systemd has some dependency
> wrong. It complains that it cannot mount some filesystems because there is a
> dependency on devices that don't exist, and then those devices are created by
> LVM. Could you please attach your /etc/fstab?
>
> Frederic: could you please look at this and see if you can suggest anything?

I would need the output of:
systemctl status raid-nelson-root.mount
and
systemctl show raid-nelson-root.mount

Getting mdadm + lvm into systemd / udev is very tricky, and some dependencies might have been missed.

(In reply to comment #31)
> Herbert: apart from the error messages, are you actually experiencing a failure?

Only if MDADM_DEVICE_TIMEOUT=0 is used: /etc/init.d/boot.md cannot assemble the arrays, as udev is accessing the partitions. If mdadm were within the initrd, the raid could be assembled without any problems.

(sorry for the delay). What errors do you get from boot.md which suggest it cannot assemble the arrays? And how do you know that udev is accessing the partitions at that time? It would still help to get the "systemctl status" and "systemctl show" output that Frederic requested.

No response for 2 months, so closing.
Please reopen if this is still a problem and the above requested information can be provided.

(In reply to comment #33)
> I would need output of:
> systemctl status raid-nelson-root.mount
> and
> systemctl show raid-nelson-root.mount
>
> getting mdadm + lvm into systemd / udev is very tricky and some dependencies
> might have been missed

Sorry for being so late; I rarely reboot the machine. :-) But I recently rebooted it many times in a row (with the latest 12.2 kernel etc.) and had no problems, so I call the problem fixed.