Bugzilla – Full Text Bug Listing
| Summary: | Root and boot partitions on a sw raid volume, system cannot boot | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Marco M. <jjletho67-esus> |
| Component: | Installation | Assignee: | Neil Brown <nfbrown> |
| Status: | RESOLVED DUPLICATE | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Major | | |
| Priority: | P5 - None | CC: | arvidjaar, aschnell, forgotten_CxVz4LpaB5, hpj, mchang, nfbrown, snwint |
| Version: | 13.2 RC 1 | | |
| Target Milestone: | 13.2 RC 1 | | |
| Hardware: | x86-64 | | |
| OS: | openSUSE 13.2 | | |
| Whiteboard: | SILVER | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Attachments:

- y2log captured using the procedure described here: https://en.opensuse.org/SDB:YaST_logging_to_USB_stick_during_installation
- /boot/grub2/device.map
- /etc/default/grub_installdevice
- /var/log/YaST2/perl-BL-standalone-log
- YaST2 BootGRUB2.ycp patch to enable Linux MD recognition
- Screenshot of bootloader settings during installation
- Screenshot of bootloader options without the "enable redundancy" checkbox
Description
Marco M.
2013-03-26 19:02:58 UTC
Neil, please have a look at problem 1. The logs show that md0 was correctly created on sda1 and sdb1. /proc/mdstat during installation also looks good. Problem 2 is for Steffen Winterfeldt. Problem 3 could also be for Neil: mdadm was called with chunk=32, but the chunk size is not meaningful for RAID1 (according to the mdadm man page), so maybe not a bug. Problem 4 is completely unclear to me.

The problem 4 description is too short and vague, I'm very sorry. I will investigate it further and, if needed, open a separate bug for it. For now I think we can ignore it.

(In reply to comment #1)
> Problem 2 is for Steffen Winterfeldt.

Note that the only configuration where it is safe to do this is embedding grub2 in the post-MBR gap (or the UEFI case, where it goes to the ESP anyway). Anything else is error prone. I think problem 2 is actually for Michael (grub2-install).

(In reply to comment #4)
> I think problem 2 is actually for Michael (grub2-install).

How can grub-install know that it should install on multiple devices? We can discuss whether "grub-install /dev/sda /dev/sdb" makes sense (IMHO it does not add anything in this case), but YaST2 still has to invoke it with the correct parameters. It is YaST2's task to build the list of devices where grub2 has to be installed. Does it build such a list now? And if it has this list, it can simply call "grub-install /dev/sda", "grub-install /dev/sdb" for every device in the list.

(In reply to comment #5)
> (In reply to comment #4)
> > I think problem 2 is actually for Michael (grub2-install).
> It is YaST2's task to build the list of devices where grub2 has to be
> installed. Does it build such a list now?

Yes, we can get such a list in pbl, passed by ybl, but it's broken for grub2. Legacy grub seems to handle the list well.

The answer to problem 2 is definitely that 'chunk size' isn't meaningful for RAID1, so it is ignored.
The chunk size you see listed as 64M is the "bitmap chunk size", which is how much of the array corresponds to each bit in the write-intent bitmap. It is unfortunate that we used the name "chunk size" for both :-(

Problem 1 is strange. I can't see a good reason why one array would come through degraded while the others survived. Have you tried multiple times or did this only happen once? I'll see if I can reproduce it. If I can, it shouldn't be hard to isolate the problem.

(In reply to comment #7)
> The answer to problem 2 is definitely that 'chunk size' isn't meaningful for
> RAID1, so it is ignored.
> The chunk size you see listed as 64M is the "bitmap chunk size", which is how
> much of the array corresponds to each bit in the write-intent bitmap. It is
> unfortunate that we used the name "chunk size" for both :-(

You mean problem 3, I suppose :-) OK, so I can safely ignore chunk size for RAID1 volumes.

> Problem 1 is strange. I can't see a good reason why one array would come
> through degraded while the others survived. Have you tried multiple times or
> did this only happen once?
> I'll see if I can reproduce it. If I can it shouldn't be hard to isolate the
> problem.

I reproduced the problem at least 10 times. I've done my tests using VMware. Nothing led me to think the problem is hardware related, so I think VMware should be OK for testing, but I'm available to try an installation on real hardware if you think it is necessary.

I just tried an install from the 12.3 DVD with your partitioning in a KVM instance, and it worked perfectly :-( Looking at the "--examine" output, it strongly suggests that the initial resync did complete. In any case, the only way that grub files could be missing from sda1 is if sda1 were removed before the grub files were written. The most likely explanation is that sda1 reported a write error sometime before the grub files were installed onto /boot.
Maybe if there are kernel logs in the same place as the yast logs, they might show something. Possibly yast could check that no arrays are degraded before rebooting...

I think problem 2, grub only being installed on one MBR, not both, is the main problem here. So reassigning to Michael for that.

(In reply to comment #9)
> I just tried an install from the 12.3 dvd with your partitioning in a KVM
> instance, and it worked perfectly :-(

:-( This is completely unexpected to me! So VMware is involved in the problem! I'm very sorry, I was absolutely sure the problem was not hardware related. In the next few days I'll do a test on real hardware (or on KVM) and I'll post the results.

In the meantime I did a test with openSUSE 12.2 (again on a VMware VM) and an almost identical partition layout. This time I didn't have any problem with the array (it was correctly created and fully available at the first reboot). I'll try to examine the logs in the hope of finding some significant differences.

Problem number 2 instead (grub2 only installed onto the sda MBR) was present in an identical way. Do you think it would be useful if I capture the yast log for openSUSE 12.2 as well? Should I open a different bug for this?

(In reply to comment #10)
> The problem number 2 instead (grub2 only installed onto sda MBR) was present in
> an identical way.

I couldn't reproduce this problem when testing the perl bootloader of this pull request:

https://github.com/openSUSE/perl-bootloader/pull/16/commits

Using a sw raid1 created by mdadm on /dev/vda1 and /dev/vdb1, grub2 was successfully installed to /dev/vda and /dev/vdb respectively. Those commits are not likely to fix this problem directly, so I'm surprised about the result.

(In reply to comment #11)
> I couldn't reproduce this problem when testing perl bootloader of this pull
> request.
>
> https://github.com/openSUSE/perl-bootloader/pull/16/commits

What's in your /boot/grub2/device.map and /etc/default/grub_installdevice?
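For readers without access to the attachments, here is a hedged illustration of what these two files contain; the exact contents are inferred from the comparison later in the thread (comment 16): the working setup lists both raw disks in grub_installdevice, while the reporter's file holds only the BIOS alias of the first disk.

```
# /boot/grub2/device.map - BIOS drive to kernel device mapping
(hd0)	/dev/vda
(hd1)	/dev/vdb

# /etc/default/grub_installdevice - working setup, both members listed
/dev/vda
/dev/vdb

# /etc/default/grub_installdevice - reporter's failing setup
(hd0)
```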
Created attachment 538494 [details]
/boot/grub2/device.map (requested by comment #12)

Created attachment 538495 [details]
/etc/default/grub_installdevice (requested by comment #12)

Created attachment 538496 [details]
/var/log/YaST2/perl-BL-standalone-log
perl bootloader log for reference.
(In reply to comment #14)
> Created an attachment (id=538495) [details]
> /etc/default/grub_installdevice

The main difference is that your grub_installdevice already contains /dev/vda and /dev/vdb, while the reporter's grub_installdevice contains just (hd0). But perl-Bootloader does not change this file, so I guess something has changed in the YaST2 bootloader module.

(In reply to comment #16)
> The main difference is that your grub_installdevice contains already /dev/vda
> and /dev/vdb and grub_installdevice of reported contains just (hd0). But
> perl-Bootloader does not change this file - so I guess something has changed in
> YaST2 bootloader module.

It could be. My impression is that YaST did not offer the "Enable Redundancy for MD Array" loader location option for grub2 before, but this time I notice it appeared in yast2 bootloader. But as always, my memory could serve me wrong...

(In reply to comment #17)
> It could be. My impression is YaST seems to not offer "Enable Redundancy for MD
> Array" loader location options for grub2 before but this time I notice it
> appeared in yast2 bootloader.
>
> But as always my memory could serve me wrong ..

I made an installation of 12.3 on MD1 with default settings, and the bootloader was installed on the first disk only; grub_installdevice contained only (hd0). Looking in y2log:

2013-05-09 02:43:56 <1> 10.0.2.15(2962) [YCP] BootCommon.ycp:1252 boot_md_mbr includes 2 disks: ["/dev/sda", "/dev/sdb"]
...
2013-05-09 02:43:56 <1> 10.0.2.15(2962) [YCP] bootloader/routines/lib_iface.ycp:136 Storing global settings $["activate":"true", "append":" nomodeset resume=/dev/md0 splash=silent quiet showopts", "append_failsafe":"showopts apm=off noresume nosmp maxcpus=0 edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe", "boot_boot":"false", "boot_extended":"false", "boot_mbr":"true", "boot_root":"false", "default":"0", "distributor":"openSUSE 12.3", "generic_mbr":"true", "gfxmode":"auto", "gfxtheme":"/boot/grub2/themes/openSUSE/theme.txt", "os_prober":"true", "terminal":"gfxterm", "timeout":"8", "vgamode":""]

boot_md_mbr disappeared. This is due to this code in modules/BootCommon.ycp:Save:

    if (VerifyMDArray()) {
        if ((enable_md_array_redundancy != true) && (haskey(my_globals, "boot_md_mbr")))
            my_globals = remove(my_globals, "boot_md_mbr");
        if ((enable_md_array_redundancy == true) && (!haskey(my_globals, "boot_md_mbr")))
            my_globals["boot_md_mbr"] = BootStorage::addMDSettingsToGlobals();
    } else {
        if (haskey(globals, "boot_md_mbr"))
            my_globals = remove(my_globals, "boot_md_mbr");
    }

enable_md_array_redundancy is set only by the UI as far as I can tell, and the UI is present in GRUB only:

    ./include/bootloader/grub/options.ycp: BootCommon::enable_md_array_redundancy = (boolean)UI::QueryWidget(`id("enable_redundancy"), `Value);

From the perl-Bootloader side it should work already, but the YaST2 part is missing.

Created attachment 538562 [details]
YaST2 BootGRUB2.ycp patch to enable Linux MD recognition
BootGRUB2.pm did not initialize the Linux MD state (see the first attached patch); I do not know whether that was intentional. This patch does enable processing of both array members, but due to the way it is implemented it is *extremely* dangerous. pbl just calls grub2-install two times. If grub2 can be embedded in both cases, that's OK. If grub2 can *not* be embedded, openSUSE forces it to use blocklists, and the second invocation of grub2-install will recreate core.img, rendering grub2 on the first drive unbootable.
I still have this on my TODO list, but I do not yet have a clean way to support multiple install devices. Or, better said, it requires much more effort than I want to spend on it.
What is worse, even if we remove the "--force" parameter, a second grub2-install invocation still recreates core.img, potentially rendering grub2 on the first disk unbootable.
I'll take another look at what can be done here.
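The failure mode described above can be sketched as a loop over the install devices. This is an illustrative assumption, not the actual pbl code: the DRY_RUN stub and the hard-coded device list stand in for grub2 and for the list read from /etc/default/grub_installdevice.

```shell
# Sketch of the multi-device install pbl effectively performs (comments 5 and 19).
# DRY_RUN=echo keeps this runnable without grub2 or real disks; drop it to run
# for real from a root shell.
DRY_RUN=echo
install_devices="/dev/sda /dev/sdb"   # would come from /etc/default/grub_installdevice

for dev in $install_devices; do
    # Danger per comment 19: each invocation regenerates core.img, so if
    # embedding failed and blocklists are in use, the second run leaves the
    # first disk's boot sector pointing at a core.img that no longer matches.
    $DRY_RUN grub2-install "$dev"
done
```

With embedding available (a post-MBR gap on both disks), the two runs are independent and safe; the blocklist case is what makes the naive loop dangerous.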
(In reply to comment #19)
> dangerous. pbl just calls grub2-install two times. If grub2 could be embedded
> in both cases, that's OK. If grub2 could *not* be embedded, openSUSE forces it
> to use blocklists.

Correction: grub2 should refuse installation if it cannot be embedded, even with "--force --skip-fs-probe", as long as /boot/grub2 is on MD raid.

(In reply to comment #19)
> Created an attachment (id=538562) [details]
> YaST2 BootGRUB2.ycp patch to enable Linux MD recognition

The patch looks good to me. Without it you have to go to the yast bootloader settings page, see the redundancy option enabled, AND click OK to save it as true to make the redundancy really work. Otherwise it will be a regular, non-redundant installation: although the UI default says it's enabled, the underlying value is an uninitialized nil. I was fooled by this buggy behavior during my testing.

> BootGRUB2.pm did not initialize Linux MD state (see first attached patch). I do
> not know whether it was intentional. This patch does enable processing of both
> array members, but due to the way it is implemented it is *extremely*
> dangerous. pbl just calls grub2-install two times. If grub2 could be embedded
> in both cases, that's OK. If grub2 could *not* be embedded, openSUSE forces it
> to use blocklists. Second invocation of grub2-install will recreate core.img,
> rendering grub2 on first drive unbootable.

We should distinguish the options in perl bootloader. Believe it or not, it's not as easy as it should be...

- MBR: <NO OPTIONS>
- BOOT AND ROOT partition: --force
- EXTENDED: --force --skip-fs-probe
- CUSTOM: <DETECT and pick the best>

> I'm still having this on my TODO list, but I still do not have clean way to
> support multiple install devices. Or, better said - it requires much more
> efforts than I want to spend on it.
> What is worse, even if we remove "--force" parameter, second grub2-install
> invocation still recreates core.img potentially rendering grub2 on the first
> disk unbootable.

Why? Wasn't core.img already embedded on the first disk? The recreated file should not destroy it.

> I'll get a look once more what can be done here.

Thanks.

(In reply to comment #20)
> Correction - grub2 should refuse installation if it cannot be embedded even
> with "--force --skip-fs-probe" as long as /boot/grub2 is on MD raid.

Agree with you, and yes. :)

(In reply to comment #22)
> > Correction - grub2 should refuse installation if it cannot be embedded even
> > with "--force --skip-fs-probe" as long as /boot/grub2 is on MD raid.
>
> Agree with you and yes. :)

I actually mean: grub-install already does it. So I think we can enable this.

https://github.com/yast/yast-bootloader/pull/16

(In reply to comment #10)
> (In reply to comment #9)
> > I just tried an install from the 12.3 dvd with your partitioning in a KVM
> > instance, and it worked perfectly :-(
>
> :-( this is completely unexpeted to me! So vmware is involved in the problem!
> I'm very sorry, i was absolutely sure the problem was not hw related. In the
> next few days i'll do a test on a real hardware (or on kvm) and i'll post the
> results.

I did a test using KVM and I did NOT suffer problem no. 1, so Neil's conclusion is definitely confirmed: problem no. 1 is correlated with VMware. I'm now trying to figure out why openSUSE 12.3 has this behavior while openSUSE 12.2 works fine on exactly the same virtual hardware. I would like to see the kernel log while the installation goes on, but on virtual terminal 4 I can see the kernel log only until yast is started. After that nothing more is logged.
Could you suggest a way to see the kernel log during the whole installation process?

Hi, I experience the same issue with some differences:

2 x 3T discs
sda1, sdb1 => md126 /boot
sda2, sdb2 => md127 LVM
LVM: /, /home, etc.

During the setup, at the partitioning step and at the end, it complains that there is no /boot partition (well, yes there is one: I set up md126, formatted it ext4 and mounted it on /boot). I tried to ignore this error and let the setup finish (so I could fix the boot afterwards). Then, booted in rescue mode:

vgscan --mknodes
vgchange -a y
mount /dev/vg0/lv_root /mnt/root
mount /dev/vg0/lv_home /mnt/home
mount /dev/vg0/lv_tmp /mnt/tmp
...
mount -t sysfs sys /mnt/sys
mount -t proc proc /mnt/proc
mount -o bind /dev /mnt/dev
chroot /mnt

Then I ran yast and tried to fix the bootloader, with no luck, but I checked and saw that md126 and md127 were not synced at all. I have read about and tried several options, like booting with a rescue CD and setting the grub_bios flag on the boot partition, but I am still unable to boot. I do not even get a grub message, only a "missing OS" error message. I will test with openSUSE 12.2 for the fun.

Romain

(In reply to comment #25)
> Hi,
> I experience the same issue with some differences:
> 2 x 3T discs
>
> sda1, sdb1 => md126 /boot
> sda2, sdb2 => md127 LVM
> LVM: /, /home, etc.
>
> During the setup at the partition step and the end it complain that there is no
> /boot partition (well, yes there is one I have set md126 format ext4 and
> mounted on /boot)
> I have tried to forget this error and let setup finish (so I can fix the boot
> after).

I also experienced a similar issue, and the problem resided in disks that were already partitioned, with the raid volume already created. That situation confuses yast, which seems to prefer fully zeroed disks. In your case, when you started yast to install the OS, were your disks already partitioned?
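Romain's rescue-mode steps above can be sketched as a script. Volume-group and LV names are taken from his comment; the root mount point is normalized to /mnt so it matches the final chroot (his comment mounted it at /mnt/root, which would not match `chroot /mnt`), and the `run` stub only echoes and records each step so the sketch is side-effect free.

```shell
# Hedged reconstruction of the rescue-mode recovery from comment 25.
# Replace `run` with direct invocation on a real rescue system (as root).
LOG=""
run() { echo "+ $*"; LOG="$LOG $1"; }

run vgscan --mknodes                  # recreate LVM device nodes
run vgchange -a y                     # activate all volume groups
run mount /dev/vg0/lv_root /mnt       # root filesystem at /mnt
run mount /dev/vg0/lv_home /mnt/home
run mount -t sysfs sysfs /mnt/sys     # pseudo-filesystems the chroot needs
run mount -t proc proc /mnt/proc
run mount -o bind /dev /mnt/dev
run chroot /mnt                       # then fix the bootloader from inside
```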
The problem is still present in openSUSE 13.1 RC1.

(In reply to comment #27)
> The problem is still present in opensuse 13.1 rc1

Which one? This bug report lists 4 problems. Which one is still present?

(In reply to comment #28)
> Which one? Bug report lists 4 problems. Which one is still present?

Problem number 2 is still present in 13.1 RC1. Grub2 has been installed only on the MBR of the first disk, leaving the system in an unbootable state if the first disk of the raid volume fails. (The other problems were assessed and they weren't real bugs.)

Thanks to the forum, I've just discovered this bug:

https://bugzilla.novell.com/show_bug.cgi?id=842919

which looks very connected to this one.

(In reply to comment #29)
> The problem number 2 is still present in 13.1 rc1. Grub2 has been installed
> only on the mbr of the first disk

This is a bug only if it was intended to install grub on every MBR by default. bnc#842919 may be the result of such an attempt. Otherwise it is a question of defaults. I think it is reasonable to install the bootloader on all disks in a RAID containing /boot by default, but I do not have enough understanding of the code ATM to make a patch.

The problem is still present in the 13.1 final release. Furthermore, I discovered that systemd is NOT able to boot a system with /boot on a raid1 partition with a failing disk. It complains that it cannot mount /boot and is NOT able to launch the emergency shell (in 12.3 systemd can boot the system with a degraded raid1 without any problem). I think this is a different bug; I'm going to investigate further and open a separate bug.

Hi Marco, please check bug 832501 to see if it matches your "degraded RAID1 for /boot" issue.

(In reply to comment #32)
> Hi Marco, please check bug 832501 to see if it matches your "degraded RAID1 for
> /boot" issue.

Hi Neil, yes, the bug you mentioned looks very similar to the issue I'm facing. I posted some information in that bug. Thank you.

Do you know why the bug description is no longer visible here?
(I'm the bug reporter.)

No. Unchecked the private flag.

The bug is tragically still present in 13.2 beta1, and the workaround of manually installing grub2 on the second MBR does not work anymore! The boot stops because it can't find the root file system (I can't understand why). At this point, what do you suggest? Do I need to open a new bug?

Created attachment 609575 [details]
Screenshot of bootloader settings during installation

(In reply to Marco M. from comment #36)
> The bug is tragically still present in 13.2 beta1 and the workaround of
> manually install grub2 on the second mbr does not work anymore! The boot
> stops because can't find the root file system (can't understand why).

I tested an installation of 13.2 RC1 with manually created /boot, swap and / on RAID1. yast correctly detected this but defaulted to installing the bootloader on the first disk only (see screenshot). If you check "Enable redundancy for MD array", it installs the bootloader on both disks.

> At this point what do you suggest ? Do i need to open a new bug ?

I think yes. At this point the question is whether the current behavior is intentional or not. This bug has become rather long and confusing.

Created attachment 609985 [details]
Screenshot of bootloader options without the "enable redundancy" checkbox
In my case the "enable redundancy for md array" option has never appeared! (Tested in 12.3, 13.1 and 13.2 beta1.) I attached a screenshot showing the bootloader options.

I tried to manually install grub2 on sdb, but the system was still unable to boot with sda pulled out. The boot process stops in the initramfs phase, before root mounting, saying it can't mount root. A very limited emergency shell comes up after a couple of minutes and, from this shell, I was able to see that the raid arrays were available (degraded but available) and the lvm logical volumes for root, var and home were ready, so I can't understand why it was not able to complete the boot phase. I suspect that bug 832501 could also still be present. What do you think?

(In reply to Marco M. from comment #39)
> In my case the "enable redundancy for md array" option has never appeared!
> (tested in 12.3, 13.1 and 13.2 beta1)

I know.

> I tried to manually install grub2 on sdb, but the system was still unable to
> boot with sda pulled out. The boot process stops in the initrmfs phase,
> before root mounting, saying it can't mount root.

That's an mdraid problem then. Can you attach the initrd for the system that fails to boot if one device is missing, please?

(In reply to Neil F Brown from comment #41)
> Can you attach the initrd for the system that fails to boot if one device is
> missing please.

Hi Neil, I reopened bug 832501, which looks strictly connected, and I've just attached the rdosrepert.txt there. I'm attaching the initrd here.

I'm sorry, the initrd is too big, I can't attach it.

https://copy.com/zDQdfUM0L4lJz4fo

With this link you can download my initrd file.

Thanks for the initrd - I successfully downloaded it. It'll be a little while before I have a chance to look at it and comment.

Hi Neil, I have to assign this to you, as this is no longer a bootloader issue. Thanks.

Discussion of this issue continued in bug 832501, so I'm marking this as a duplicate.
*** This bug has been marked as a duplicate of bug 832501 ***