Bugzilla – Full Text Bug Listing
| Summary: | Booting from Software-RAID fails | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | Steffen Moser <mail> |
| Component: | Bootloader | Assignee: | Josef Reidinger <jreidinger> |
| Status: | RESOLVED FEATURE | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Blocker | | |
| Priority: | P2 - High | CC: | andreas.pfaller, aschnell, bsouthey, bugz57, franz.x.maier, mail, rhg, rombert |
| Version: | Final | | |
| Target Milestone: | --- | | |
| Hardware: | x86-64 | | |
| OS: | openSUSE 11.0 | | |
| Whiteboard: | | | |
| Found By: | Third Party Developer/Partner | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Log files from YaST2 ("/var/log/YaST2") after installation of openSUSE-11.0-RC1 | | |
Description
Steffen Moser, 2008-06-08 15:25:29 UTC
yast2-storage gives a warning for "/" on RAID with no "/boot" partition, but only for RAID types other than 1. Changing that would be very easy; the other things are out of my reach.
It seems that my initial statement regarding GRUB is wrong. I have just switched from LILO to GRUB and the system is still booting without any problems. In the meantime, I suppose that my initial problem is only related to the RAID superblock version. I will do a further test: I am going to move "/boot" to superblock version 1.x (instead of 0.90) and see what GRUB says to it.
Bug #343851 contains some information about LILO and superblock version 1.0. I hope that I can clarify it a bit:
- LILO definitely has the problem that it still cannot boot from MD devices that have superblock version 1.0 (at least the version of LILO that ships with 11.0-RC1). MD devices created during the installation have version 1.0, so they lead to an unbootable system. In this case we could warn the user and/or use superblock version 0.90 for the boot device.
- Contrary to my initial posting, GRUB in general is able to boot from MD devices, regardless of their superblock version.
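The superblock check suggested above could be scripted. The helper below is a minimal sketch, assuming the policy observed in this report (only 0.90 metadata is safe for LILO); the function name and the way the version string would be obtained (via `mdadm --examine`) are illustrative, not part of YaST or LILO.

```shell
# Sketch: decide whether LILO can boot from an MD member, given the
# superblock (metadata) version string. Treating only the 0.90 superblock
# as safe is an assumption drawn from the observations in this report.
lilo_can_boot_md() {
    case "$1" in
        0.90*) return 0 ;;  # classic superblock: LILO boots fine
        *)     return 1 ;;  # 1.x superblocks: the shipped LILO fails
    esac
}

# In a real script the version would come from something like:
#   mdadm --examine /dev/sda2 | sed -n 's/.*Version : *//p'
lilo_can_boot_md "0.90" && echo "0.90: bootable with LILO"
lilo_can_boot_md "1.0"  || echo "1.0: not bootable with LILO"
```

An installer could run such a check before offering LILO and fall back to a warning, or create the boot array with `--metadata=0.90`.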
When installing openSUSE-11.0-RC1 on a system partitioned as follows:
/dev/sda1 \
---- /dev/md0 -> SWAP
/dev/sdb1 /
/dev/sda2 \
---- /dev/md1 -> /boot
/dev/sdb2 /
/dev/sda3 \
---- /dev/md2 -> /
/dev/sdb3 /
the system may be unbootable.
The problem is that in this case, GRUB only gets installed to the MBR of "/dev/sdb", but not to the MBR of "/dev/sda". Which disk is selected for booting then depends on the system BIOS. In my test environment, the BIOS only tried to boot from "/dev/sda", which had an empty MBR because this was a completely fresh install. After telling the BIOS to boot from "/dev/sdb", the system came up cleanly. Manually installing GRUB to "/dev/sda" as well made the system bootable from both disks.
So I think that, when using GRUB, YaST2 (or "grub-install") should take care of installing the boot code to the MBRs of all disks that are part of the MD device used for booting.
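The suggestion above can be sketched as a small script that emits one GRUB "setup" line per RAID1 member, so the boot code lands in every MBR. The disk names, boot partition, and output path are assumed example values; this is not what YaST actually runs.

```shell
# Sketch: generate a grub batch file that installs the boot code to the
# MBR of every disk in the RAID1 array holding /boot.
# MEMBER_DISKS and BOOT_PART are assumed example values for this setup.
MEMBER_DISKS="hd0 hd1"        # BIOS names of both RAID1 members
BOOT_PART="(hd0,1)"           # GRUB name of the partition holding /boot
OUT=./grub-setup.conf         # example path; the real file is /etc/grub.conf

: > "$OUT"
for disk in $MEMBER_DISKS; do
    echo "setup --stage2=/boot/grub/stage2 ($disk) $BOOT_PART" >> "$OUT"
done
echo "quit" >> "$OUT"

cat "$OUT"
# The batch file would then be fed to the legacy grub shell, e.g.:
#   grub --batch < ./grub-setup.conf
```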
I have only tested RAID level 1.
Could you please attach the logs? My guess is that BIOS ID detection failed, which caused the error. However, without the logs I can only guess.
Attached you'll find an archive of "/var/log/YaST2". The BIOS I use comes from "VMware Workstation", as I did this beta test within VMware. "(hd0)" seems to get mapped to "/dev/sdb" and "(hd1)" seems to get mapped to "/dev/sda". Nevertheless, when running "grep stage2 y2log" I find the following matches:
 'setup --stage2=/boot/grub/stage2 (hd0,1) (hd1,1)'
 ' setup --stage2=/boot/grub/stage2 (hd1,1) (hd1,1)'
 ' setup --stage2=/boot/grub/stage2 (hd0) (hd1,1)'
 'setup --stage2=/boot/grub/stage2 (hd1,1) (hd1,1)'
 ' setup --stage2=/boot/grub/stage2 (hd0) (hd1,1)'
There is no line that indicates the installation of the GRUB code to the MBR of "(hd1)". This confirms my observation: "/dev/sdb" (which is "(hd0)") is bootable, but "/dev/sda" (which is "(hd1)") is not. If you need more information or other logs, feel free to ask.
Created attachment 221106 [details]
Log files from YaST2 ("/var/log/YaST2") after installation of openSUSE-11.0-RC1
I think I can contribute. I get the same problem while trying a network installation.
My partition scheme is (this is a VMware test installation before installing the real thing [hosted root server in a remote data center]):
Disk /dev/sda: 5368 MB, 5368709120 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 9 72261 fd Linux raid autodetect
/dev/sda2 10 42 265072+ fd Linux raid autodetect
/dev/sda3 43 652 4899825 fd Linux raid autodetect
Disk /dev/sdb: 5368 MB, 5368709120 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 9 72261 fd Linux raid autodetect
/dev/sdb2 10 42 265072+ fd Linux raid autodetect
/dev/sdb3 43 652 4899825 fd Linux raid autodetect
Raid arrays:
/dev/md0 mounted as /boot consisting of /dev/sda1 and /dev/sdb1
/dev/md1 mounted as swap consisting of /dev/sda2 and /dev/sdb2
/dev/md2 mounted as / consisting of /dev/sda3 and /dev/sdb3
During installation, the error message below appears on the shell where I logged in via ssh and started YaST:
--- snip ---
*** Starting YaST2 ***
error: cannot open Packages index using db3 - No such file or directory (2)
Use of uninitialized value $extended_dev in string eq at
/usr/lib/perl5/vendor_perl/5.10.0/Bootloader/Core/GRUB.pm line 852 (#1)
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you the
name of the variable (if any) that was undefined. In some cases it cannot
do this, so it also tells you what operation you used the undefined value
in. Note, however, that perl optimizes your program and the operation
displayed in the warning may not necessarily appear literally in your
program. For example, "that $foo" is usually optimized into "that "
. $foo, and the warning will refer to the concatenation (.) operator,
even though there is no . in your program.
--- snap ---
The first reboot then fails with an "invalid boot sector" error message.
With the help of the Rescue System I found that /etc/grub.conf was set up incorrectly:
setup --stage2=/boot/grub/stage2 (hd1,0) (hd0,0)
quit
This should have been:
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
setup --stage2=/boot/grub/stage2 (hd1,0) (hd0,0)
quit
The problem can be fixed by correcting /etc/grub.conf and then invoking grub-install. After that, the system successfully boots and the installation can continue with /usr/lib/YaST2/startup/YaST2.ssh.
Good that I tried this in a test installation before using this on the remote server :-)
Happens with the final openSUSE 11.0. Trying to fix this during installation does not work; at least I have not found a workaround yet. What I tried: I created /tmp/YaST2_keep_sshd_running so that I get a chance to fix this before the first reboot. However, the steps that fixed the problem in the Rescue System did not work here. Here is what I did:
mount /dev/md2 /mnt
mount /dev/md0 /mnt/boot
mount -t proc /proc /mnt/proc
mount -t sysfs /sys /mnt/sys
mknod /mnt/dev/sda b 8 0
mknod /mnt/dev/sda1 b 8 1
mknod /mnt/dev/sda2 b 8 2
mknod /mnt/dev/sda3 b 8 3
mknod /mnt/dev/sdb b 8 16
mknod /mnt/dev/sdb1 b 8 17
mknod /mnt/dev/sdb2 b 8 18
mknod /mnt/dev/sdb3 b 8 19
chroot /mnt
cat >/etc/grub.conf <<eof.
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
setup --stage2=/boot/grub/stage2 (hd1,0) (hd0,0)
quit
eof.
grub-install
Here is what grub-install outputs:
grub> setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0,0) /boot/grub/stage2 p /boot/grub/menu.lst "... succeeded
Done.
grub> setup --stage2=/boot/grub/stage2 (hd1,0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd1,0)"... failed (this is not fatal)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 d (hd1,0) /boot/grub/stage2 p /boot/grub/menu.lst "... succeeded
Done.
Result: On reboot I get an "invalid partition table" BIOS message. When I try the same in the Rescue System, it works.
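The rescue sequence from the comments above can be captured as a dry-run script for review before execution. Everything below only prints the plan (each command is routed through a `run` helper); the bind mount of /dev is an assumed alternative to the hand-crafted mknod calls, not what the reporter actually ran.

```shell
# Dry-run sketch of the chroot repair described in this report.
# run() only echoes; drop the echo to execute for real (from a rescue
# system, as root, after fixing /etc/grub.conf inside the target).
run() { echo "+ $*"; }

run mount /dev/md2 /mnt
run mount /dev/md0 /mnt/boot
run mount -t proc  /proc /mnt/proc
run mount -t sysfs /sys  /mnt/sys
run mount --bind   /dev  /mnt/dev   # assumed alternative to the mknod calls
run chroot /mnt grub-install        # reinstalls per the corrected /etc/grub.conf
```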
The grub-install messages are identical, and afterwards the system boots correctly. Unfortunately, using the Rescue System on the remote server is not an option, therefore please allow me to raise the priority and set the severity to "blocker".
To comment #6:
- I can confirm that hd0 is /dev/sdb and hd1 is /dev/sda (from the logs in comment #7).
- The "write to MBR of the second disk (hd1)" step is missing (perl-Bootloader).
- yast2-bootloader uses the wrong device (/dev/md) when analysing the MBR.
To comment #4:
- Your RAID 1 is version "01.00.00". The version 1.0... is not supported by LILO.
- yast2-bootloader should check the RAID version for LILO and write the error message: "Because of the partitioning, the boot loader cannot be installed properly." (The logs from the installation where LILO was selected are missing.) bnc #357897
Result:
- The order of disks seems to be wrong (hwinfo).
- perl-Bootloader writes GRUB only to one disk (MBR).
- yast2-bootloader uses the wrong disk when analysing the MBR.
perl-Bootloader is responsible for creating /etc/grub.conf and for supporting writing GRUB to the MBR of both disks (RAID 1). yast2-bootloader needs an update of its MBR analysis. It should be done soon.
yast2-bootloader 2.17.1 includes support for analysing the correct MBR, i.e. the correct device is selected if soft-RAID is used.
It's unfortunate that 11.0 final is shipping with yast2-bootloader-2.16.20. This means YaST doesn't manage to install a boot loader, and halfway through the installation the reboot stops dead. A real worst-case failure, and it's getting worse: for 10.3, YaST was at least able to install a functional boot loader *if* one selected the right options, which is no longer possible. This was already reported for Beta 3 (#391971), and something similar for 10.3 final (#341309). Why are things like this shipped?
I think that I may have also met this bug using the openSUSE 11 KDE live CD (x86_64).
I was eventually able to get around it by formatting my "/" partition (which includes /boot) and explicitly setting the inode size to 128 rather than leaving it at the default. This came from a post by Tamas Sarga (http://www.nabble.com/inode-size-256-cause-a-pain-td17290645.html).
Bruce (comment #15), this bug has nothing to do with the filesystem or the ext2 inode size (by the way, I'm using ReiserFS). YaST/perl-Bootloader are simply creating the wrong GRUB commands for installing the boot loader.
Reassigning to the new maintainer of perl-Bootloader.
*** Bug 227377 has been marked as a duplicate of this bug. ***
The part with the badly selected disk has been fixed by juhliarik. The part that should recognize whether an MD array covers disks with the same partition table, and write the same MBR to each of them, is covered by a feature request.