Bugzilla – Full Text Bug Listing |
| Summary: | Can't boot from installed 10.2 final when using software raid (no operating system found) | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | Franz Maier <franz.x.maier> |
| Component: | Bootloader | Assignee: | Jozef Uhliarik <juhliarik> |
| Status: | RESOLVED DUPLICATE | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Critical | | |
| Priority: | P1 - Urgent | CC: | andreas.pfaller, coolo, jplack, leo, mmarek, rccj, seuchato, stefan.fent, sysop |
| Version: | Alpha 2 | Flags: | coolo: SHIP_STOPPER- |
| Target Milestone: | --- | | |
| Hardware: | Other | | |
| OS: | Other | | |
| Whiteboard: | | | |
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | YaST2 logs; mdadm.conf; fstab; fdisk -l; mdadm output; serial console boot log; Partition information on MD raid system used exhibiting bug; Partitioning of a functioning "Pure RAID" 10.3 GM system | | |
Description
Franz Maier
2006-12-09 09:38:19 UTC
In the meantime I solved the problem: I partitioned like the automatic partitioning from another PC with a hardware raid (fake raid), namely /boot, swap, / and /home, and it works! You should at least document this special requirement, as it is quite new. Regards Franz X. Maier

Why close this? The problem exists. I have just installed 10.2 with software raid (2 identically partitioned SATA disks with, among others, / /boot /var configured as raid 1) and the system does not boot after the first installation step (after the first reboot after installation). I left the bootloader configuration unchanged from the one suggested by the installer. I have booted with the rescue system; the content of all relevant partitions looks OK and the raid partitions are currently syncing. This is a NEW installation where all disks were completely repartitioned. So it looks as if the only problem is the bootloader installation. Currently I am trying to figure out how to install grub manually but have not yet been successful.

/etc/grub.conf contains:
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
setup --stage2=/boot/grub/stage2 (hd1,0) (hd0,0)
quit

and /boot/device.map:
(hd0) /dev/sda
(hd1) /dev/sdb

Both look OK to me. Some additional info:
After messing with the recovery options of the installation DVD, none of which
worked (e.g. the automatic repair options seem to be totally unaware of
the existence of raid, and the "bootloader repair" simply failed with an error
message), I booted again with the rescue system and simply did
>grub
root (hd0,0)
setup
quit
which upon reboot at least showed the grub menu with the newly installed
system. However trying to boot "Suse 10.2" dropped me in a minimal
system (I assume the initrd) and failed to assemble the arrays.
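When a boot ends in a minimal initrd shell like this, the array state can usually be inspected and the arrays assembled by hand. A hedged sketch, run as root in that shell or in the rescue system (/dev/md1 as the root array is this reporter's layout, not a general rule):

```sh
# try to assemble every array known from mdadm.conf / the superblocks
mdadm --assemble --scan
# check which arrays came up and whether they are (re)syncing
cat /proc/mdstat
# detailed state of the suspect array (md1 here is an assumption)
mdadm --detail /dev/md1
```

If the arrays assemble cleanly here but not during boot, that points at the initrd rather than the on-disk metadata.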
Created attachment 109185 [details]
YaST2 logs
Obtained via rescue system from /var/log on HD of installed system.
Created attachment 109186 [details]
mdadm.conf
Obtained from /etc on HD of installed system.
The mdadm.conf included in the initrd is identical.
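Since the initrd carries its own copy of mdadm.conf, a mismatch between the two copies is a classic cause of "cannot assemble" failures at boot. A sketch of regenerating the file and repacking the initrd on SUSE systems of that era (that mkinitrd picks up /etc/mdadm.conf is an assumption to verify on your release):

```sh
# rewrite the ARRAY lines from the currently running arrays
mdadm --detail --scan > /etc/mdadm.conf
# rebuild the initrd so it includes the updated mdadm.conf
mkinitrd
```

In this report the two copies were verified identical, so this was ruled out.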
Created attachment 109187 [details]
fstab
Obtained from /etc on HD of installed system.
Created attachment 109188 [details]
fdisk -l
"fdisk -l" output.
Obtained while running rescue system.
Created attachment 109189 [details]
mdadm output
Obtained by running "mdadm --examine /dev/sda*"
while running rescue system.
Created attachment 109190 [details]
serial console boot log

Serial console output while trying to boot the system after fixing the grub installation as described in comment #3. I am running out of ideas now ;) Any hint is appreciated.

Since I was running out of ideas I opted for a complete reinstall.
The differences from my previous attempt are:
- No separate /boot partition
- Only one primary partition (sd[ab]1 - not used for install).
The previous attempt had a primary partition (sd[ab]3) physically
located behind the extended partition - this worked without problem
on the same hardware with 10.1. The reason for this layout was to
ease replacement of a raid component drive if one of the drives
fails and the replacement has a slightly smaller capacity.
This new installation also did not install a working grub configuration.
However after installing grub with
> root (hd0,4)
> setup (hd0)
from the rescue system the system booted unlike my previous attempt
where the initrd failed to assemble the root fs (see boot log of
comment #9).
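At the grub prompt in the rescue system, the fix above plus an equivalent install onto the second disk might look like this (a sketch; (hd0,4) is this reporter's root partition, and the `device` line remaps hd0 to the second disk so that either drive can boot on its own — untested here):

```
grub> root (hd0,4)
grub> setup (hd0)
grub> device (hd0) /dev/sdb
grub> root (hd0,4)
grub> setup (hd0)
grub> quit
```

Whether remapping like this is actually correct for raid-1 is exactly what the following comments debate.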
Thomas, isn't this a partitioner problem?

The partitioner proposal certainly suggests a separate /boot partition if a fake raid disk is used. I am not sure whether there is a warning if the user wants to be extra smart and removes this /boot partition, or partitions manually without a separate /boot. Will check this.

Thomas, I think you misread my description. The system WITH a separate /boot had problems; it failed in the initrd. My first try had a boot partition (sda1, sdb1); / was the first extended partition (sda5, sdb5). In both of my tries the partition layout was created completely manually by me with the installation system's partitioner. Both attempts also failed to create a working grub configuration (the configuration was left unchanged from the default proposal of the installer).

The attached y2log file contains no dmraid setup at all, but a setup using Software Raid (/dev/md*). Sorry that I assumed it was fake raid (dmraid); I was confused by comment #1. The problem seems to be in grub.conf and the bootloader setup, which is outside of yast2-storage, so I reassign this to the bootloader maintainer.

As the serial output looks OK (the kernel command line is correct) I doubt
this is a bootloader problem. For some reasons, /dev/md1 seems to be broken.
/boot being a software RAID is problematic, though.
The stage2 bootloader needs to sit on the same physical blocks on both disks,
which can't be guaranteed when YaST partitions the disks.
/etc/grub.conf looks broken to me; it should be:
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
setup --stage2=/boot/grub/stage2 (hd1,0) (hd1,0)
(note the second argument of the second line: (hd1,0), not (hd0,0))
I am not really familiar with grub, but shouldn't the correct setup for raid-1 be:

setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)

Sorry, somehow Firefox sent comment #16 before I was finished. I am not really familiar with grub, but shouldn't the correct setup for raid-1 be:

setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
device (hd0) /dev/sdb
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)

I have not tested it yet, but I think the above commands will ensure that sdb will be properly booted if sda completely fails or is removed, as it will then be the new "0" drive for the BIOS. Different physical blocks should not really happen for stage1, because raid-1 should guarantee that they are located in identical blocks.

No, as then it would only write on the first disk, sda in this case. sdb will be left untouched, so if sda fails, there is no grub at all on the new sda (old sdb). The problem here is not stage1, but stage2, which is located in the filesystem; stage2 stores the blocks where some part of it is located. So if you mirror it on different partitions, say sda1 and sdb3 (so there for sure are different blocks), and stage2 from sda1 gets mirrored to sdb3, this data is lost for sdb3. That means: the stage2 files must be different if they are stored on different blocks (which will be "corrected" by the RAID and thus leads to an unbootable system). Stage1 is either located in the MBR or in the boot record of the partition; both are not touched by the filesystem (except xfs) and thus not changed by the RAID.

Stefan, you got me confused: I thought the "device (hd0) /dev/sdb" command essentially makes grub use /dev/sdb whenever the grub "hd0" specification is used, so the 2nd setup command should install grub on /dev/sdb. At least for my current setup (see below) this works: I have verified that the system is bootable from both disks by temporarily removing /dev/sda and, for the next try, /dev/sdb.
I even zeroed the relevant disk sectors and they were restored on both disks by grub (the stage1.5 sectors, see below). Regarding the modification of stage2: thanks for the hint, Stefan, I was not aware of this problem. (Note: I meant stage2 in the last paragraph of my comment #17.) Currently I have installed grub with

device (hd0) /dev/sda
root (hd0,4)
setup (hd0)

and

device (hd0) /dev/sdb
root (hd0,4)
setup (hd0)

and as far as I interpret grub's output this should be safe, as this installs stage1.5 in the physical sectors 1-15 (i.e. outside of any mirrored partition), and since stage1.5 understands ext2 it should have no problem finding stage2 even if the sectors occupied by stage2 change or are not identical because of different physical positions of the raid component partitions. And as I said above, I have verified that it works as expected on my system. Shouldn't YaST enforce something like this automatically as soon as /boot gets installed inside a raid-1 partition?

One thing however still makes me nervous: does grub's e2fs_stage1_5 understand all current ext2 features? dir_index may be a problem.

A further suggestion: all documentation I found regarding the combination of raid/grub while googling is highly contradictory. It would be nice if the SUSE manual provided some details about this quite common scenario.

This leaves the problem with the initrd failing to assemble the root device. In hindsight I should have opened a separate bugzilla entry for this.

Having spent most of the day trying to get s/w RAID installed on a system I am upgrading, I found that it is indeed a problem with putting /boot into an ext3 filesystem. If you use Reiserfs (as used to be the default) or ext2, then all is well. If you want a journaling fs, just use Reiser (it has served me well for years). If you insist on ext3, create a small RAID md, say 80MB, format it with ext2 and mount it as /boot. Since little ever changes on /boot, journaling doesn't buy you anything there.
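The workaround at the end of the previous comment (a small non-journaled /boot on its own RAID1) could be set up roughly like this; the device names and the 80MB size are illustrative assumptions, not taken from any attachment:

```sh
# build a small RAID1 mirror from the first partition of each disk
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# format it ext2 (no journal, so grub's stage2 block lists stay valid)
mkfs.ext2 /dev/md0
# then mount it as /boot, e.g. via /etc/fstab:
#   /dev/md0  /boot  ext2  defaults  1 2
```

Keeping /boot off ext3 sidesteps the journal rewriting blocks underneath grub's stage2 block list.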
I tried both and they seem to boot up fine.

The result from comment #20 surprises me. My first attempt had a separate /boot on ext2! While I could fix grub from the rescue system, I never figured out how to make it past the initrd. With my second installation attempt, /boot was put on my root partition, which is ext3! That attempt also created a non-working grub installation (fixed again from the rescue system, final version see comment #19) and booted normally after that.

One idea: some BIOSes require a partition with the "bootable" flag set on the boot disk, and if that is missing they claim that there is no operating system. Might it be that there is/was no partition with the "bootable" flag on your system?

Markus, if you look at the many details I have provided above you will see that the bootable flag was set on the boot partition (see comment #7).

*** Bug 233758 has been marked as a duplicate of this bug. ***

I looked into this, but could not find any quick answer. Since software-RAID issues have been postponed until after SLES10 SP1, I will work on this for 10.3 (earliest?). Moving bug to 10.3.

Apparently this *used* to work in 10.0, see: HOWTO: SUSE 10.0 and Software RAID a.k.a FakeRAID http://www.howtoforge.com/forums/showthread.php?t=1664 It appears that dmraid looks for a module 'raid45' but 10.2 & 3Ax load 'raid456'.

Franz, would it be possible for you to try to reproduce this bug on a recent openSUSE 10.3 beta?

Created attachment 157474 [details]
Partition information on MD raid system used exhibiting bug
This problem definitely still exists in Beta1 of 10.3. In fact, it destroyed the grub config information on the IDE drive which I intentionally did not include in the installation. I configured the MD raid IAW the instructions in the openoffice.org HOWTO, except I used 4 drives and chose raid 5 for the /home partition (/dev/md3) in the attachment just submitted.

Not only did it not put the info in the correct area, but it ATE the info on the IDE drive which was not part of the installation (and which contained a valid 10.2 install); that install ceased to function on reboot, AND the 10.3 install failed as well. It took a day to figure out what happened, plus physical removal of the IDE device, to get the system to use the SATA drive for grub, and then it used the wrong partitions no matter what I tried. Finally, it ended up putting the boot on /dev/sdb, which I still don't understand, but the system does boot now. /dev/sdb is part of a 4-drive set, and is the 2nd of a 2-drive subset that makes up /dev/md1 in the attachment, which is not where I would normally have put the mbr or the grub files, as that is the 2nd drive of a raid 1 mirrored partition. I manually fixed the grub files and got 10.3 to boot, but I left them where the install system tried to put them. I was able to boot beta 1 10.3.

I then put 10.2 back on, and the 10.3beta grub config files had overwritten the 10.2 files even though they were not supposed to even be in the loop at all. Grub has a problem when there are both IDE and SATA drives, and apparently when there are MD drives involved as well. It took a day to straighten out the 10.2 install files to get it to boot again. Now I can boot either install, 10.2 from the IDE or 10.3 from the MD raid, by using the BIOS boot selector. I am working to get grub on the 10.2 install to let me select it without having to go through the BIOS, but that is not proving to be easy for some reason; that is another story.
info provided

The problem is still present in SLES10 SP1. I did some recent installs on IBM blades with 2 simple SAS disks:

sda1 -> md0
sdb1 -> md0
sda2 -> md1
sdb2 -> md1

My /dev/md1 contains my ext3 root filesystem. The first stage of the installation goes well (no errors), but if I reboot, my system stops with 'No operating systems found'. To fix this I boot again with my SLES10 SP1 CD and go to the rescue system. Once I'm in the rescue system, I mount /dev/md1 under /mnt, chroot into /mnt, and create the proper device files with mknod (mknod /dev/sda b 18 0, mknod /dev/sda2 b 18 2, ...). I start grub and execute the following commands:

root (hd0,1)
setup (hd0)
root (hd1,1)
setup (hd1)
quit

I also add an extra entry to my grub config: 1 entry for disk0 which points to hd0, and a second entry for disk1 which is identical to the first one except for the hd1 part. As a final step I unmount /mnt, reboot, and everything works!

Just tried 10.3 GM and it still fails. 1G Pentium III with 2x120GB IDE drives, 1G swap on each, rest of the disks in a RAID1 array with a single "root" filesystem. If root is formatted with EXT3 it will fail with "No Operating System Found" after the install completes and it goes into the first reboot. The same configuration with Reiserfs as the root filesystem works just fine.

The bootloader can't write to /dev/md; it has to write to the devices directly. As the combination of ext3 and md destroys this information again, this doesn't work, so you have to use ext2 / reiserfs. --> invalid.

(In reply to comment #35 from Stefan Fent)
> The bootloader can't write to /dev/md, it has to write to the devices directly.
> As the combination of ext3 and md destroys this information again, this doesn't
> work, so you have to use ext2 / reiserfs.
> --> invalid.

Sir, I have installed to a pure MD raid environment using exclusively EXT3 filesystems, and I assure you it works and is not invalid.
What is invalid is the combination of IDE and SATA when trying to do all of this. All I have to do to make it work is remove all of the IDE drives from my system and install using pure MD raid structures and EXT3. If my understanding is correct, EXT3 is EXT2 with journaling, so if true, what would that have to do with the bootloader anyway? The bootloader most assuredly *can* write to /dev/mdX. It is also obvious that it is capable of writing an MBR to the raw device that contains the MD raid, or you wouldn't even get the 'GRUB' in the corner of the screen when you try to boot. FWIW, I am writing this from a 10.3 RAID-only installation on an EXT3 fs, so I know it can work. What is nice is that 10.2 did it without jumping through all the hoops. You can close it again, but that isn't correct, it is expedient.

We'll address the root-on-RAID1 issues in the code11 cycle in a way that is less error prone and works on all systems, all BIOSes and with all file systems. In the meantime the above setup is not supported, as it works on some machines and on others not, given the test results above, so do not expect a maintenance update.

Created attachment 180791 [details]
Partitioning of a functioning "Pure RAID" 10.3 GM system
For comparison: provided to show one scheme that works perfectly *after* manual editing of the GRUB files. The system MBR is on sda and is not duplicated on sdb. Swap is in RAID 0 (experimental, but not required, as the partition could live outside the raid environment). /boot and / (root) are both RAID 1, and /home is in RAID 5. By writing a copy of the MBR to sdb, the system *should* still boot if the sda drive were to fail, provided the BIOS supports boot device selection appropriately. The module supporting MD raid must be loaded as part of the booting kernel image.
No resources assigned here right now; will address the problem later.

What do you mean? Does this mean we will not see a fix in code11?

reevaluate for SLE11

assign to yast2-bootloader maintainer, see bug #398356

*** This bug has been marked as a duplicate of bug 398356 ***