Bug 227377

Summary: Can't boot from installed 10.2 final when using software raid (no operating system found)
Product: [openSUSE] openSUSE 11.0
Reporter: Franz Maier <franz.x.maier>
Component: Bootloader
Assignee: Jozef Uhliarik <juhliarik>
Status: RESOLVED DUPLICATE
QA Contact: Jiri Srain <jsrain>
Severity: Critical
Priority: P1 - Urgent
CC: andreas.pfaller, coolo, jplack, leo, mmarek, rccj, seuchato, stefan.fent, sysop
Version: Alpha 2
Flags: coolo: SHIP_STOPPER-
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: YaST2 logs
mdadm.conf
fstab
fdisk -l
mdadm output
serial console boot log
Partition information on MD raid system used exhibiting bug
Partitioning of a functioning "Pure RAID" 10.3 GM system

Description Franz Maier 2006-12-09 09:38:19 UTC
Problem: when I install 10.2 final with software raid for /, swap and /home (3 partitions with Raid 1 on two physical disks), the system doesn't boot after install (first boot) and the hardware tells me that there is no operating system present. This configuration worked OK in 10.2 RC1, but no longer works in 10.2 final.

Regards

Franz X. Maier
Comment 1 Franz Maier 2006-12-10 10:23:47 UTC
In the meantime I solved the problem: I partitioned like the automatic partitioning from another PC with a hardware raid (fake raid), namely /boot, swap, / and /home, and it works! You should at least document this special requirement, as it is quite new.

Regards

Franz X. Maier  
Comment 2 Andreas Pfaller 2006-12-11 01:39:16 UTC
Why close this? The problem exists. I have just installed 10.2 with software raid (2 identically partitioned SATA disks with, among others, / /boot /var configured as raid 1) and the system does not boot after the first installation step (after the first reboot after installation). I left the bootloader configuration
unchanged from the one suggested by the installer.

I have booted with the rescue system and the content of all relevant
partitions looks OK, and the raid partitions are currently syncing.
This is a NEW installation where all disks were completely
repartitioned.

So it looks as if the only problem is the bootloader installation.

Currently I am trying to figure out how to install grub manually
but have not yet been successful.

/etc/grub.conf contains:

  setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
  setup --stage2=/boot/grub/stage2 (hd1,0) (hd0,0)
  quit

and /boot/device.map:

  (hd0) /dev/sda
  (hd1) /dev/sdb

Both look OK to me.
Comment 3 Andreas Pfaller 2006-12-11 03:26:34 UTC
Some additional info:

After messing with the recovery options of the installation DVD, none of which
worked (e.g. the automatic repair options seem to be totally unaware of
the existence of raid, and the "bootloader repair" simply failed with an error
message), I booted again with the rescue system and simply did

>grub
root (hd0,0)
setup
quit

which upon reboot at least showed the grub menu with the newly installed
system. However, trying to boot "Suse 10.2" dropped me into a minimal
system (I assume the initrd) and failed to assemble the arrays.
Comment 4 Andreas Pfaller 2006-12-11 17:10:11 UTC
Created attachment 109185 [details]
YaST2 logs

Obtained via rescue system from /var/log on HD of installed system.
Comment 5 Andreas Pfaller 2006-12-11 17:12:43 UTC
Created attachment 109186 [details]
mdadm.conf

Obtained from /etc on HD of installed system.
The mdadm.conf included in the initrd is identical.
Comment 6 Andreas Pfaller 2006-12-11 17:13:38 UTC
Created attachment 109187 [details]
fstab

Obtained from /etc on HD of installed system.
Comment 7 Andreas Pfaller 2006-12-11 17:15:07 UTC
Created attachment 109188 [details]
fdisk -l

"fdisk -l" output.
Obtained while running rescue system.
Comment 8 Andreas Pfaller 2006-12-11 17:16:24 UTC
Created attachment 109189 [details]
mdadm output

Obtained by running "mdadm --examine /dev/sda*"
while running rescue system.
Comment 9 Andreas Pfaller 2006-12-11 17:20:34 UTC
Created attachment 109190 [details]
serial console boot log

Serial console output while trying to boot the system after
fixing the grub installation as described in comment #3.

I am running out of ideas now ;) Any hint is appreciated.
Comment 10 Andreas Pfaller 2006-12-12 00:16:14 UTC
Since I was running out of ideas I opted for a complete reinstall.

The differences from my previous attempt are:
  - No separate /boot partition
  - Only one primary partition (sd[ab]1 - not used for install).
    The previous attempt had a primary partition (sd[ab]3) physically
    located behind the extended partition - this worked without problem
    on the same hardware with 10.1. The reason for this layout was to
    ease replacement of a raid component drive if one of the drives
    fails and the replacement has a slightly smaller capacity.

This new installation also did not install a working grub configuration.
However after installing grub with
  > root (hd0,4)
  > setup (hd0)
from the rescue system, the system booted, unlike my previous attempt
where the initrd failed to assemble the root fs (see boot log in
comment #9).


Comment 11 Jiří Suchomel 2006-12-18 09:03:50 UTC
Thomas, isn't this a partitioner problem?
Comment 12 Thomas Fehr 2006-12-18 09:11:08 UTC
The partitioner proposal certainly suggests a separate /boot partition if a 
fake raid disk is used. I am not sure if there is a warning if the user wants 
to be extra smart and removes this /boot partition or partitions manually 
without separate /boot. 

Will check this.
Comment 13 Andreas Pfaller 2006-12-18 14:56:25 UTC
Thomas, I think you misread my description. The system WITH a separate /boot
had problems and failed in the initrd. My first try had a boot partition
(sda1,sdb1). / was the first extended partition (sda5,sdb5). In both my
tries the partition layout was created completely manually by me with
the installation system partitioner. Both attempts also failed to create
a working grub configuration (configuration left unchanged from the
default proposal of the installer).
Comment 14 Thomas Fehr 2006-12-19 08:37:57 UTC
The attached y2log file contains no dmraid setup at all, but a setup using
software RAID (/dev/md*). Sorry that I assumed it was fake raid (dmraid);
I was confused by comment #1.

The problem seems to be in grub.conf and the bootloader setup. This is outside
of yast2-storage, so I am reassigning this to the bootloader maintainer.
Comment 15 Stefan Fent 2006-12-20 08:39:04 UTC
As the serial output looks OK (the kernel command line is correct), I doubt
this is a bootloader problem. For some reason, /dev/md1 seems to be broken.
/boot being a software RAID is problematic, though:
the stage2 bootloader is required to be on the same physical blocks on both disks,
which can't be guaranteed with YaST partitioning the disks.
/etc/grub.conf looks broken to me; it should be:

  setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
  setup --stage2=/boot/grub/stage2 (hd1,0) (hd1,0)
                                              ^ 
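
The corrected commands could also be fed to grub non-interactively from a rescue system. The following is an illustrative sketch, not a tested procedure; the device.map and stage2 paths are the ones quoted in comment #2, the rest is a standard grub-legacy batch invocation:

```shell
# hedged sketch: re-run the corrected setup lines in batch mode
grub --batch --device-map=/boot/grub/device.map <<'EOF'
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
setup --stage2=/boot/grub/stage2 (hd1,0) (hd1,0)
quit
EOF
```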

Comment 16 Andreas Pfaller 2006-12-20 14:03:43 UTC
I am not really familiar with grub but shouldn't the correct setup
for raid-1 be:

  setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)

  setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
Comment 17 Andreas Pfaller 2006-12-20 14:26:04 UTC
Sorry, somehow Firefox sent comment #16 before I was finished.

I am not really familiar with grub but shouldn't the correct setup
for raid-1 be:

  setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
  device (hd0) /dev/sdb
  setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)

I have not tested it yet, but I think the above commands will ensure
that sdb will be properly booted if sda completely fails or is removed,
as it will be the new "0" drive for the BIOS.

Different physical blocks should not really be an issue for stage1, because
raid-1 should guarantee that they are located in identical blocks.

Comment 18 Stefan Fent 2006-12-21 08:51:10 UTC
No, as then it would only write to the first disk, sda in this case.
sdb will be left untouched, so if sda fails, there is no grub at all on the new sda (old sdb).
The problem here is not stage1, but stage2, which is located in the filesystem,
and stage2 stores the blocks where some part of it is located.
So if you mirror it on different partitions, say sda1 and sdb3 (so the blocks are certainly different), and stage2 from sda1 gets mirrored to sdb3, this data is lost for sdb3.
This means: the stage2 files must be different if stored on different blocks.
(Which will be corrected by the RAID and thus leads to an unbootable system.)
Stage1 is located either in the MBR or in the boot record of the partition; neither is touched by the filesystem (except xfs) and thus not changed by the RAID.
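
This block-list argument can be illustrated with a little sector arithmetic. The following is a sketch with hypothetical partition offsets (not taken from the attached logs): stage2 records absolute disk sectors, so a byte-for-byte mirror of it is only correct if both component partitions start at the same offset.

```shell
# stage2 embeds a list of absolute disk sectors pointing at its own data.
# Mirroring that file byte-for-byte to a partition at a different offset
# copies sector numbers that are wrong on the second disk.
sda1_start=63          # hypothetical start sector of /dev/sda1
sdb3_start=4209030     # hypothetical start sector of /dev/sdb3
stage2_block=120       # some stage2 block, relative to its partition start
sda_abs=$((sda1_start + stage2_block))
sdb_abs=$((sdb3_start + stage2_block))
echo "correct absolute sector on sda: $sda_abs"
echo "correct absolute sector on sdb: $sdb_abs"
# The mirrored copy on sdb3 still contains the sda value, which on sdb
# points into a different part of the disk, so booting from sdb fails.
```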


Comment 19 Andreas Pfaller 2006-12-21 19:59:46 UTC
Stefan, you got me confused:
  I thought the "device (hd0) /dev/sdb" command essentially makes
  grub use /dev/sdb whenever the grub "hd0" specification is used,
  so the 2nd setup command should install grub on /dev/sdb.
  At least for my current setup (see below) this works - I have verified
  that the system is bootable from both disks by temporarily removing
  /dev/sda and, for the next try, /dev/sdb. I even zeroed the relevant
  disk sectors and they were restored on both disks by grub (the
  stage1.5 sectors, see below).

Regarding the modification of stage2:
  Thanks for the hint Stefan, I was not aware of this problem.
  (Note: I meant stage2 in the last paragraph of my comment #17).


Currently I have installed grub with

  device (hd0) /dev/sda
  root (hd0,4)
  setup (hd0)

and

  device (hd0) /dev/sdb
  root (hd0,4)
  setup (hd0)

and as far as I interpret grub's output, this should be safe, as it installs
stage1.5 in the physical sectors 1-15 (i.e. outside of any mirrored partition),
and as stage1.5 understands ext2 it should have no problems finding
stage2 even if the sectors occupied by stage2 change or are not identical
because of different physical positions of the raid component partitions.
And as I said above, I have verified that it works as expected on my system.
Shouldn't YaST enforce something like this automatically as soon as /boot
gets installed inside a raid-1 partition?
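
The stage1.5 reasoning can be sanity-checked with quick sector arithmetic. This sketch assumes the classic DOS layout where the first partition starts at sector 63; the 15-sector figure for stage1.5 is the one given in this comment:

```shell
# The embedding area between the MBR (sector 0) and the first partition
# lies outside every partition, so the RAID never mirrors or rewrites it.
first_part_start=63                       # classic DOS layout (assumption)
gap_sectors=$((first_part_start - 1))     # sectors 1..62 are free
gap_bytes=$((gap_sectors * 512))
stage15_sectors=15                        # per this comment, sectors 1-15
stage15_bytes=$((stage15_sectors * 512))
echo "embedding gap: $gap_bytes bytes"
echo "stage1.5 size: $stage15_bytes bytes"
```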

One thing, however, still makes me nervous: does grub's e2fs_stage1_5
understand all current ext2 features? dir_index may be a problem.

A further suggestion: all documentation I found while googling about the
combination of raid/grub is highly contradictory. It would be nice if the
SuSE manual provided some details about this quite common scenario.


This leaves the problem with initrd failing to assemble the root device.
In hindsight I should have opened a separate bugzilla entry for this.
Comment 20 Michael McCarthy 2006-12-28 23:57:24 UTC
Having spent most of the day trying to get s/w RAID installed on a system I am upgrading, I found that it is indeed a problem with putting /boot into an ext3 filesystem. If you use Reiserfs (as used to be the default) or ext2, then all is well. If you want a journaling fs, just use Reiser (it has served me well for years). If you insist on ext3, create a small RAID md, say 80MB, format it with ext2 and mount it as /boot. Since little ever changes on /boot, journaling doesn't buy you anything there. I tried both and they seem to boot up fine.
Comment 21 Andreas Pfaller 2006-12-29 07:22:47 UTC
The result from comment #20 surprises me. My first attempt had a separate
/boot on ext2! While I could fix grub from the rescue system, I never
figured out how to make it past the initrd.

With my second installation attempt, /boot was put on my root partition,
which is ext3! That attempt also created a non-working grub installation
(fixed again from the rescue system, final version in comment #19) and
booted normally after that.
Comment 22 Markus Koßmann 2006-12-31 14:55:52 UTC
One idea: some BIOSes require a partition with the "bootable" flag set on the boot disk, and if that is missing they claim that there is no operating system. Might it be that there is/was no partition with the "bootable" flag on your system?
Comment 23 Andreas Pfaller 2006-12-31 15:53:01 UTC
Markus, if you look at the many details I have provided above, you will see
that the bootable flag was set on the boot partition (see comment #7).
Comment 24 Matej Horvath 2007-01-24 12:54:33 UTC
*** Bug 233758 has been marked as a duplicate of this bug. ***
Comment 25 Olaf Dabrunz 2007-02-23 12:35:34 UTC
I looked into this, but could not find any quick answer. Since software-RAID
issues have been postponed until after SLES10 SP1, I will work on this for
10.3 (earliest?).
Comment 26 Olaf Dabrunz 2007-05-10 17:39:31 UTC
Moving bug to 10.3. 
Comment 27 Richard Creighton 2007-07-03 23:39:23 UTC
Apparently this *used* to work in 10.0.

see:

HOWTO: SUSE 10.0 and Software RAID a.k.a FakeRAID
http://www.howtoforge.com/forums/showthread.php?t=1664

It appears that dmraid looks for a module 'raid45' but 10.2 & 3Ax load 'raid456'
Comment 28 Christoph Thiel 2007-08-14 11:55:49 UTC
Franz, would it be possible for you to try to reproduce this bug on a recent openSUSE 10.3 beta?
Comment 29 Richard Creighton 2007-08-14 16:34:46 UTC
Created attachment 157474 [details]
Partition information on MD raid system used exhibiting bug
Comment 30 Richard Creighton 2007-08-14 17:12:40 UTC
This problem definitely still exists in Beta1 of 10.3. In fact, it destroyed the grub config information on the IDE drive which I intentionally did not include in the installation. I configured the MD raid in accordance with the instructions in the openoffice.org HOWTO, except I used 4 drives and chose raid 5 for the /home partition (/dev/md3 in the attachment just submitted). Not only did it not put the info in the correct area, but it ATE the info on the IDE drive which was not part of the installation (and which contained a valid 10.2 install); that install ceased to function on reboot AND the 10.3 install failed as well.

It took a day to figure out what happened, and physical removal of the IDE device to get the system to use the SATA drive for grub, and then it used the wrong partitions no matter what I tried. Finally, it ended up putting the boot on /dev/sdb, which I still don't understand, but the system does boot now. /dev/sdb is part of a 4-drive set, and is the 2nd of a 2-drive subset that makes up /dev/md1 in the attachment, which is not where I would normally have put the MBR or the grub files, as that would be the 2nd drive of a raid 1 image partition. I manually fixed the grub files and got 10.3 to boot, but I left them where the install system tried to put them. I was able to boot beta 1 of 10.3.

I then put 10.2 back on, and the 10.3 beta grub config files had overwritten the 10.2 files even though they were not supposed to be in the loop at all. Grub has a problem when there are both IDE and SATA drives, and apparently also when there are MD drives involved. It took a day to straighten out the 10.2 install files to get them to boot again. Now I can boot either install, 10.2 from the IDE or 10.3 from the MD raid, by using the BIOS boot selector. I am working to get grub on the 10.2 install to allow me to select it without having to go through the BIOS, but that is not proving to be easy for some reason - but that is another story.
Comment 31 Stephan Kulow 2007-09-08 06:08:26 UTC
info provided
Comment 32 Leonce Eraly 2007-09-25 07:32:53 UTC
The problem is still present in SLES10 SP1.

I did some recent installs on IBM blades with 2 simple SAS disks

sda1 -> md0
sdb1 -> md0

sda2 -> md1
sdb2 -> md1

My /dev/md1 contains my ext3 root filesystem. The first stage of the installation goes well (no errors), but if I reboot, my system stops with 'No operating system found'.

To fix this I boot again with my SLES10 SP1 CD and go to the rescue system.

Once I'm in the rescue system, I mount /dev/md1 under /mnt, chroot into /mnt,
and create the proper device files with mknod:
mknod /dev/sda b 18 0, mknod /dev/sda2 b 18 2, ....

I start grub and execute the following commands:

root (hd0,1)
setup (hd0)
root (hd1,1)
setup (hd1)
quit

I also add an extra entry to my grub config: one entry for disk0, which points to hd0, and a second entry for disk1, identical to the first one except for the hd1 part.

As a final step I unmount /mnt, reboot, and everything works!
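
The recovery steps above can be condensed into one script. This is a hedged sketch, not a tested procedure: it substitutes a bind mount of /dev for the manual mknod calls, and the device and partition numbers are the ones from this comment:

```shell
# sketch of the rescue-system recovery described above (run from the
# SLES10 SP1 rescue system; device names are from this comment)
mount /dev/md1 /mnt
mount --bind /dev /mnt/dev   # avoids creating device nodes by hand
chroot /mnt grub --batch <<'EOF'
root (hd0,1)
setup (hd0)
root (hd1,1)
setup (hd1)
quit
EOF
umount /mnt/dev
umount /mnt
reboot
```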

Comment 33 Michael McCarthy 2007-10-08 22:43:42 UTC
Just tried 10.3 GM and it still fails.

1G Pentium III with 2x120GB IDE drives.
1G swap on each.
Rest of disks into RAID1 array with single "root" filesystem.
If root is formatted with EXT3 it will fail with "No Operating System Found" after the install completes and it goes into the first reboot.

Same configuration formatted with Reiserfs as the root filesystem works just fine.
Comment 35 Stefan Fent 2007-10-10 11:10:03 UTC
The bootloader can't write to /dev/md, it has to write to the devices directly.
As the combination of ext3 and md destroys this information again, this doesn't work, so you have to use ext2 / reiserfs. 
--> invalid.

Comment 36 Richard Creighton 2007-10-10 13:06:08 UTC
(In reply to comment #35 from Stefan Fent)
> The bootloader can't write to /dev/md, it has to write to the devices directly.
> As the combination of ext3 and md destroys this information again, this doesn't
> work, so you have to use ext2 / reiserfs. 
> --> invalid.
> 

Sir, I have installed to a pure MD raid environment using exclusively EXT3 filesystems, and I assure you it works and is not invalid. What is invalid is the combination of IDE and SATA when trying to do all of this. All I have to do to make it work is remove all of the IDE drives from my system and install using pure MD raid structures and EXT3.

If my understanding is correct, EXT3 is EXT2 with journaling, so if true, what would that have to do with the bootloader anyway? The bootloader most assuredly *can* write to /dev/mdx. It is also obvious that it is capable of writing an MBR to the raw device that contains the MD raid, or you wouldn't even get the 'GRUB' in the corner of the screen when you try to boot.

FWIW, I am writing this from a 10.3 RAID-only installation on an EXT3 fs, so I know it can work. What is nice is that 10.2 did it without jumping through all the hoops. You can close it again, but that isn't correct, it is expedient.
Comment 38 Joachim Plack 2007-10-26 11:22:06 UTC
We'll address the root-on-Raid1 issues in the code11 cycle in a way that is less error-prone and works on all systems, all BIOSes and all file systems.

In the meantime the above setup is not supported, as it works on some machines and not on others (given the test results above), so do not expect a maintenance update.
Comment 39 Richard Creighton 2007-10-26 12:36:58 UTC
Created attachment 180791 [details]
Partitioning of a functioning "Pure RAID" 10.3 GM system

For comparison, provided to show one scheme that works perfectly *after* manual editing of the GRUB files. The system MBR is on sda and is not duplicated on sdb. Swap is in RAID 0 (experimental, but not required, as the partition could be outside of the raid environment). /boot and / (root) are both RAID 1 and /home is in RAID 5. By writing a copy of the MBR to sdb, the system *should* still boot if the sda drive were to fail, provided the BIOS supports boot device selection appropriately. The module supporting MD raid must be loaded as part of the booting kernel image.
Comment 40 Joachim Plack 2007-12-12 20:03:54 UTC
No resources assigned here right now; will address the problem later.
Comment 41 Leonce Eraly 2007-12-12 21:27:14 UTC
What do you mean? Does this mean we will not see a fix in code11?
Comment 42 Joachim Plack 2008-03-20 04:13:49 UTC
reevaluate for SLE11
Comment 43 Joachim Plack 2008-07-15 11:21:44 UTC
assign to yast2-bootloader maintainer
Comment 44 Jozef Uhliarik 2008-07-15 17:47:51 UTC
see bug #398356

*** This bug has been marked as a duplicate of bug 398356 ***