Bug 263116

Summary: System can't boot after auto update of kernel
Product: [openSUSE] openSUSE 10.2
Reporter: Alan Goodale <alan.goodale>
Component: Kernel
Assignee: Stefan Fent <stefan.fent>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: aj, angellafuente, stefan.fent
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 10.2
Whiteboard:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  Produced by the YAST-Hardware Information utility
  Produced by the YAST-Post a Support Query utility
  perl-Bootloader package for testing ONLY, this is NO PTF or final product
  Results of perl bootloader test 134973

Description Alan Goodale 2007-04-11 00:43:48 UTC
Installation of openSUSE 10.2 works until the kernel is updated, after which the system is unbootable.  This problem occurs 100% of the time (with every installation).  Specifically, it occurs when the kernel is updated from 2.6.18.2-34 to 2.6.18.8-0.1 by the automatic online update feature.  The update creates two problems in the menu.lst file in the /boot/grub directory, leaving the system unbootable.

The first problem is with the original entries in the file.  These entries remain after the update, which in my opinion is a good idea, as it is nice to be able to go back to a previous version should there be a serious problem with the new one.  Unfortunately, the kernel files referenced by these original entries are erased.  Selecting any of them results in a "file not found" error.
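
A minimal sketch of how such stale entries could be detected, assuming GRUB legacy menu.lst syntax and assuming the list of files on the boot partition is collected separately. The helper names parse_entries and find_stale_entries are hypothetical and are not part of YaST or perl-Bootloader.

```python
def parse_entries(menu_text):
    """Split menu.lst text into (title, [(command, argument), ...]) entries."""
    entries = []
    for raw in menu_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        command = parts[0]
        argument = parts[1] if len(parts) > 1 else ""
        if command == "title":
            entries.append((argument, []))
        elif entries:
            # commands before the first title (default, timeout, ...) are ignored
            entries[-1][1].append((command, argument))
    return entries


def find_stale_entries(menu_text, boot_files):
    """Return {title: [missing paths]} for entries whose files are not in boot_files.

    boot_files is a set of paths as GRUB sees them, e.g. "/vmlinuz-...".
    Paths carrying an explicit device prefix such as (hd0,4)/... are not checked.
    """
    stale = {}
    for title, commands in parse_entries(menu_text):
        missing = []
        for command, argument in commands:
            if command in ("kernel", "initrd", "module") and argument:
                path = argument.split()[0]  # drop kernel parameters
                if path.startswith("/") and path not in boot_files:
                    missing.append(path)
        if missing:
            stale[title] = missing
    return stale
```

On a system like the one described here, where the paths in menu.lst are relative to a separate boot partition, boot_files could be built from a directory listing of /boot with "/" prepended to each name.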

The second problem is with the new entries.  The entry for the "root" parameter contains a string which GRUB cannot interpret, resulting in a parsing error.  This makes the new entries unusable as well.  The string that is inserted is "/dev/mapper/nvidia_ddcffgjd_part7".  If I manually replace this string with a proper reference such as (hd0,4) before the system reboots, the new entries become usable and the system can be booted and used.
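
A rough sketch of a check for this second problem, assuming GRUB legacy syntax: it flags "root" and "rootnoverify" lines whose argument is not a simple GRUB device such as (hd0,4). The function name find_bad_root_lines and the device path in the usage note are hypothetical, and the pattern deliberately covers only the simple (hdN), (hdN,M) and (fdN) forms.

```python
import re

# Only the simple device forms are recognised here (an assumption of this sketch).
GRUB_DEVICE = re.compile(r"^\((hd|fd)\d+(,\d+)?\)$")


def find_bad_root_lines(menu_text):
    """Return (line_number, argument) pairs for root lines GRUB could not parse."""
    bad = []
    for number, raw in enumerate(menu_text.splitlines(), start=1):
        parts = raw.split()
        # "kernel ... root=/dev/..." is not matched: its first word is "kernel"
        if len(parts) >= 2 and parts[0] in ("root", "rootnoverify"):
            if not GRUB_DEVICE.match(parts[1]):
                bad.append((number, parts[1]))
    return bad
```

For example, a line reading "root /dev/mapper/example_part7" would be reported, while "root (hd0,4)" and the root= kernel parameter would not.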

My suspicion is that this is caused by my motherboard-based RAID controller.  I know that recognition of these controllers is a new feature in 10.2, and it worked well for the initial installation, but I suspect that the update process is not as versatile, resulting in the problems.

A brief description of my hardware, and contents of my "menu.lst" file can be found below.  I have two other system generated files, but I can't find a place to attach these to this bug report.  Please let me know how to send these through.

If a potential fix is available then I am prepared to test it out by reinstalling from scratch.

==========  HARDWARE  ==============================================

Motherboard		
	Vendor		Asus	
	Model		L1N64-SLI WS
	Chip Set	nVidia nForce 680a SLI
	Serial		6CM0AG101103
	Revision	1.01G
	BIOS Revision	0205
	BIOS Utilities	Ver 2.26
		
Video Card		
	Vendor		EVGA
	Model		e-GeForce 7300 GT
	Chipset		7 Series GPU
	Driver		Compiled from download
		
CPU		
	Vendor		AMD
	Model		Athlon 64 FX-72
	Speed		2.8 GHz
		
Memory		
	Vendor		Corsair
	Model		TWIN2X2048-6400C4
	Capacity	4 x 1024 MB
		
HDD		
	Vendor		Hitachi
	Model		TSD-320H SY
	Capacity	2 x 320 GB
	Configuration	SATA RAID 1

==========  MENU.LST  ==============================================

# Modified by YaST2. Last modification on Mon Apr  2 21:36:22 EDT 2007
default 5
timeout 8
gfxmenu (hd0,4)/message

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 10.2
    root (hd0,4)
    kernel /vmlinuz-2.6.18.2-34-default root=/dev/mapper/nvidia_ddcffgjd_part7 vga=0x317 resume=/dev/mapper/nvidia_ddcffgjd_part6 splash=silent showopts
    initrd /initrd-2.6.18.2-34-default

###Don't change this comment - YaST2 identifier: Original name: xen###
title XEN
    root (hd0,4)
    kernel /xen.gz 
    module /vmlinuz-2.6.18.2-34-xen root=/dev/mapper/nvidia_ddcffgjd_part7 vga=0x317 resume=/dev/mapper/nvidia_ddcffgjd_part6 splash=silent showopts
    module /initrd-2.6.18.2-34-xen

###Don't change this comment - YaST2 identifier: Original name: windows###
title Windows
    rootnoverify (hd0,0)
    chainloader (hd0,0)+1

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
    rootnoverify (hd0,0)
    chainloader (fd0)+1

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 10.2
    root (hd0,4)
    kernel /vmlinuz-2.6.18.2-34-default root=/dev/mapper/nvidia_ddcffgjd_part7 vga=normal showopts ide=nodma apm=off acpi=off noresume edd=off 3
    initrd /initrd-2.6.18.2-34-default

title Kernel-2.6.18.8-0.1-default
    root (hd0,4)
    kernel /vmlinuz-2.6.18.8-0.1-default root=/dev/mapper/nvidia_ddcffgjd_part7 vga=0x317 resume=/dev/mapper/nvidia_ddcffgjd_part6 splash=silent showopts
    initrd /initrd-2.6.18.8-0.1-default

title Kernel-2.6.18.8-0.1-xen
    root (hd0,4)
    kernel /xen.gz 
    module /vmlinuz-2.6.18.8-0.1-xen root=/dev/mapper/nvidia_ddcffgjd_part7 vga=0x317 resume=/dev/mapper/nvidia_ddcffgjd_part6 splash=silent showopts
    module /initrd-2.6.18.8-0.1-xen
Comment 2 Alan Goodale 2007-04-11 23:31:18 UTC
Created attachment 130591 [details]
Produced by the YAST-Hardware Information utility
Comment 3 Alan Goodale 2007-04-11 23:33:42 UTC
Created attachment 130593 [details]
Produced by the YAST-Post a Support Query utility
Comment 4 Angel Lafuente Echeazarra 2007-04-19 15:41:22 UTC
I have the same problem on a system like this:

System: x86_64 Asus M2N32-SLI Deluxe Motherboard

RAID Controller: Nvidia nForce 590 SLI

Disks: Two 300 GB SATA disks in a mirror created by the RAID controller

Partitioning: typical partitioning, not LVM, due to bug 243160 on openSUSE 10.2

When GRUB starts I get three choices:

- openSuse 10.2
- Failsafe ....
- Kernel 2.6.18.8-0.1

It seems that GRUB is not correctly configured after the kernel update.

I agree with all the menu.lst issues that Alan described.

Maybe a mkinitrd problem with the disk controller?


Comment 5 Angel Lafuente Echeazarra 2007-04-19 17:03:50 UTC
Maybe this is not a RAID controller problem and is instead caused by this bug?

https://bugzilla.novell.com/show_bug.cgi?id=252911

CONCLUSION = DO NOT UPDATE PRODUCTION SYSTEMS!
Comment 6 Andreas Gruenbacher 2007-04-26 12:41:46 UTC
I agree with comment #5: this does not seem to be related to whichever controller is used. It is also different from bug 252911, because the device name does not change during the update in this case (e.g., from hda to sda).

I suspect that the perl-Bootloader package which manipulates the bootloader configuration has some regression with LVM.

Alex, could you please have a look?
Comment 7 Alexander Osthof 2007-04-26 13:06:18 UTC
In reply to comment #6:

I submitted a new perl-Bootloader package for openSUSE 10.2 yesterday. Could you please test with this new package and see whether the bug still occurs with your configuration?
Comment 8 Andreas Gruenbacher 2007-04-26 13:19:55 UTC
I don't have a machine set up that has this bug, sorry. May I suggest that you prepare a package for Alan and ask him to verify that the update fixes this problem, as he already offered in comment 0?
Comment 9 Alexander Osthof 2007-04-26 14:14:56 UTC
Alan, could you please test with the following perl-Bootloader package? That is, update perl-Bootloader first, then update/install the kernel.
Comment 10 Alexander Osthof 2007-04-26 14:16:42 UTC
Created attachment 134973 [details]
perl-Bootloader package for testing ONLY, this is NO PTF or final product
Comment 11 Alan Goodale 2007-04-30 04:18:32 UTC
I've tested the updated perl-Bootloader as requested.  The results are worse than before.

1) My original problems are still all there, plus ....
2) My original menu.lst file had a comment line before each of the title lines that began "###Don't change this comment ....".  Those comment lines are all missing from the new entries that were added to the file.  Only the original entries still have comment lines.  Aren't these supposed to be there to enable future updates to function?
3) The new menu entries make no sense.  The prospective target audience for SUSE Linux will be families and business people, not technicians.  They don't care about things like "Kernel-2.6.18.8-0.2".  It must be as understandable and straightforward as Windows, more so if possible.  Anything else will scare people away.

My test was as follows.  I removed all Linux partitions.  I reinstalled from scratch from my 10.2 DVD.  I allowed all (oss, non-oss and debug) automatic updates to occur EXCEPT for the kernel update.  I installed the updated package (id=134973).  I double checked with "Software Management" to make certain that the updated bootloader was installed.  I then allowed the kernel update to proceed. 

I have attached an archive of the before and after versions of pertinent directory listings and menu.lst file contents for your examination.  Let me know if you need to see anything else.
Comment 12 Alan Goodale 2007-04-30 04:22:10 UTC
Created attachment 136450 [details]
Results of perl bootloader test 134973
Comment 13 Stefan Fent 2007-05-02 11:54:55 UTC
Strange, I have a very similar setup with nVidia Fake-Raid, and it works here.
Investigating further....
Comment 14 Stefan Fent 2007-05-03 11:45:46 UTC
I was now able to reproduce this with your configuration, so I'll start debugging. (It is still strange, as it works in 10.3, and the diff of perl-Bootloader doesn't show any obvious differences in this area.)
Comment 15 Stefan Fent 2007-05-03 14:38:11 UTC
The package in Comment #10 is broken, I don't know why.
The package referenced at http://lists.opensuse.org/opensuse/2007-04/msg01603.html works.
Old entries are removed upon removal of the kernel. 
The default section is adapted accordingly.
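
The behaviour described in the two lines above can be illustrated with a small sketch. This is not perl-Bootloader's actual code; the function name remove_kernel_entries and the fallback to the first entry are assumptions made for illustration only.

```python
def remove_kernel_entries(entries, default_index, removed_version):
    """Drop entries for a removed kernel and recompute the default index.

    entries is a list of (title, image_path) pairs in menu order.
    Returns (kept_entries, new_default_index).
    """
    kept = []
    new_default = 0  # assumption: fall back to the first entry if the default is removed
    for index, (title, image_path) in enumerate(entries):
        if removed_version in image_path:
            continue  # this entry points at the removed kernel
        if index == default_index:
            new_default = len(kept)  # position of the old default in the new list
        kept.append((title, image_path))
    return kept, new_default
```

With a menu whose default pointed at the new kernel, removing the 2.6.18.2-34 entries would keep that default selected while shifting its index down to account for the removed sections.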
Comment 16 Alan Goodale 2007-05-03 14:51:40 UTC
Stefan, I also tested the package mentioned in Comment #15.  Please see comment #78 in bug 252911.  Sorry for the confusing start in that comment - I'm still learning how to use this system.

Regarding my 4th point in comment #78, won't the lack of comment lines hinder future updates?
 
Comment 17 Stefan Fent 2007-05-04 06:52:13 UTC
Alan, referring to your comment #78 in Bug #252911:

We need a unique identifier for each entry so that more than one kernel can be installed and each one can be identified. Only the kernel version makes sense for this. Any other identifier (e.g. a date, or previous-n for n>1) would be exactly as irritating for non-technical people, and would also make it difficult for experts to identify the sections.
The entry now looks like this:

openSUSE 10.3 - 2.6.18...

As the old kernel is still removed during the update, we have to remove the old entries, as they are useless. (If you download the package and install it with rpm -i, both kernels are there and the grub sections are added without removing the old ones.)

Runlevel 3 is added again to the failsafe entry.

The missing comment line doesn't cause a problem during a kernel update, as it is only used by YaST, which a kernel update doesn't need.

I'll close this bug now, as the original problem (DMRAID breaks menu.lst) is fixed.