Bug 382070

Summary: error in service module - read-only file system
Product: [openSUSE] openSUSE 11.0 Reporter: Dan Gahlinger <dgahling>
Component: Bootloader Assignee: Philipp Thomas <pth>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: cerebus_8, coolo, jplack
Version: Beta 1   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 11.0   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: second screen capture of boot process
/var/log from the test system trying to boot 11.0 beta1

Description Dan Gahlinger 2008-04-21 18:35:45 UTC
Opensuse 11.0 beta3 GM DVD version boots fine, installs up to the first reboot.

But then when it boots to continue the process it fails miserably.

I cannot seem to find a way to capture the errors.

Tons of errors "read-only filesystem" on bootup

Eventually I get a login prompt (text only), but cannot log in!

"error in service module"

Booting in "failsafe" mode causes the system to hang (lock up). Keys are unresponsive, except that I can switch consoles with alt-f1, alt-f2, etc., but all screens are unresponsive to keys.

Since the filesystem is read-only, /var/log/messages is empty and there are no other boot logs.
Ethernet interfaces don't work, and I can't do much else with it either.

here's the uname -a from an identical hardware running 10.3:

Linux LAB-DEV 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 x86_64 x86_64 GNU/Linux

Booting in single-user mode doesn't help either: I get "can't find pty" a few times, root is mounted read-only, there's no /etc/mtab, and there's nothing in messages etc. because of the read-only mount.

CPU is
Intel(R) Pentium(R) D CPU 3.00GHz stepping 02
Comment 1 Greg Kroah-Hartman 2008-04-28 17:46:04 UTC
There is no 11.0 beta3 release, we are on beta1 right now.

What are the specific error messages you get, and what exact release are you using?
Comment 2 Dan Gahlinger 2008-04-28 18:43:52 UTC
I'm sorry, it is 11.0 beta 1, bad typo there.

I was using this http://download.opensuse.org/distribution/11.0-Beta1/iso/cd/openSUSE-11.0-Beta1-KDE4-LiveCD-i386.iso

As for error messages, that's the error I get when I try to log in after booting in "normal" mode, i.e. booting normally.
I enter username "root" and the proper password, and it says "error in service module" and asks for the username again. I can't log in; every user that I set up has the same problem.

As mentioned, in fail-safe mode it hangs, and single-user mode doesn't help either.

The file-system is read-only so no errors or logs are created.

If you can tell me how to get the boot messages I'll gladly post the boot log here.

One further note: the install did NOT complete. It got through to the first reboot and that's where I'm stuck. When it rebooted to do the next part is when all the issues happened.

I've tried to reinstall again and got the same thing.
Comment 3 Dan Gahlinger 2008-04-28 18:46:07 UTC
I also tried the full GM x86-64 bit dvd, and have the same problems.
Comment 4 Dan Gahlinger 2008-04-28 19:59:05 UTC
Created attachment 210958 [details]
second screen capture of boot process

capture of boot process early-on just before all the read-only errors.
Comment 5 Greg Kroah-Hartman 2008-04-28 20:43:42 UTC
Ok, thanks, this doesn't look like a kernel issue, but a boot-time issue, reassigning...
Comment 6 Dan Gahlinger 2008-04-28 20:55:40 UTC
Sorry I didn't see that option when opening the bug.
Comment 7 Stephan Kulow 2008-04-30 19:54:48 UTC
Your bug report is heavily confusing. The KDE Live CD does not install in beta1, but in your initial report you talk about the GM DVD. Please specify whether you really installed from DVD; if so, provide your yast logs.

Comment 8 Dan Gahlinger 2008-04-30 23:12:53 UTC
not confusing at all.

I used BOTH the GM DVD and the LIVE CD to try to get something to work.
Neither of them works.

But most important is the GM DVD.

How can I provide yast logs when the filesystem is READ ONLY ?
nothing gets written to the logs.

have you seen the screen capture I attached previously?

BTW, I tried downloading a fresh image of the live CD and burned it; it also hangs these systems.

Can we focus on the GM DVD, which has the "error in service module" and the read-only file system? If we can fix these problems, I'm sure the lock/hang problem of the live CD will be easier to fix.
Comment 9 Stephan Kulow 2008-05-02 08:05:42 UTC
If the installation works, you should be able to boot the rescue system from DVD and mount the target system from it to grab the yast logs.
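A minimal sketch of the log-grabbing step, assuming the target root partition has already been mounted from the rescue system (for example on /mnt, a hypothetical mount point) and that Python is available wherever the script is run. The helper name archive_logs is made up for illustration; it packs the target's var/log directory into a gzip'd tarball that can be attached to the bug.

```python
import tarfile
from pathlib import Path


def archive_logs(target_root: str, out_path: str) -> list[str]:
    """Pack <target_root>/var/log into a gzip'd tarball and return its member names.

    target_root is wherever the installed system's root partition is mounted
    (hypothetical example: "/mnt").
    """
    log_dir = Path(target_root) / "var" / "log"
    if not log_dir.is_dir():
        raise FileNotFoundError(f"no log directory at {log_dir}")

    with tarfile.open(out_path, "w:gz") as tar:
        # Store paths relative to the target root, e.g. "var/log/YaST2/y2log".
        tar.add(str(log_dir), arcname="var/log")

    # Re-open the archive to list what was actually written.
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()
```

Calling archive_logs("/mnt", "/tmp/var-log.tar.gz") would write the tarball to /tmp, which is normally writable in a rescue environment even when the installed root is not.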
Comment 10 Dan Gahlinger 2008-05-02 15:29:19 UTC
Created attachment 212009 [details]
/var/log from the test system trying to boot 11.0 beta1

This is the entire /var/log tar/gzip of the one test system I have that has 11.0 beta 1 installed but cannot login and has some problems.
Comment 11 Stephan Kulow 2008-05-03 05:24:08 UTC
Root device:    /dev/disk/by-id/scsi-SATA_Maxtor_6V160E0_V301EPPG-part1 (/dev/sda1) (mounted on / as reiserfs)

If you boot with init=/bin/sh - do you see something strange in dmesg related to the file system?
Comment 12 Dan Gahlinger 2008-05-03 16:14:59 UTC
I won't be able to test this until Monday. I'll test beta2 since it's out now, and update here once I've done that.
Comment 13 Dan Gahlinger 2008-05-05 17:18:32 UTC
Ok, this is bad. opensuse 11.0 beta 2 does EXACTLY the same thing!
This is a show stopper if I ever saw one.

I did the boot with init=/bin/sh

but nothing really out of the ordinary pops up on the file system.
It says fsck is clean, mounting read-only

The only thing that stands out is a statement about
invoking /dev/sda2 manually
and a user retry for the same statement.

I'll see if I can write it down and paste it in here later.

On the "up" side, I reinstalled using Grub/ext3 combo and that works perfectly.

So something about Lilo/reiser is really messing things up.

I'm going to write down those boot lines, then reinstall, testing
grub/reiser and lilo/ext3 to see which one (or both?) causes the issue.

We need lilo (for some really weird reasons)
Comment 14 Dan Gahlinger 2008-05-05 17:48:15 UTC
Here is the boot log I see on console, keep in mind copied by hand:

...
Trying manual resume from /dev/sda2
Invoking userspace resume from /dev/sda2
resume: libgcrypt version: 1.4.0
Trying manual resume from /dev/sda2
Invoking in-kernel resume from /dev/sda2
PM: starting manual resume from /dev/sda2
Waiting for device {ID} to appear: ok
fsck 1.4.0.8
.
{no errors - normal fsck logs}
.
filesystem is clean
fsck succeeded, mounting root device read-only
mounting root {ID same as above}
...

Note: there is no file /etc/mtab and /etc/fstab looks normal
I am trying grub/reiser next and will post notes here
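Since /etc/mtab is missing, one way to confirm from the init=/bin/sh shell that root really stayed read-only is to look at /proc/mounts instead. The sketch below is only an illustration: the function names and the SAMPLE text are hypothetical, and it assumes /proc is mounted and Python is available on the system.

```python
# Hypothetical /proc/mounts content for illustration; not taken from the affected system.
SAMPLE = """\
rootfs / rootfs rw 0 0
/dev/sda1 / reiserfs ro 0 0
proc /proc proc rw 0 0
"""


def root_mount_flags(mounts_text: str) -> list[str]:
    """Return the option list of the last '/' entry in /proc/mounts-style text."""
    flags: list[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == "/":
            # Later entries are mounted on top of earlier ones, so the last one wins.
            flags = fields[3].split(",")
    return flags


def root_is_read_only(mounts_text: str) -> bool:
    """True if the effective root mount carries the 'ro' option."""
    return "ro" in root_mount_flags(mounts_text)
```

On a live system the input would come from reading /proc/mounts, e.g. root_is_read_only(open("/proc/mounts").read()).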
Comment 15 Dan Gahlinger 2008-05-05 17:59:14 UTC
This is weird.

grub/reiser works perfectly! no issues at all!

I am testing Lilo/ext3 next.

But looks like our focus should be on Lilo now as the most likely culprit!

When I change the boot loader (during the initial install) from Grub to Lilo, I always choose the option "propose changes". That has always worked in the past; even up to 10.3 it works perfectly. But now, as of 11.0, it's not working.

I'll provide info on lilo/ext3 combo shortly.
Comment 16 Dan Gahlinger 2008-05-05 18:32:02 UTC
Well lilo/ext3 has the exact same problem and messages using init=/bin/sh as the above.

So Lilo is definitely the issue.

Please escalate to the Lilo team for debugging ASAP!

BTW, a note on the above: /dev/sda2 is SWAP,
and the "{ID}" mentioned above is for /dev/sda1, which is the root partition.

I always build systems this way. no /home or /boot, it's a waste.
Comment 17 Stephan Kulow 2008-05-05 19:36:56 UTC
This should be fixed with beta2 - dup of 380781. Please retest.

*** This bug has been marked as a duplicate of bug 380781 ***
Comment 18 Dan Gahlinger 2008-05-05 20:08:33 UTC
I *DID* test with beta2, did you not read my notes?

ALL of these problems exist with Beta 2 as well!!

All of the above tests were done under BETA 2!
Comment 19 Dan Gahlinger 2008-05-05 20:12:04 UTC
Repeat:

lilo/reiser or lilo/ext3 FAILS on opensuse 11.0 beta2
with "error in service module" and read-only filesystem.
using boot init=/bin/sh shows the above SAME problems as in beta 1

This is NOT a duplicate of 380781 - as this is not a yast issue, it is a Lilo issue
Comment 20 Stephan Kulow 2008-05-05 20:15:17 UTC
And the lilo.conf looks good?
Comment 21 Dan Gahlinger 2008-05-05 20:20:28 UTC
yes, it looks perfect. exactly like other working systems.
I have no idea what's going on.

I will try one thing though, just out of desperation and post an update here.
Comment 22 Dan Gahlinger 2008-05-05 20:43:56 UTC
No go. Although I did find a minor issue/fix with lilo on 11.0 beta 2.

In lilo.conf it puts vga=0x31s.
This SHOULD be vga=0x317
instead; that's how 10.3 did it. Not sure why this changed.

But this really doesn't affect the bug I'm posting.

I don't see any difference (other than that one) in the lilo.conf.
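A small sketch of how a malformed vga= entry such as 0x31s could be spotted automatically. The helper names are hypothetical, and the keyword set (normal, extended, ext, ask) is what lilo.conf(5) lists as far as I recall, so treat it as an assumption; anything else is expected to parse as a number.

```python
import re

# Keywords assumed from lilo.conf(5); any other value should be numeric.
VGA_KEYWORDS = {"normal", "extended", "ext", "ask"}

VGA_LINE = re.compile(r"^\s*vga\s*=\s*(\S+)")


def parse_vga(value: str):
    """Return the keyword or integer mode for a vga= value, or None if malformed."""
    v = value.strip().lower()
    if v in VGA_KEYWORDS:
        return v
    try:
        # Base 0 accepts both decimal ("791") and hex ("0x317") notation.
        return int(v, 0)
    except ValueError:
        return None


def find_bad_vga(conf_text: str) -> list[tuple[int, str]]:
    """List (line number, value) for every vga= line that fails to parse."""
    bad = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        match = VGA_LINE.match(line)
        if match and parse_vga(match.group(1)) is None:
            bad.append((lineno, match.group(1)))
    return bad
```

With this, find_bad_vga("vga=0x31s") reports line 1, while "vga=0x317" passes and parses to mode 791.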
Comment 23 Dan Gahlinger 2008-05-06 20:47:18 UTC
I'm out of ideas, is there anything you want me to test?

If you wish to recreate this, just boot the DVD, do a BASE install (minimal),

choose KDE4 as the desktop, ext3 or reiserfs file systems (doesn't matter),
go into the booting menu, select the boot manager tab, and choose LILO,

it will pop up a box, select "propose new changes" (or something like that).

In the boot loader screen after that, make sure you select the MBR (first option).

then install as normal and let it run.

When it finishes install and reboots, you'll see lots of "read-only file system".
and booting with init=/bin/sh does as mentioned above.

So this problem is reproducible, consistently so far as I can see.
Comment 24 Philipp Thomas 2008-05-08 15:13:52 UTC
I'll try to reproduce tomorrow. For the time being I'm setting this to major as this bug is definitely not critical.
Comment 25 Dan Gahlinger 2008-05-08 17:30:09 UTC
For reference, in case you missed it, it's x86-64 on an Intel processor.
Comment 26 Dan Gahlinger 2008-05-13 02:00:12 UTC
There are only about 3 days until Beta 3, is this bug going to get fixed?

I fail to see how having a totally non-bootable system is not critical, but maybe that's just me.
Comment 27 Stephan Kulow 2008-05-13 05:33:58 UTC
We can take out the lilo option in about no time.
Comment 28 Dan Gahlinger 2008-05-13 12:01:42 UTC
LILO is a *critical* option for us, and our company.

Removing Lilo renders opensuse practically useless to us.

We have very specific needs, and I expect there are millions of other users who depend on it as well.

Lilo is not something you can just casually remove and expect users to be ok with it.

This would be a major change to the distribution, I expect you'd have to take it to committee.

I think a better option would be to fix it. What changed from 10.3 to 11.0 that breaks this functionality?
Comment 29 Philipp Thomas 2008-05-13 16:11:29 UTC
OK, I think I can reproduce this and am working on locating the bug.
Comment 30 Philipp Thomas 2008-05-14 16:31:07 UTC
Dan, please check /etc/lilo.conf. My guess is that it contains the line read-only, and that would be the culprit. This is a known issue and not a LILO fault. I'll make this a duplicate.

*** This bug has been marked as a duplicate of bug 381669 ***
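For anyone wanting to check their own configuration for the read-only line mentioned above, here is a rough sketch. The function name and the SAMPLE configuration are hypothetical and not taken from the affected system; the check only reports where the keyword appears and says nothing about whether removing it is the right fix, which is left to bug 381669.

```python
# Hypothetical lilo.conf for illustration only.
SAMPLE = """\
boot = /dev/sda
image = /boot/vmlinuz
    label = linux
    root = /dev/sda1
    read-only   # mount root read-only at boot
"""


def read_only_lines(conf_text: str) -> list[int]:
    """Return 1-based line numbers whose only directive is 'read-only'."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace before comparing.
        directive = line.split("#", 1)[0].strip()
        if directive == "read-only":
            hits.append(lineno)
    return hits
```

Run against the real file it would be called as read_only_lines(open("/etc/lilo.conf").read()); for SAMPLE it reports line 5.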
Comment 31 Dan Gahlinger 2008-05-14 18:03:49 UTC
Yes, agreed: this is a duplicate of bug report 381669.

I see the "read-only" in lilo.conf on 10.3, which is carried over to 11.0.
Hopefully that other bug thread will get this mess resolved.