Bug 231205 - Freeze very early in boot process with SMP
Summary: Freeze very early in boot process with SMP
Status: RESOLVED DUPLICATE of bug 232013
Duplicates: 229217 231056
Alias: None
Product: openSUSE 10.2
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: i686 SUSE Other
Importance: P5 - None : Major with 15 votes
Target Milestone: ---
Assignee: Thomas Renninger
QA Contact: E-mail List
URL:
Whiteboard:
Keywords: SMP
Depends on:
Blocks: 227279
Reported: 2007-01-01 00:03 UTC by Paul Mogren
Modified: 2007-03-05 18:06 UTC
CC List: 6 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
hwinfo output (277.99 KB, text/plain), 2007-01-01 00:04 UTC, Paul Mogren
dmidump output (10.87 KB, text/plain), 2007-01-20 01:36 UTC, Paul Mogren
dmidump output (8.69 KB, text/plain), 2007-01-20 10:58 UTC, Kai Krakow
hwinfo output (277.65 KB, text/plain), 2007-01-20 11:05 UTC, Kai Krakow

Description Paul Mogren 2007-01-01 00:03:53 UTC
This system can run SUSE 10.1's SMP kernel, but both the 10.2 installer and the installed 10.2 system hang very quickly at boot unless SMP is disabled using kernel parameter nosmp or maxcpus. The last line printed on the installed system's console before it hangs is "NET: Registered protocol family 2".  When SMP is disabled, everything seems fine, and the next line is "IP route cache hash table entries: 32768 (order: 5, 131072 bytes)".

I have tried the kernel of the day (2.6.18.5-SL102_BRANCH_20061223002647-default), and I have tried a number of other kernel parameters including acpi=off, apm=off, ide=nodma, pci=routeirq, edd=off, noapic, nolapic, init=/bin/sh... I have not found anything that lets the system boot without disabling SMP.

There is no OOPS or PANIC printed on screen or in any logs that I can find. In fact, I'm pretty sure it freezes before anything is written to disk at all. I tried hooking up netconsole and got nothing at all, but I can't say for sure that it was configured correctly. I don't have the equipment to set up a serial console, but if that's necessary I could probably track it down.
Comment 1 Paul Mogren 2007-01-01 00:04:48 UTC
Created attachment 111231
hwinfo output
Comment 2 Paul Mogren 2007-01-01 21:30:52 UTC
After continued googling, I got a hunch that the "stack unwinder" could be the source of my troubles. Linus removed it altogether for 2.6.20-rc2. I just built a vanilla 2.6.20-rc3 kernel, and sure enough, it boots correctly (with nolapic). I will try to narrow down which kernel versions work and which don't. If my hunch is correct, that'll be an easy task.

Comment 3 Paul Mogren 2007-01-02 04:09:02 UTC
To take my build process out of the equation, I verified that a custom build of the default SUSE kernel does in fact exhibit the freeze.

Vanilla 2.6.19 boots. Maybe the unwinder fixes therein are enough to take care of it, or maybe it's something else entirely. I'll have more information tomorrow.
Comment 4 Paul Mogren 2007-01-02 19:51:41 UTC
I built the default SUSE kernel (2.6.18.2-34) again, this time with CONFIG_UNWIND_INFO disabled, and it still froze up. I guess the unwinder is not the problem.
Comment 5 Paul Mogren 2007-01-03 01:15:51 UTC
Vanilla 2.6.18.2 boots. Looks like one of the patches in -34 must be the culprit.
Comment 6 Paul Mogren 2007-01-03 01:28:23 UTC
Yikes, there are many patches. Can somebody give me a clue as to which ones might be suspect?
Comment 7 Lars Marowsky-Bree 2007-01-05 16:20:05 UTC
Say, does booting with maxcpus=0 work for you?
Comment 8 Paul Mogren 2007-01-05 18:29:03 UTC
Yes, that's effectively the same as nosmp, isn't it? With either, I can boot and have only one CPU enabled.
Comment 9 Eymen Alyaz 2007-01-10 13:56:48 UTC
Does booting kernel-bigsmp work for you?
Comment 10 Joseph Comfort 2007-01-10 19:49:48 UTC
(In reply to comment #9)
I have nearly identical problems with an x86_64 installation. Curiously, an i586 DVD ISO does not cause hangs or freezes. There is no kernel-bigsmp for x86_64 that I can see. There is a lot more information available in bug 232013.

Comment 11 Ludek Dolejsky 2007-01-11 21:49:15 UTC
I have the same problem. I am running SUSE 10.2 on a dual-core Lenovo 3000 N100, and the system hangs while booting. When I pass "nosmp" to the kernel, it works fine (using just one core). I tried a vanilla kernel and there were no problems (the system used both cores). It would be great if this gets fixed...
Comment 12 Greg Kroah-Hartman 2007-01-12 04:33:12 UTC
Paul, can you get a kernel log oops message when the machine hangs?
Comment 13 Joseph Comfort 2007-01-12 04:48:00 UTC
(In reply to comment #11)
I have been watching this bug and also bug 232013. Following a suggestion in
232013, I downloaded and installed the latest KOTD. All problems are resolved.
Whatever got fixed needs to be backported. New install ISOs also need to be
posted on the download sites. For comment #12, there are a bunch of logs in bug 232013.

Comment 14 Greg Kroah-Hartman 2007-01-12 04:55:54 UTC
Thanks for letting us know that the KOTD fixes this issue.

But merely backporting will not work, as that kernel is based on 2.6.20-rc4 or so, and the 10.2 kernel is 2.6.18-based. _Lots_ of things have changed in between these releases :)

If the KOTD works for you, I'd recommend just using that.
Comment 15 Kay Sievers 2007-01-12 07:06:21 UTC
*** Bug 231056 has been marked as a duplicate of this bug. ***
Comment 16 Paul Mogren 2007-01-12 16:28:36 UTC
Glad to see some activity here; I've been doing some unexpected traveling. The KOTD did not work for me when I first posted this, but this weekend I will try again with the latest KOTD, and I will also try bigsmp. 

Greg, I think the only way I might possibly find an OOPS message is to set up a serial console, but it hangs so early that I'm not even sure whether that will work. If neither the newer KOTD nor the bigsmp kernel works for me, I guess I'll start trying to track down a suitable serial cable.
Comment 17 Paul Mogren 2007-01-13 15:49:33 UTC
Neither kernel-default-2.6.18.5-SL102_BRANCH_20070111163922.i586.rpm nor kernel-bigsmp-2.6.18.5-SL102_BRANCH_20070111163922.i586.rpm works for me. It sounds like those who have had success with the KOTD were using the HEAD branch? I don't see any binaries there, so I can't try it real quick.
Comment 18 Paul Mogren 2007-01-13 22:11:41 UTC
I haven't been able to track down a serial cable. I'd have to order one online, and I'm not sure it's worth the cost to me, as I've promised myself I would not put any more money into this machine. I'm clearing the NEEDINFO. If you decide that's the only hope of figuring this out, you can set it back... but I don't think we are there yet.
Comment 19 Paul Mogren 2007-01-13 22:13:11 UTC
Can anyone else who can reproduce this bug get an OOPS for us?
Comment 20 Paul Mogren 2007-01-14 16:34:35 UTC
From the HEAD branch, kernel-default-2.6.20_rc5-20070113193557.i586.rpm works for me.

I'm still willing to test any idea that'll help get the 10.2 branch fixed.
Comment 21 Ludek Dolejsky 2007-01-14 17:12:56 UTC
Yes,

I confirm that kernel-default-2.6.20_rc5-20070113193557.i586.rpm from kotd works for me too.
Comment 22 Paul Mogren 2007-01-15 15:24:28 UTC
Just to let you know, during the evenings this week, I will be systematically excluding patches from 2.6.18.2 and rebuilding, to see if I can identify which causes the hang.
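For anyone attempting the same kind of search, here is a minimal sketch of one way to organise it. It is not the procedure actually used in this report: instead of excluding patches one at a time, it bisects the patch series, assuming that the kernel boots with no patches applied, hangs with all of them applied, and that exactly one patch is responsible. The function `find_culprit`, the `boots` callback (standing in for "build a kernel with these patches and try to boot it") and the patch names are all hypothetical.

```python
from typing import Callable, Sequence


def find_culprit(patches: Sequence[str],
                 boots: Callable[[Sequence[str]], bool]) -> str:
    """Return the one patch whose presence makes boots() return False.

    Assumes boots([]) is True, boots(patches) is False, and that a single
    patch in the series causes the hang.
    """
    lo, hi = 0, len(patches)  # the culprit's index lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Try a kernel built with only the first `mid` patches applied.
        if boots(patches[:mid]):
            lo = mid  # this prefix is clean, so the culprit comes later
        else:
            hi = mid  # this prefix already hangs, so the culprit is in it
    return patches[lo]


# Stand-in for real build-and-boot tests: pretend "patch-d" causes the hang.
series = ["patch-a", "patch-b", "patch-c", "patch-d", "patch-e"]
culprit = find_culprit(series, lambda applied: "patch-d" not in applied)
print(culprit)
```

Each step costs one build and one boot attempt, so the number of rebuilds grows with the logarithm of the series length rather than with the length itself. The sketch would not find a hang that only appears when two patches, or a patch and a particular boot parameter, are combined.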
Comment 23 Paul Mogren 2007-01-17 03:35:20 UTC
Here's something I found surprising:
Even with only one CPU physically in the system, it still hangs.
Comment 24 Paul Mogren 2007-01-17 13:40:52 UTC
Alright folks, I'm quite confident (but have not proven exhaustively... maybe next week) that for me, the hang is caused by the patch called patches.arch/i386-apic-auto IN CONJUNCTION WITH use of the kernel parameter nolapic. I am posting this while running the distributed 2.6.18.2-34 binary with kernel parameter noapic but NOT nolapic and NOT nosmp/maxcpus=0/maxcpus=1. At this time, I still have only one processor physically plugged in.
Comment 25 Thomas Renninger 2007-01-18 16:17:32 UTC
Can you please post the dmidecode output of your machine?
Comment 26 Kai Krakow 2007-01-18 23:58:03 UTC
I can confirm this bug. It freezes at the line "NET: Registered protocol family 2" with no more info printed. My kernel append line is:

root=/dev/hda8 vga=0x317 splash=silent resume=/dev/hda6 acpismp=force apm=power-off showopts

Appending "nosmp" makes the system boot for now, so I leave it until this bug is fixed.
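As a rough illustration of how such an append line can be checked for the workarounds discussed in this report, the sketch below splits a kernel command line into parameters and reports whether it limits the kernel to one CPU. The helper names `parse_cmdline` and `smp_disabled` are made up for this example, and treating `maxcpus=0` and `maxcpus=1` like `nosmp` follows the usage in this thread rather than any authoritative definition. Quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def smp_disabled(cmdline: str) -> bool:
    """True if the command line carries nosmp, maxcpus=0 or maxcpus=1."""
    params = parse_cmdline(cmdline)
    if "nosmp" in params:
        return True
    maxcpus = params.get("maxcpus")
    return maxcpus is not None and maxcpus.isdigit() and int(maxcpus) <= 1


# The append line quoted above, without and with the workaround.
append = ("root=/dev/hda8 vga=0x317 splash=silent resume=/dev/hda6 "
          "acpismp=force apm=power-off showopts")
print(smp_disabled(append))
print(smp_disabled(append + " nosmp"))
```

On a running system the same check could be applied to the contents of /proc/cmdline.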
Comment 27 Kai Krakow 2007-01-19 00:03:21 UTC
BTW: Which package contains dmidecode? I'd like to contribute to this bug...
Comment 28 Thomas Renninger 2007-01-19 11:02:28 UTC
It's in pmtools and should be in the default installation.
It needs to be run as root.
Comment 29 Paul Mogren 2007-01-20 01:36:46 UTC
Created attachment 114008
dmidump output
Comment 30 Kai Krakow 2007-01-20 10:56:01 UTC
I have no default installation - it's pretty minimalistic and I use only smart package manager to upgrade. ;-)

Well, first point: I can confirm that kernel-bigsmp boots without freezing and also shows me both CPUs in /proc/cpuinfo. So that works for now.

I'll attach my dmidump soon...
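For others who want to repeat the /proc/cpuinfo check after trying a different kernel, a small sketch follows. It assumes the x86 layout, in which each CPU has its own `processor : N` line. The function name `count_cpus` and the sample text are invented for illustration and are not output from any machine in this report.

```python
def count_cpus(cpuinfo_text: str) -> int:
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


# Made-up two-CPU sample in the x86 /proc/cpuinfo layout.
sample = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: GenuineIntel\n"
)
print(count_cpus(sample))
```

On a real system, pass in the text read from /proc/cpuinfo. A result of 1 on a dual-CPU board would suggest that SMP is still disabled.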
Comment 31 Kai Krakow 2007-01-20 10:58:43 UTC
Created attachment 114025
dmidump output
Comment 32 Kai Krakow 2007-01-20 11:05:11 UTC
Created attachment 114028
hwinfo output

I am also attaching my hwinfo dump for completeness...
Comment 33 Thomas Renninger 2007-02-28 10:04:39 UTC
If this is a Pentium M:
Does the boot parameter max_cstate=1 help?
If yes, this should be fixed in next update kernel, but I can point you to kernel to test and verify before it's coming out.
Comment 34 Kai Krakow 2007-02-28 11:46:25 UTC
(In reply to comment #33)
> If this is a Pentium M:
> Does the boot parameter max_cstate=1 help?
> If yes, this should be fixed in next update kernel, but I can point you to
> kernel to test and verify before it's coming out.

For me it is an Asus P2B-based dual Pentium II board...

Comment 35 Thomas Renninger 2007-02-28 13:33:26 UTC
*** Bug 229217 has been marked as a duplicate of this bug. ***
Comment 36 Paul Mogren 2007-03-01 02:10:38 UTC
For me it is a dual Pentium-III (Abit VP6)
Comment 37 Thomas Renninger 2007-03-05 18:06:05 UTC

*** This bug has been marked as a duplicate of bug 232013 ***