Bugzilla – Bug 231205
Freeze very early in boot process with SMP
Last modified: 2007-03-05 18:06:05 UTC
This system can run SUSE 10.1's SMP kernel, but both the 10.2 installer and the installed 10.2 system hang very quickly at boot unless SMP is disabled using kernel parameter nosmp or maxcpus. The last line printed on the installed system's console before it hangs is "NET: Registered protocol family 2". When SMP is disabled, everything seems fine, and the next line is "IP route cache hash table entries: 32768 (order: 5, 131072 bytes)". I have tried the kernel of the day (2.6.18.5-SL102_BRANCH_20061223002647-default), and I have tried a number of other kernel parameters including acpi=off, apm=off, ide=nodma, pci=routeirq, edd=off, noapic, nolapic, init=/bin/sh... I have not found anything that lets the system boot without disabling SMP. There is no OOPS or PANIC printed on screen or in any logs that I can find. In fact I'm pretty sure it freezes before anything is written to disk at all. I tried hooking up netconsole and got nothing at all, but I can't say for sure that it was configured correctly. I don't have the equipment to set up a serial console, but if that's necessary I could probably track it down.
Created attachment 111231 [details] hwinfo output
After continued googling, I got a hunch that the "stack unwinder" could be the source of my troubles. Linus removed it altogether for 2.6.20-rc2. I just built a vanilla 2.6.20-rc3 kernel, and sure enough, it boots correctly (with nolapic). I will try to narrow down which kernel versions work and which don't. If my hunch is correct, that'll be an easy task.
To take my build process out of the equation, I verified that a custom build of the default SUSE kernel does in fact exhibit the freeze. Vanilla 2.6.19 boots. Maybe the unwinder fixes therein are enough to take care of it, or maybe it's something else entirely. I'll have more information tomorrow.
I built the default SUSE kernel (2.6.18.2-34) again, this time with CONFIG_UNWIND_INFO disabled, and it still froze up. I guess the unwinder is not the problem.
Vanilla 2.6.18.2 boots. Looks like one of the patches in -34 must be the culprit.
Yikes, there are many patches, can somebody give me a clue as to which ones might be suspect?
Say, does booting with maxcpus=0 work for you?
Yes, that's effectively the same as nosmp, isn't it? With either, I can boot and have only one CPU enabled.
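For anyone comparing boot configurations in this thread, the check below is a minimal sketch of how to tell whether a given kernel command line asks for single-CPU operation. It treats nosmp, maxcpus=0 and maxcpus=1 as equivalent for this purpose, which matches what is reported here. The function name single_cpu_requested is hypothetical, and the sketch does not model how the kernel resolves repeated or conflicting parameters.

```python
def single_cpu_requested(cmdline: str) -> bool:
    """Return True if the kernel command line limits the boot to one CPU.

    Assumption: nosmp, maxcpus=0 and maxcpus=1 all count as "one CPU only".
    """
    for token in cmdline.split():
        if token == "nosmp":
            return True
        if token.startswith("maxcpus="):
            value = token.split("=", 1)[1]
            if value in ("0", "1"):
                return True
    return False


# Example command lines (illustrative only).
print(single_cpu_requested("root=/dev/hda8 vga=0x317 nosmp"))
print(single_cpu_requested("root=/dev/hda8 noapic"))
```

On a running system the current command line can be read from /proc/cmdline and passed to the function.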
does booting kernel-bigsmp work for you?
(In reply to comment #9) I have nearly identical problems with an x86_64 installation. Curiously, an i586 DVD ISO does not cause hangs or freezes. There is no kernel-bigsmp for x86_64 that I can see. There is a lot more information available for bug 232013.
I have the same problem. I am running SUSE 10.2 on a Lenovo 3000 n100 (dual core), and the system hangs while booting. When I pass "nosmp" to the kernel, it works fine (using just one core). I tried a vanilla kernel and there were no problems (the system used both cores). It would be great if this gets fixed...
Paul, can you get a kernel log oops message when the machine hangs?
(In reply to comment #11) I have been watching this bug and also bug 232013. Following a suggestion in 232013, I downloaded and installed the latest KOTD. All problems are resolved. Whatever got fixed needs to be backported. New install ISOs also need to be posted on the download sites. For comment #12, there are a bunch of logs in bug 232013.
Thanks for letting us know that the KOTD fixes this issue. But merely backporting will not work, as that kernel is based on 2.6.20-rc4 or so, and the 10.2 kernel is 2.6.18 based. _lots_ of things have changed in between these releases :) If the KOTD works for you, I'd recommend just using that.
*** Bug 231056 has been marked as a duplicate of this bug. ***
Glad to see some activity here; I've been doing some unexpected traveling. The KOTD did not work for me when I first posted this, but this weekend I will try again with the latest KOTD, and I will also try bigsmp. Greg, I think the only way I might possibly find an OOPS message is to set up a serial console, but it hangs so early that I'm not even sure whether that will work. If neither the newer KOTD nor the bigsmp kernel works for me, I guess I'll start trying to track down a suitable serial cable.
Neither kernel-default-2.6.18.5-SL102_BRANCH_20070111163922.i586.rpm nor kernel-bigsmp-2.6.18.5-SL102_BRANCH_20070111163922.i586.rpm works for me. It sounds like those who have had success with KOTD were using the head branch? I don't see any binaries there so I can't try it real quick.
I haven't been able to track down a serial cable. I'd have to order one online, and I'm not sure it's worth the cost to me, as I've promised myself I would not put any more money into this machine. I'm clearing the NEEDINFO. If you decide that's the only hope of figuring this out, you can set it back... but I don't think we are there yet.
Can anyone else that can reproduce this bug get an OOPS for us?
From the HEAD branch, kernel-default-2.6.20_rc5-20070113193557.i586.rpm works for me. I'm still willing to test any idea that'll help get the 10.2 branch fixed.
Yes, I confirm that kernel-default-2.6.20_rc5-20070113193557.i586.rpm from kotd works for me too.
Just to let you know, during the evenings this week, I will be systematically excluding patches from 2.6.18.2 and rebuilding, to see if I can identify which causes the hang.
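One way to shorten a patch-by-patch search like this is to bisect the patch series instead of excluding patches one at a time. The sketch below is only an illustration under strong assumptions: the patches apply in order, the kernel with no patches boots, the kernel with all patches hangs, and a single patch is responsible. The callback boots_ok is hypothetical and stands for a manual step (apply the given patches, rebuild, boot the test machine, report the result). A hang that depends on a patch combined with a particular boot parameter would need the same parameters on every test boot.

```python
from typing import Callable, Sequence


def first_bad_patch(patches: Sequence[str],
                    boots_ok: Callable[[Sequence[str]], bool]) -> str:
    """Return the first patch whose inclusion makes the kernel hang.

    Invariant: the kernel built with patches[:lo] boots,
    and the kernel built with patches[:hi] hangs.
    """
    if not patches:
        raise ValueError("patch series is empty")
    lo, hi = 0, len(patches)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(patches[:mid]):
            lo = mid   # first `mid` patches are fine
        else:
            hi = mid   # culprit is among the first `mid` patches
    return patches[hi - 1]


# Simulated run with placeholder patch names; "patch-c" plays the culprit.
series = ["patch-a", "patch-b", "patch-c", "patch-d", "patch-e"]
culprit = first_bad_patch(series, lambda applied: "patch-c" not in applied)
print(culprit)
```

With n patches this needs roughly log2(n) rebuilds rather than n, which matters when each test is a full kernel build and reboot.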
Here's something I found surprising: Even with only one CPU physically in the system, it still hangs.
Alright folks, I'm quite confident (but have not proven exhaustively... maybe next week) that for me, the hang is caused by the patch called patches.arch/i386-apic-auto IN CONJUNCTION WITH use of the kernel parameter nolapic. I am posting this while running the distributed 2.6.18.2-34 binary with kernel parameter noapic but NOT nolapic and NOT nosmp/maxcpus=0/maxcpus=1. At this time, I still have only one processor physically plugged in.
Can you post dmidecode output of your machine pls.
I can confirm this bug. It freezes at the line "NET: Registered protocol family 2" with no more info printed. My kernel append line is:
root=/dev/hda8 vga=0x317 splash=silent resume=/dev/hda6 acpismp=force apm=power-off showopts
Appending "nosmp" makes the system boot for now, so I will leave it in place until this bug is fixed.
BTW: Which package contains dmidecode? I'd like to contribute to this bug...
It's in pmtools and should be in the default installation. It needs to be run as root.
Created attachment 114008 [details] dmidump output
I don't have a default installation; it's pretty minimal, and I use only the smart package manager to upgrade. ;-) Well, first point: I can confirm that kernel-bigsmp boots without freezing and also shows me both CPUs in /proc/cpuinfo. So that works for now. I'll attach my dmidump soon...
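For testers who want to report the CPU count consistently, here is a minimal sketch that counts the "processor" entries in /proc/cpuinfo-style text, as found on x86 systems. The function name count_cpus and the sample text are illustrative only.

```python
def count_cpus(cpuinfo_text: str) -> int:
    """Count lines of the form 'processor : N' in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


# Shortened, made-up sample in the /proc/cpuinfo layout.
sample = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: GenuineIntel\n"
)
print(count_cpus(sample))
```

On a live system, pass the contents of /proc/cpuinfo to the function. A result of 1 after booting with nosmp and 2 after booting kernel-bigsmp would match what is described in this thread.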
Created attachment 114025 [details] dmidump output
Created attachment 114028 [details] hwinfo output I also attach my hwinfo dump for completeness reasons...
If this is a Pentium M: Does the boot parameter max_cstate=1 help? If yes, this should be fixed in the next update kernel, but I can point you to a kernel to test and verify before it comes out.
(In reply to comment #33)
> If this is a Pentium M:
> Does the boot parameter max_cstate=1 help?
> If yes, this should be fixed in next update kernel, but I can point you to
> kernel to test and verify before it's coming out.
For me it is an Asus P2B based dual Pentium-2 board...
*** Bug 229217 has been marked as a duplicate of this bug. ***
For me it is a dual Pentium-III (Abit VP6)
*** This bug has been marked as a duplicate of bug 232013 ***