Bug 660464 - complete system freeze regression
Summary: complete system freeze regression
Status: RESOLVED FIXED
Alias: None
Product: openSUSE 11.4
Classification: openSUSE
Component: Kernel
Version: Factory
Hardware: x86 Linux
Importance: P2 - High, Critical, with 5 votes
Target Milestone: ---
Assignee: Jeff Mahoney
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-20 05:47 UTC by Bernhard Wiedemann
Modified: 2018-07-03 20:35 UTC
CC List: 3 users

See Also:
Found By: System Test
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
coolo: SHIP_STOPPER+


Attachments
serial console log with Oops+backtrace (4.78 KB, text/plain)
2011-01-07 08:34 UTC, Bernhard Wiedemann
serial console log with Oops+backtrace from 2.6.37-default (8.13 KB, text/plain)
2011-01-11 14:50 UTC, Bernhard Wiedemann
Default kernel log (850.75 KB, image/jpeg)
2011-02-07 23:43 UTC, Forgotten User ho8rvtClXX
"Failsafe" parameters (apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 x11failsafe vga=0x317) (894.23 KB, image/jpeg)
2011-02-07 23:45 UTC, Forgotten User ho8rvtClXX
Default kernel log + nomodeset - logs flooded by udev (847.40 KB, image/jpeg)
2011-02-07 23:47 UTC, Forgotten User ho8rvtClXX
System log after successful boot (udev flood again) (605.70 KB, text/plain)
2011-02-07 23:49 UTC, Forgotten User ho8rvtClXX

Description Bernhard Wiedemann 2010-12-20 05:47:25 UTC
openQA testing has shown complete system freezes early in boot, or sometimes after installation, on 32-bit installs.

http://openqa.opensuse.org/results/openSUSE-NET-i586-Build0963
http://openqa.opensuse.org/results/openSUSE-NET-i586-Build0964
http://openqa.opensuse.org/results/openSUSE-NET-i586-Build0964-lxde

How To Reproduce:
1. qemu-kvm -m 1000 -cdrom factory/iso/openSUSE-NET-i586-Build0964-Media.iso
2. (possibly optional) at the boot prompt, add nohz=off
3. optionally, press F3 to select text mode so console messages are visible
4. press Return to boot

Actual Results:
Boot will often stop after printing "
>>> openSUSE installation program v3.5.7...
<<<
Starting udev..."

Expected Results:
Boot should complete, as it did with yesterday's build.

Reproducible: Sometimes

- x86_64 builds sometimes showed this problem as well.
- It also happens in VirtualBox.
- The statuser values in the test log show that the system is busy-looping.
Comment 1 Bernhard Wiedemann 2010-12-21 05:52:08 UTC
I have now seen a kernel panic:
http://openqa.opensuse.org/opensuse/permanent/bug/bug660464-2.jpg

So maybe it is actually a kernel problem that only started being triggered randomly by some later change elsewhere?
Comment 2 Bernhard Wiedemann 2011-01-04 11:28:34 UTC
http://www.linuxquestions.org/questions/slackware-14/current-randomly-timed-kernel-oops-on-bootup-of-two-test-boxen-852843/

discusses the very same bug. It appears to be a bug in the kernel's SCSI passthrough, triggered by udev-165 issuing an additional SCSI command.

Tests with today's openSUSE-GNOME-LiveCD-i686-Build0988-Media.iso on KVM failed in 15 of 20 tries. nohz=off is not required to trigger it.
Comment 3 Jeff Mahoney 2011-01-06 19:55:03 UTC
Can you re-capture the oops but boot with panic_on_oops=1 so we can see the primary oops?
Comment 4 Bernhard Wiedemann 2011-01-07 08:34:24 UTC
Created attachment 407345
serial console log with Oops+backtrace

Used console=ttyS0 instead.
Comment 5 Bernhard Wiedemann 2011-01-08 22:10:47 UTC
I had a similar panic on my laptop (Amilo Pro 2010) with 2.6.37-rc7,
but it went away with 2.6.37 from Kernel:/HEAD,
so there may already be a fix.
Comment 6 Bernhard Wiedemann 2011-01-11 14:50:02 UTC
Created attachment 407787
serial console log with Oops+backtrace from 2.6.37-default

The log has one successful boot and one oops after reset,
so on KVM the bug may still be present in the final 2.6.37.
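The serial logs attached in comments 4 and 6 can contain several boots in one capture (one successful, one oops after reset). Below is a minimal Python sketch for tallying such a capture, assuming it was recorded with console=ttyS0 and that each boot starts with the kernel's "Linux version" banner. The crash patterns are common oops/panic strings, not an exhaustive list, and the script name used below is hypothetical. A silent freeze, like the busy-loop in the original report, prints no oops and would be counted as "ok", so this only counts boots that crashed visibly.

```python
import re
import sys

# Every boot starts with this kernel banner; text before the first one is ignored.
BOOT_MARKER = "Linux version "

# Common kernel crash signatures (assumed sufficient for a rough tally, not exhaustive).
CRASH_PATTERNS = (
    re.compile(r"Oops: [0-9a-f]{4}"),
    re.compile(r"Kernel panic - not syncing"),
    re.compile(r"BUG: unable to handle kernel"),
)


def split_boots(log: str) -> list[str]:
    """Split a serial console capture into one chunk per boot."""
    parts = log.split(BOOT_MARKER)
    return [BOOT_MARKER + part for part in parts[1:]]


def classify(boot: str) -> str:
    """Return 'crash' if any crash signature appears in this boot, else 'ok'."""
    return "crash" if any(p.search(boot) for p in CRASH_PATTERNS) else "ok"


def tally(log: str) -> dict[str, int]:
    """Count visibly crashed vs. other boots in a serial console capture."""
    counts = {"ok": 0, "crash": 0}
    for boot in split_boots(log):
        counts[classify(boot)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(tally(f.read()))
```

Usage: `python tally_oops.py serial.log` prints the counts for that capture. Splitting on the boot banner instead of on timestamps keeps the sketch independent of whether printk timestamps are enabled.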
Comment 7 Stephan Kulow 2011-01-19 12:41:06 UTC
According to http://marc.info/?l=kernel-janitors&m=129378990812615&w=1, Mike can reproduce it too.
Comment 8 Stephan Kulow 2011-01-19 13:53:28 UTC
Tejun has a working patch: http://marc.info/?l=linux-hotplug&m=129536338129945&w=2
Comment 9 Mike Galbraith 2011-01-19 14:33:35 UTC
(In reply to comment #7)
> According to http://marc.info/?l=kernel-janitors&m=129378990812615&w=1 Mike can
> reproduce it too

The crashes I could reproduce were cured by...

patches.fixes/sched-cgroup-use-exit-hook-to-avoid-use-after-free-crash

...which is the patch in this thread, plus another hunk that prevents the exit hook from messing with a failed fork child on its way to the grave, which would otherwise make autogroup diddle freed memory.
Comment 10 Stephan Kulow 2011-01-20 09:15:12 UTC
OK, so the other bug is fixed by #8. If someone could push it to master ASAP, I would be grateful.
Comment 11 Jeff Mahoney 2011-01-21 23:43:40 UTC
(In reply to comment #7)
> According to http://marc.info/?l=kernel-janitors&m=129378990812615&w=1 Mike can
> reproduce it too

No, according to that thread, Mike could produce /an/ Oops. Not /this/ Oops.

I've applied the patch from comment #8 to the repo and have forced an update to Kernel:HEAD for testing.

Please try a kernel with the following changelog entry and report back.

ata: Fix panics with ata_id (bnc#660464).
Comment 12 Bernhard Wiedemann 2011-01-25 17:46:08 UTC
No more i586 crashes on openQA in over 20 test runs since this went into Factory.
I cannot tell yet about the i686 LiveCDs, since none have been built so far.
But it looks good.
Comment 13 Jeff Mahoney 2011-01-25 17:53:41 UTC
Thanks. I'll close as FIXED. Please re-open if the LiveCDs fail.
Comment 14 Bernhard Wiedemann 2011-01-29 18:08:35 UTC
The bug has not been seen again, not even on LiveCDs.
Comment 15 Forgotten User ho8rvtClXX 2011-02-07 23:43:25 UTC
Created attachment 412674
Default kernel log

After updating from 11.3 to 11.4-M6 (x86_64), my system (HP Compaq 6720s laptop) freezes completely on boot. It happens almost every time (roughly 9 out of 10 boots). On the console I see only "Creating device nodes with udev", nothing more. I also saw this problem on 11.3 with newer kernels (2.6.36, 2.6.37).
Comment 16 Forgotten User ho8rvtClXX 2011-02-07 23:45:45 UTC
Created attachment 412675
"Failsafe" parameters (apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 x11failsafe vga=0x317)
Comment 17 Forgotten User ho8rvtClXX 2011-02-07 23:47:13 UTC
Created attachment 412676
Default kernel log + nomodeset - logs flooded by udev
Comment 18 Forgotten User ho8rvtClXX 2011-02-07 23:49:09 UTC
Created attachment 412677
System log after successful boot (udev flood again)
Comment 19 Forgotten User ho8rvtClXX 2011-02-07 23:51:35 UTC
The bug is still present (see above).
Comment 20 Forgotten User ho8rvtClXX 2011-02-13 01:24:51 UTC
11.4 RC1: the bug is still present.
Comment 21 Stephan Kulow 2011-02-15 08:32:17 UTC
Sorry, this is a different bug, so please track it under a separate bug number. Your problem is hardware-specific, or it wouldn't go away with kernel parameters.