Bug 568120

Summary: Kernel crash - BUG: scheduling while atomic
Product: [openSUSE] openSUSE 11.2 Reporter: Daniele Tombolini <kailed>
Component: Kernel Assignee: Jeff Mahoney <jeffm>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P2 - High CC: angie, erwinl, forgotten_-yQj4fdAjs, forgotten_N1m2whZ-xl, harbrink, jeffm, Joachim.Reichelt, jose.lpa, lavrinenko_alex, lbickley, lchiquitto, lsteeger, meissner, petr.m, revealed, sebastien.rohaut, valerio.bontempi
Version: Final   
Target Milestone: ---   
Hardware: 32bit   
OS: openSUSE 11.2   
Whiteboard: maint:released:11.2:30542
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: dmesg output
dmesg with kotd (2.6.31.11-0.0.0.15.4478caa)

Description Daniele Tombolini 2010-01-04 19:52:47 UTC
Created attachment 334865
dmesg output

User-Agent:       Mozilla/5.0 (compatible; Konqueror/4.3; Linux) KHTML/4.3.4 (like Gecko) SUSE

From dmesg:
[    0.010469] Unpacking initramfs...
[    0.018008] BUG: scheduling while atomic: swapper/0/0x10000002
[    0.018018] Modules linked in:
[    0.018024] Pid: 0, comm: swapper Not tainted 2.6.31.8-0.1-desktop #1
[    0.018027] Call Trace:
[    0.018047]  [<c020845a>] try_stack_unwind+0x17a/0x1a0
[    0.018054]  [<c020708c>] dump_trace+0x6c/0x130
[    0.018059]  [<c0208008>] show_trace_log_lvl+0x58/0x80
[    0.018064]  [<c0208056>] show_trace+0x26/0x40
[    0.018071]  [<c0692e53>] dump_stack+0x79/0x91
[    0.018079]  [<c023f137>] __schedule_bug+0x87/0x90
[    0.018084]  [<c0693b88>] schedule+0x688/0x7a0
[    0.018093]  [<c02480eb>] __cond_resched+0x2b/0x60
[    0.018098]  [<c0693dfd>] _cond_resched+0x3d/0x50
[    0.018104]  [<c02d3adc>] generic_perform_write+0x13c/0x1e0
[    0.018109]  [<c02d3bfb>] generic_file_buffered_write+0x7b/0x150
[    0.018114]  [<c02d55c3>] __generic_file_aio_write_nolock+0x213/0x530
[    0.018119]  [<c02d5a15>] generic_file_aio_write+0x65/0xe0
[    0.018125]  [<c031538c>] do_sync_write+0xdc/0x130
[    0.018130]  [<c03156ba>] vfs_write+0xba/0x1b0
[    0.018134]  [<c03160c3>] sys_write+0x53/0xa0
[    0.018142]  [<c09815c3>] do_copy+0x3d/0xe6
[    0.018146]  [<c0980f6b>] flush_buffer+0x81/0xb8
[    0.018153]  [<c09a770d>] gunzip+0x374/0x428
[    0.018159]  [<c098147b>] unpack_to_rootfs+0x29f/0x3aa
[    0.018163]  [<c0981ef9>] populate_rootfs+0x59/0x87
[    0.018168]  [<c097fcd3>] start_kernel+0x38d/0x3ae
[    0.018174]  [<c097f087>] i386_start_kernel+0x87/0x9f

I need to press the power button after a while...
Known issue:
http://lists.opensuse.org/opensuse-kernel/2009-12/msg00034.html
So why did you release such a buggy update?

2.6.31.8-0.1-desktop

full dmesg attached.

Reproducible: Always

Steps to Reproduce:
1. Install 2.6.31.8-0.1
Actual Results:  
crash
Comment 1 revealed revealed 2010-01-04 20:29:12 UTC
Hello! 

I got this exact same issue on 11.2 with 2.6.31.8-0.1-desktop i386 GNU/Linux.

Thanks.

Greetings,

R
Comment 2 Erwin Lam 2010-01-05 09:41:18 UTC
Same problem here with openSUSE 11.2 and the new desktop kernel 2.6.31.8-0.1-desktop, except that there is no crash. The system is running but I don't know what the impact is of this error.
Comment 3 revealed revealed 2010-01-05 13:21:40 UTC
Ah, sorry, mine is not crashing either, but I am receiving the same stack trace, with the same [<xy>] letters and numbers on each line.
Comment 4 Daniele Tombolini 2010-01-05 17:02:57 UTC
OK, on my notebook there is a hard lockup when Xorg starts, and the power button is the only way out.

The laptop does not crash, but I did not test for long, just a few minutes.
Enough to be critical.
Comment 5 Forgotten User -yQj4fdAjs 2010-01-05 17:42:47 UTC
Same for me, but no crash/hangup. Simply the messages on boot.
Comment 6 Valerio Bontempi 2010-01-12 14:09:59 UTC
Same for me, but no crash/hangup, higher CPU load and high swap activity.
Comment 7 Daniele Tombolini 2010-01-12 18:12:09 UTC
Well, there are two bugs.
1) the well known "atomic" -> it seems not so serious
2) #568307 - wireless issue (rt2860 driver) --> crash
Comment 8 Angelika Schulz 2010-01-13 15:50:47 UTC
Hi there,

I am having the same trouble .. but since my search seemed so stupid and got no results I created an extra bug report for this. Sorry for that. See bug #570316.

During an automated installation the system will freeze after the main package installation, but run fine after manually shutting it down.

Bye and thanks,
Angie.
Comment 9 Leonardo Chiquitto 2010-01-13 15:56:34 UTC
*** Bug 570316 has been marked as a duplicate of this bug. ***
Comment 10 Jeff Mahoney 2010-01-13 16:11:36 UTC
The scheduling while atomic issue comes from my patch to override ACPI tables from the initramfs. I've disabled them in the repo until I can come up with a workaround.
Comment 11 Jeff Mahoney 2010-01-13 18:45:51 UTC
*** Bug 568244 has been marked as a duplicate of this bug. ***
Comment 12 Sebastien ROHAUT 2010-01-14 20:12:22 UTC
Same here, same kernel, but x86_64.
Comment 13 Jeff Mahoney 2010-01-14 21:21:15 UTC
Yep, it will occur on any machine with ACPI and preemption enabled, which means both the i386 and x86_64 desktop kernels.
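
As a rough illustration of the "ACPI and preemption enabled" condition above, the following Python sketch checks a kernel config text for the relevant options. The helper name `option_enabled` and the sample config text are hypothetical; the option names `CONFIG_PREEMPT` and `CONFIG_ACPI` are the standard Kconfig symbols, but whether a given flavor sets them has to be checked against its actual config file.

```python
import re


def option_enabled(config_text: str, name: str) -> bool:
    """Return True if the config text contains a line `<name>=y`."""
    pattern = r"^" + re.escape(name) + r"=y$"
    return re.search(pattern, config_text, re.MULTILINE) is not None


def may_show_warning(config_text: str) -> bool:
    """Both ACPI and full preemption must be built in, per the comment above."""
    return (option_enabled(config_text, "CONFIG_ACPI")
            and option_enabled(config_text, "CONFIG_PREEMPT"))


# Illustrative config fragments only, not copied from a real kernel package.
desktop_like = "CONFIG_ACPI=y\nCONFIG_PREEMPT=y\n"
default_like = "CONFIG_ACPI=y\n# CONFIG_PREEMPT is not set\n"

print(may_show_warning(desktop_like))
print(may_show_warning(default_like))
```

On many distributions the config of an installed kernel is available as a text file under /boot, so its contents could be passed to `may_show_warning` as a string.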

I've updated the 11.2 branch with an updated patch set for this. A kernel containing at least the following entry will be needed for testing:

-------------------------------------------------------------------
Thu Jan 14 19:50:46 CET 2010 - jeffm@suse.de

- patches.suse/add-initramfs-file_read_write: Build fix

http://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.2/
Comment 14 Jeff Mahoney 2010-01-15 18:55:58 UTC
*** Bug 568638 has been marked as a duplicate of this bug. ***
Comment 15 Jeff Mahoney 2010-01-15 18:56:43 UTC
*** Bug 568801 has been marked as a duplicate of this bug. ***
Comment 16 revealed revealed 2010-01-16 10:01:37 UTC
Hello there,

Sorry, I have to say that I cannot test the KOTD, because it would require kernel-source and other RPMs to be installed on my system. I'm getting failed dependencies.

Greetings,
Comment 17 Jeff Mahoney 2010-01-16 16:03:11 UTC
kernel-source is in the src/ dir, but you don't need it anyway.

All you need is kernel-$flavor.rpm unless you're doing kernel debugging and development.
Comment 18 revealed revealed 2010-01-16 19:46:44 UTC
Sorry, I would if I could. Hopefully one of the others here can test it?
Comment 19 Sebastien ROHAUT 2010-01-16 20:12:05 UTC
Hi, I tried 2.6.31.11 from KOTD, x86_64, desktop. Here is the result:

[    0.118766] BUG: scheduling while atomic: swapper/0/0x10000002
[    0.118773] Modules linked in:
[    0.118776] Pid: 0, comm: swapper Not tainted 2.6.31.11-0.0.0.15.4478caa-desktop #1
[    0.118778] Call Trace:
[    0.118794]  [<ffffffff81011a19>] try_stack_unwind+0x189/0x1b0
[    0.118798]  [<ffffffff8101025d>] dump_trace+0xad/0x3a0
[    0.118802]  [<ffffffff81011524>] show_trace_log_lvl+0x64/0x90
[    0.118805]  [<ffffffff81011573>] show_trace+0x23/0x40
[    0.118810]  [<ffffffff81552cc2>] dump_stack+0x81/0x9e
[    0.118815]  [<ffffffff81056f32>] __schedule_bug+0x92/0xa0
[    0.118819]  [<ffffffff81553bff>] thread_return+0x2a7/0x3c8
[    0.118823]  [<ffffffff81060dc8>] __cond_resched+0x38/0x80
[    0.118826]  [<ffffffff81553ebd>] _cond_resched+0x4d/0x60
[    0.118831]  [<ffffffff8130575e>] acpi_ps_complete_op+0x2c2/0x2eb
[    0.118835]  [<ffffffff81305c89>] acpi_ps_parse_loop+0x371/0x3d0
[    0.118838]  [<ffffffff81304976>] acpi_ps_parse_aml+0x119/0x404
[    0.118842]  [<ffffffff81303528>] acpi_ns_one_complete_parse+0x144/0x175
[    0.118845]  [<ffffffff813035b3>] acpi_ns_parse_table+0x5a/0xb3
[    0.118849]  [<ffffffff812ff0f3>] acpi_ns_load_table+0x87/0x138
[    0.118852]  [<ffffffff813087fe>] acpi_tb_load_namespace+0x80/0x163
[    0.118856]  [<ffffffff813088fe>] acpi_load_tables+0x1d/0x5c
[    0.118861]  [<ffffffff81a071e1>] acpi_early_init+0x85/0x12e
[    0.118866]  [<ffffffff819d363e>] start_kernel+0x3c4/0x3e6
[    0.118870]  [<ffffffff819d268d>] x86_64_start_reservations+0x134/0x14f
[    0.118873]  [<ffffffff819d2803>] x86_64_start_kernel+0x15b/0x17e

Sorry...

But I don't have any crashes (even in 2.6.31.8).
Comment 20 Leonardo Chiquitto 2010-01-16 20:27:02 UTC
Created attachment 336995
dmesg with kotd (2.6.31.11-0.0.0.15.4478caa)

Jeff, kernel 2.6.31.11-0.0.0.15.4478caa is definitely an improvement. With the latest official update I was getting at least a dozen "scheduling while atomic" call traces during boot. With KOTD I get only one.
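
For anyone wanting to compare kernels the same way (a dozen traces versus one), a minimal Python sketch for counting the warnings in a saved dmesg text could look like this. The function name `count_atomic_bugs` is hypothetical, and the sample text reuses two message lines quoted in this report.

```python
BUG_MARKER = "BUG: scheduling while atomic"


def count_atomic_bugs(dmesg_text: str) -> int:
    """Count the lines in a dmesg dump that carry the warning marker."""
    return sum(1 for line in dmesg_text.splitlines() if BUG_MARKER in line)


# Sample built from message lines quoted in this bug report.
sample = (
    "[    0.010469] Unpacking initramfs...\n"
    "[    0.018008] BUG: scheduling while atomic: swapper/0/0x10000002\n"
    "[    0.018018] Modules linked in:\n"
    "[    0.118766] BUG: scheduling while atomic: swapper/0/0x10000002\n"
)

print(count_atomic_bugs(sample))
```

The text to feed in would typically be a saved copy of the dmesg output, such as the attachments on this bug.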
Comment 21 Jeff Mahoney 2010-01-16 21:04:00 UTC
Strange. I wonder why I wasn't running into that. Fortunately this part should be easier to work around. I guess I just need to load the table into memory and not actually do anything with it instead of loading the table into the ACPI stack.
Comment 22 Angelika Schulz 2010-01-18 11:00:45 UTC
Hi there,

Which of the kernel packages shall I install now? My original packages were:
kernel-desktop-2.6.31.5-0.1.1.i586.rpm 
preload-kmp-desktop-1.1_2.6.31.5_0.1-6.8.1.i586.rpm

So I downloaded kernel-desktop.rpm and kernel-desktop-base.rpm, installed them, rebooted and got a kernel oops, with the machine freezing completely shortly after the boot process started.

f5f7d-desktop #1
[0.172216]Call Trace:
[0.172281] [<c020883a>] try_stack_unwind+0x17a/0x1a0
[0.172350] [<c020746c>] dump_trace+0x6c/0x130
[0.172417] [<c02083e8>] show_trace_log_lvl+0x58/0x80
[0.172485] [<c0208436>] show_trace+0x26/0x40
[0.172553] [<c06936e3>] dump_stack+0x79/0x91
[0.172619] [<c0693756>] panic+0x5b/0x145
[0.172686] [<c0255605>] do_exit+0x2d5/0x350
[0.172983] [<c0697fb7>] oops_end+0xb7/0x110
[0.173054] [<c022fba4>] no_context+0xd4/0xf0
[0.173121] [<c022fcc5>] __bad_area_nosemaphore+0x105/0x1d0
[0.173189] [<c022fdaf>] bad_area_nosemaphore+0x1f/0x40
[0.173256] [<c0699bbb>] do_page_fault+0x39b/0x440
[0.173324] [<c069722b>] error_code+0x73/0x78
[0.173391] [<c02017ac>] initramfs_file_write+0xec/0x190
[0.173458] [<c0201899>] initramfs_write+0x49/0x90
[0.173525] [<c0981e73>] do_copy+0x32/0xd7
[0.173591] [<c0980f6b>] flush_buffer+0x81/0xb8
[0.173657] [<c09a774d>] gunzip+0x374/0x428
[0.173723] [<c098147b>] unpack_to_rootfs+0x29f/0x3aa
[0.173790] [<c0981e13>] populate_rootfs+0x59/0x87
[0.173857] [<c097fcd3>] start_kernel+0x38d/0x3ae
[0.173984] [<c097f007>] i386_start_kernel+0x87/0x9f

Please note: This trace has been entered manually since I had no chance of copy+paste.

Bye,
Angie.
Comment 23 Jeff Mahoney 2010-01-18 14:55:45 UTC
Ugh. Ok. It looks like it will probably be a better idea to test kernels out of my build service project instead of committing to the repo. This worked fine for me. For now, just back out to the last update kernel.
Comment 24 Petr Matula 2010-01-18 14:56:53 UTC
I've got the same problem ("atomic" bug without crash) with kernel 2.6.31.8-0.1-desktop. Kernel 2.6.31.8-0.1-default is OK.
Comment 25 Jeff Mahoney 2010-01-18 15:02:20 UTC
Ok, thanks for the feedback. We really don't need any more "me too" posts, though. Please only post new comments if you have new information. The cause of the messages is well known. It affects -desktop because -desktop has preemption enabled. It affects versions that have the generic ACPI table override code in them because, well, that's the code that's causing the messages.

Here's the thing. Outside of the oops in comment #22, which is a real crash, this problem is _not_ crashing systems. It's warning during boot. It looks like a real oops, but it's not a real oops.

Here's why: The code paths are calling cond_resched() which was added to a number of places to improve latency. Part of it checks to ensure that the code is not running with a spinlock held, in interrupt context, or some other conditions. It issues the message everyone's been reporting when it detects that. Generally, it's a good thing.

For the code that we're using here, though, init hasn't even been started yet so it _can't_ actually schedule. The warnings are cosmetic even if they're really ugly.
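
To make the reported message easier to read, here is a small Python sketch that splits a line such as "BUG: scheduling while atomic: swapper/0/0x10000002" into its three fields (task name, PID, and the preempt count printed in hex). The bit masks below are an assumption: they follow the preempt_count layout as recalled from 2.6.31-era x86 headers and should be verified against the source of the kernel actually in use. The function name `decode_bug_line` is hypothetical.

```python
import re

# ASSUMPTION: preempt_count bit layout for 2.6.31-era x86 kernels.
PREEMPT_MASK = 0x000000FF    # preemption-disable nesting depth
SOFTIRQ_MASK = 0x0000FF00    # softirq nesting count
HARDIRQ_MASK = 0x03FF0000    # hardirq nesting count
PREEMPT_ACTIVE = 0x10000000  # PREEMPT_ACTIVE flag

BUG_RE = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>.+)/(?P<pid>\d+)/0x(?P<count>[0-9a-fA-F]+)"
)


def decode_bug_line(line: str):
    """Return the decoded fields of the warning line, or None if no match."""
    match = BUG_RE.search(line)
    if match is None:
        return None
    count = int(match.group("count"), 16)
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "preempt_depth": count & PREEMPT_MASK,
        "softirq_count": (count & SOFTIRQ_MASK) >> 8,
        "hardirq_count": (count & HARDIRQ_MASK) >> 16,
        "preempt_active": bool(count & PREEMPT_ACTIVE),
    }


info = decode_bug_line(
    "[    0.018008] BUG: scheduling while atomic: swapper/0/0x10000002"
)
print(info)
```

Under the assumed layout, the value reported throughout this bug decodes to a nonzero preemption depth with no softirq or hardirq nesting, which is consistent with the warning coming from early boot code rather than from an interrupt handler.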
Comment 26 revealed revealed 2010-01-18 16:50:14 UTC
(In reply to comment #22)
> Hi there,
> 
> Which of the kernel packages shall I install now? My original packages had been:

Maybe it's a good idea for you to go back to the last known usable configuration?
I tried the kernel too, and got a freeze. I burned kernel 2.6.31.8-0.1-desktop to a CD and used rpm -e (kotdkernel) and rpm -Uhv 2.6.31.8-0.1-desktop to get back to a working configuration. The freeze in my case was probably caused by a difficult kernel choice in general, which led to incompatibilities.

Greetings,

R
Comment 27 Angelika Schulz 2010-01-19 07:02:52 UTC
Good Morning, 

Well, my nice little machine has an additional kernel installed from the kernel:head repository. I chose the pae kernel here, although I am fairly sure I won't need it with my 1.5 GB of memory.

kernel-pae-2.6.32-41.1.i586

Nevertheless, the other machines I install are not supposed to use that repository as well, so I just try to make sure that the other machines do _not_ install the desktop kernel. 

One machine I installed chose the default kernel, and it has not crashed yet. I am installing via autoyast and have a script that will check for a specific kernel version (2.6.31.8) and will just try to downgrade the kernel as soon as it is detected.
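
The check described above is not shown in this report. A minimal Python sketch of such a version test, under the assumption that the release string has the form version-release-flavor (as in 2.6.31.8-0.1-desktop), might look like the following. The function name `needs_downgrade` and the extra flavor test are hypothetical additions, the latter based on the observation in comment 24 that the -default flavor of the same version is OK.

```python
def needs_downgrade(release: str,
                    bad_version: str = "2.6.31.8",
                    bad_flavor: str = "desktop") -> bool:
    """Return True if the release string names the affected version and flavor.

    ASSUMPTION: the string looks like "<version>-<release>-<flavor>".
    """
    version = release.partition("-")[0]
    flavor = release.rsplit("-", 1)[-1]
    return version == bad_version and flavor == bad_flavor


for candidate in ("2.6.31.8-0.1-desktop",
                  "2.6.31.8-0.1-default",
                  "2.6.31.12-0.1-desktop"):
    print(candidate, needs_downgrade(candidate))
```

The release string would normally come from the running kernel (for example the output of `uname -r`); the downgrade step itself is left out because it depends on the installation setup.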

I will now test the default kernel with my own machine and see whether the problems are gone with the normal kernels (default / pae / xen?).

I will come back with real feedback, promised!

Bye for now,
Angie.
Comment 28 Angelika Schulz 2010-01-19 10:52:07 UTC
Hi there,

I installed the following packages:

kernel-xen + kernel-xen-base: booted fine, no problems
kernel-pae + kernel-pae-base: booted fine, no problems
kernel-default + kernel-default-base: booted fine, no problems

No errors were reported except this one (for each version):
Could not load /lib/modules/kernel-<version>/systemtap/preloadtrace.ko

I guess that's intended?

The machine I installed yesterday had "kernel-desktop-2.6.31.5-0.1.1.i586" installed fine and was running fine. Today a user logged in, started applications (KDE4 + Kontact) and the machine got an oops ... so I had to reboot the system. Having learned from the problem I started top on that machine since it was quite slow, even for a nearly empty KDE4 session .. I could see the preload_trace process using 99 percent of the CPU. 

That machine only has 512MB RAM and started swapping right after Kontact had been started - which _seemed_ to have caused the kernel oops together with the frozen system. Shortly after booting the machine I used my chance and installed kernel-default, which solved the problem completely - the machine is usable, even with the 2.6.31.8 kernel.

I would like to know how openSUSE determines which kernel flavor is needed, since I had a big desktop of nearly the same age with 1GB RAM and it installed the default kernel. But on the smaller desktop it installed the desktop flavor ... For me the solution will be to get rid of the installed desktop kernel and replace it with the default kernel. But that is the tricky part during an automated installation.

Bye and thanks for your help,
Angie.
Comment 29 Eberhard Harbrink 2010-01-19 19:50:58 UTC
The BUG: scheduling while atomic: swapper/0/0x10000002
is fixed in 2.6.31.11-0.0.0.17.0f2b876-desktop. No more messages.
Comment 30 Vance Baarda 2010-01-28 17:28:55 UTC
Daniele:

I believe Jeff is waiting to hear from you. See comment 13.
Comment 31 Daniele Tombolini 2010-01-28 18:02:14 UTC
Uh!, sorry...
For me, fixed in kernel-desktop-2.6.31.11-bnc540589.0
Resetting needinfo.
Comment 32 Sebastien ROHAUT 2010-01-28 18:08:09 UTC
Hi,

kernel-desktop-2.6.31.12 from KOTD fixed it for me.

Thank you.
Comment 33 Leonardo Chiquitto 2010-01-29 00:03:06 UTC
*** Bug 574910 has been marked as a duplicate of this bug. ***
Comment 34 Angelika Schulz 2010-02-01 08:44:52 UTC
Hi Again,

is there any chance that a patch will be put into the update repositories or do I need to fetch the kotd and install it? At the moment one of the machines that has an older kernel installed "2.6.31.5-0.1-default" gets kernel oopses quite often, whereas a machine with nearly the same setup seems to be quite stable. I would like to test a newer kernel, but would love to have it from the update repository.
 
The kernel 2.6.31.11 had already been okay for my personal machine .. 

Bye and many thanks,
Angie.
Comment 35 Leonardo Chiquitto 2010-02-01 10:43:47 UTC
*** Bug 575615 has been marked as a duplicate of this bug. ***
Comment 36 Marcus Meissner 2010-02-01 10:51:44 UTC
http://download.opensuse.org/update/11.2-test/  has the next update kernel  (2.6.31.12) checked in, feel free to test if it fixes this issue.
Comment 37 Angelika Schulz 2010-02-04 08:24:12 UTC
Hi there,

the kernel seems to be stable now, had no trouble with the swapper yet. I am quite happy now.

Thanks a bundle,
Angie.
Comment 38 Vance Baarda 2010-02-04 16:38:03 UTC
I've been running 2.6.31.12-0.1-desktop for several days now -- no troubles.
Comment 39 Swamp Workflow Management 2010-02-08 13:37:49 UTC
Update released for: kernel-debug, kernel-debug-base, kernel-debug-base-debuginfo, kernel-debug-debuginfo, kernel-debug-debugsource, kernel-debug-devel, kernel-debug-devel-debuginfo, kernel-default, kernel-default-base, kernel-default-base-debuginfo, kernel-default-debuginfo, kernel-default-debugsource, kernel-default-devel, kernel-default-devel-debuginfo, kernel-desktop, kernel-desktop-base, kernel-desktop-base-debuginfo, kernel-desktop-debuginfo, kernel-desktop-debugsource, kernel-desktop-devel, kernel-desktop-devel-debuginfo, kernel-pae, kernel-pae-base, kernel-pae-base-debuginfo, kernel-pae-debuginfo, kernel-pae-debugsource, kernel-pae-devel, kernel-pae-devel-debuginfo, kernel-source, kernel-source-vanilla, kernel-syms, kernel-trace, kernel-trace-base, kernel-trace-base-debuginfo, kernel-trace-debuginfo, kernel-trace-debugsource, kernel-trace-devel, kernel-trace-devel-debuginfo, kernel-vanilla, kernel-vanilla-base, kernel-vanilla-base-debuginfo, kernel-vanilla-debuginfo, kernel-vanilla-debugsource, kernel-vanilla-devel, kernel-vanilla-devel-debuginfo, kernel-xen, kernel-xen-base, kernel-xen-base-debuginfo, kernel-xen-debuginfo, kernel-xen-debugsource, kernel-xen-devel, kernel-xen-devel-debuginfo, preload-kmp-default, preload-kmp-desktop
Products:
openSUSE 11.2 (debug, i586, x86_64)
Comment 40 Joachim Reichelt 2010-02-10 20:25:24 UTC
Got the same in 11.3M1:

[    0.016206] Pid: 0, comm: swapper Not tainted 2.6.32-3-desktop #1
[    0.016269] Call Trace:
[    0.016340]  [<ffffffff81006219>] dump_trace+0x79/0x340
[    0.016406]  [<ffffffff814b7393>] dump_stack+0x69/0x6f
[    0.016471]  [<ffffffff814b81e1>] thread_return+0x367/0x386
[    0.016537]  [<ffffffff81045e45>] __cond_resched+0x25/0x40
[    0.016601]  [<ffffffff814b832d>] _cond_resched+0x2d/0x40
[    0.016666]  [<ffffffff810d3c1e>] generic_perform_write+0x14e/0x200
[    0.016732]  [<ffffffff810d3d3e>] generic_file_buffered_write+0x6e/0xd0
[    0.016798]  [<ffffffff810d4397>] __generic_file_aio_write+0x247/0x460
[    0.016863]  [<ffffffff810d461d>] generic_file_aio_write+0x6d/0xe0
[    0.016929]  [<ffffffff8111ae62>] do_sync_write+0xe2/0x120
[    0.017002]  [<ffffffff8111b138>] vfs_write+0xb8/0x1a0
[    0.017066]  [<ffffffff8111bbda>] sys_write+0x5a/0x110
[    0.017131]  [<ffffffff81b3530a>] do_copy+0x84/0xb0
[    0.017194]  [<ffffffff81b34d4c>] flush_buffer+0x7d/0xa4
[    0.017259]  [<ffffffff81b58b8d>] gunzip+0x411/0x4bf
[    0.017323]  [<ffffffff81b35176>] unpack_to_rootfs+0x2d3/0x3e3
[    0.017388]  [<ffffffff81b35b26>] populate_rootfs+0x5b/0x10a
[    0.017452]  [<ffffffff81b33dea>] start_kernel+0x302/0x318
[    0.017516]  [<ffffffff81b333f3>] x86_64_start_kernel+0xe5/0xe9
[    0.025030] BUG: scheduling while atomic: swapper/0/0x10000002
[    0.025094] Modules linked in:
[    0.025182] Pid: 0, comm: swapper Not tainted 2.6.32-3-desktop #1
Comment 41 Jeff Mahoney 2010-02-10 21:46:31 UTC
11.3M2 will contain a 2.6.33-rc based kernel that doesn't have the DSDT in initramfs patches.
Comment 42 Jeff Mahoney 2011-01-06 19:55:45 UTC
Closing this one as fixed.