Bug 460634

Summary: Unkillable process which needs 100%
Product: [openSUSE] openSUSE 11.1
Reporter: Andreas Schneider <anschneider>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: mardnh, novell.com
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Andreas Schneider 2008-12-19 11:01:36 UTC
From time to time, when I close KMail, the kmail process stays around (state: running) and uses 100% of the CPU. Neither gdb nor strace can attach to the process.
If you run gdb under gdb while it attaches to the process (gdb --args gdb --pid <pid>), you can see that the inner gdb hangs in a wait function for that PID.
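Because ptrace-based tools cannot attach here, one way to at least confirm the reported state is to read it from procfs, which does not go through ptrace. A minimal sketch, assuming a Linux /proc filesystem; the helper names `parse_state` and `proc_state` are hypothetical and not part of gdb, strace, or any tool mentioned in this report:

```python
import os


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The second field (comm) is wrapped in parentheses and may itself
    contain spaces or ')' characters, so split after the *last* ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def proc_state(pid: int) -> str:
    """Read the scheduler state of `pid` from procfs (Linux only, no ptrace)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())
```

For example, `proc_state(<pid>)` should return "R" for a task that is running or runnable and "D" for one in uninterruptible sleep.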

I run into the problem on my workstation (Core Duo) and at home on a single-core AMD machine, where the whole system freezes completely. I talked to Stefan at the party yesterday, and he told me that he sometimes runs into the same problem with Thunderbird.
Comment 1 Stefan Assmann 2008-12-19 11:06:51 UTC
For reference, my issue is bug #458230.
Comment 2 Martin Hauke 2008-12-22 20:24:43 UTC
I had similar problems on my 64-bit dual-core workstation with the following processes:

- dovecot (IMAP)
- kmail

The problem seems to be related to the following kernel bug:
http://lists-archives.org/linux-kernel/19774930-2-6-27-inotify-causes-unkillable-processes-with-100-cpu-usage.html

Since I updated my kernel to vanilla 2.6.27.10, I haven't encountered the issue again.
Comment 3 Andreas Schneider 2008-12-23 18:54:33 UTC
I wasn't able to reproduce it today, so I can't provide a dump to verify it. I will try to reproduce it on my notebook in the next few days.
Comment 4 florian florian 2008-12-29 18:07:27 UTC
Similar issue here: the kernel sometimes hangs while dovecot is running and Thunderbird accesses it from a remote machine. See the thread starting at <http://www.dovecot.org/list/dovecot/2008-December/035662.html> on the dovecot mailing list.

"strace -tt dovecot -F" on the broken machine gives the following output:

[...]
16:32:41.634505 epoll_wait(9, {}, 21, 999) = 0
16:32:42.633796 gettimeofday({1230564762, 633972}, {0, 0}) = 0
16:32:42.634194 gettimeofday({1230564762, 634317}, NULL) = 0
16:32:42.634491 epoll_wait(9, {}, 21, 999) = 0
16:32:43.633809 gettimeofday({1230564763, 633987}, {0, 0}) = 0
16:32:43.634209 gettimeofday({1230564763, 634329}, NULL) = 0
16:32:43.634503 epoll_wait(9,

The output stops abruptly at that point, when the kernel crashes.
Comment 5 florian florian 2009-01-09 21:13:59 UTC
Plain vanilla kernel 2.6.28 runs fine here. The server has now been running for 90 hours without a crash.
Comment 7 Andreas Schneider 2009-01-20 08:40:23 UTC
Since I started running kernel 2.6.27.10, the problem is gone.
Comment 8 Greg Kroah-Hartman 2009-01-21 19:51:08 UTC
Thanks, closing this out.