Bug 461241 - hard hangs and unkillable processes
Summary: hard hangs and unkillable processes
Status: RESOLVED FIXED
Alias: None
Product: openSUSE 11.1
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: x86-64 Other
Importance: P2 - High : Major with 11 votes
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-20 21:40 UTC by august miles
Modified: 2009-03-03 15:24 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Description august miles 2008-12-20 21:40:32 UTC
I have just updated from 11.0 to 11.1.

Our machine is a dual-processor Xeon, used as a web server and IMAP mail server.
It is administered remotely over ssh.

I had an imap process on my server using 100% of CPU. I tried using kill -9
to stop it, without success.


I then tried taking down dovecot imap with
"rcdovecot stop"

At this point I could no longer get a command prompt.
I am now unable to log in to the machine, although it does answer a ping.


I am filing this under kernel: a runaway process should not bring down
the connectivity of a machine in any situation.
Comment 1 florian florian 2008-12-29 18:00:48 UTC
Might be a duplicate of bug #460634.
Comment 2 august miles 2009-01-10 16:39:29 UTC
I am the original reporter.

I have a second machine that had a similar problem with Okular,
which was viewing a dvi file that was generated by latex. It was
running at 100% CPU, again on a 64-bit Xeon.

The okular program was launched as a subprocess of emacs-gtk, which also
hung, without using CPU.

When I switched to a virtual terminal, the whole machine hung.

-----------

I have downgraded to an 11.0 kernel on the original machine running
imap; since then there have been no problems (userland remains 11.1).
Comment 3 Boris Wesslowski 2009-02-06 17:42:44 UTC
We have the same problem on an HP ML110 G5 server with a quad-core Xeon processor and two SATA disks in RAID1. It is also dovecot that has shown unkillable processes, especially when the imap client is closed. The system degrades from the point where they appear until disk accesses seem to hang. xosview shows all cores in 100% wait states. The system still responds to pings and the dhcp server is even able to write to syslog, but everything else seems to be waiting for the disks. At that point the console is usually black and won't react to Ctrl-Alt-Del, but Alt-Printscreen-b reboots. Disabling the irq_balancer seems to make the hangs happen less often. We have not tried any other kernel yet.
Comment 4 Boris Wesslowski 2009-02-10 18:26:47 UTC
Here's an update: We tried the "vanilla" kernel as supplied with openSUSE 11.1 and the server hung again, once more shortly after a certain (Windows/Thunderbird) user closed his dovecot IMAP mailbox at the end of the day...
Comment 5 Greg Kroah-Hartman 2009-02-11 16:23:20 UTC
Any clues as to where the machine is hung?

alt-sysrq-t should show you the task list. We are going to need some kind of clue to be able to work on this.
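
On a machine that is only reachable over ssh, as in the original report, the same task dump can be requested by writing t to /proc/sysrq-trigger as root. As a lighter first check while a shell still responds, the sketch below lists processes in uninterruptible sleep (state D) by reading /proc/<pid>/stat. It is a rough illustration only, assuming the usual Linux /proc layout; the function names parse_stat_line and uninterruptible_tasks are made up for this example. It only catches blocked tasks, which matches the wait-state symptom in comment 3. A task spinning at 100% CPU, as in the original description, would show as state R and would not be listed.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so locate it via the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        # No readable /proc (for example, not a Linux system).
        return []
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(), or the
            # line did not have the expected shape; skip it.
            continue
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Run periodically (for example from cron) and logged to a file, output like this could show which processes were stuck shortly before a hang, which is the kind of clue asked for above.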
Comment 6 august miles 2009-02-11 16:31:06 UTC
I would guess it is this known problem of dovecot with
2.6.27 kernels...

http://www.mail-archive.com/dovecot@dovecot.org/msg15054.html

It seems to be a problem with inotify.
This would be consistent with the problems I have also had with Okular,
which I understand also uses inotify.


I have been running for weeks with the downgraded kernel (2.6.25.18-0.2 from
openSUSE 11.0), but 11.1 userland. It is totally stable.
Comment 7 Greg Kroah-Hartman 2009-02-11 17:04:21 UTC
Ah, yeah, that should be the issue.  That is solved in the updated kernel package.  If you get that, this should go away. Can you try that?
Comment 8 Boris Wesslowski 2009-02-12 10:13:01 UTC
Sorry for the dumb question, but is "the updated kernel package" supposed to be the one in http://download.opensuse.org/repositories/Kernel:/SL111_BRANCH/openSUSE_11.1/x86_64/ ?
Comment 9 Greg Kroah-Hartman 2009-02-12 16:52:26 UTC
(In reply to comment #8)
> Sorry for the dumb question, but is "the updated kernel package" supposed to be
> the one in
> http://download.opensuse.org/repositories/Kernel:/SL111_BRANCH/openSUSE_11.1/x86_64/
> ?

Yes, you can use that one.
Comment 10 Boris Wesslowski 2009-02-13 13:43:08 UTC
(In reply to comment #9)
> > http://download.opensuse.org/repositories/Kernel:/SL111_BRANCH/openSUSE_11.1/x86_64/
> > ?
> 
> Yes, you can use that one.

kernel-default-2.6.27.8-11.1.x86_64.rpm from the URL above does not fix the problem for us. An unkillable imap process appeared again and the machine was not able to shut down and reboot by itself (I am not on site to check details)...
Comment 11 Boris Wesslowski 2009-03-03 11:44:27 UTC
The problem does not happen with the recently released kernel-default-2.6.27.19-3.2.1. I consider this bug closed.
Comment 12 Greg Kroah-Hartman 2009-03-03 15:24:27 UTC
Thanks for letting us know.