Bug 383671

Summary: graffin.suse.de - system soft hang
Product: [SUSE Linux Enterprise Real Time Extension] SUSE Linux Enterprise Real Time 10 SP2 (SLERT 10 SP2)
Reporter: Daniel Gollub <dgollub>
Component: kernel
Assignee: Felix Foerster <ffoerster>
Status: RESOLVED INVALID
QA Contact: Erik Hamera <erik.hamera>
Severity: Critical
Priority: P5 - None
CC: ihno, tonyj
Version: BETA6
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: 2.6.22.19-20080423_notqirq-rt - hang - SysRq showTasks and showBlockedTasks

Description Daniel Gollub 2008-04-25 08:45:01 UTC
Created attachment 210429
2.6.22.19-20080423_notqirq-rt - hang - SysRq showTasks and showBlockedTasks

Host: graffin.suse.de
Kernel: 2.6.22.19-20080423_notqirq-rt

SysRq was working. See the attached showTasks and showBlockedTasks output from the serial console.


The machine hung several hours after the SLERT testsuite ran. The network was down (no response to ping). The login prompt accepted a username only once; neither a password prompt nor an authentication error appeared. SysRq was still working.

There are no kernel log messages about missed interrupts or dropped devices.
Comment 1 Sven Dietrich 2008-04-30 00:34:23 UTC
Tony offered to help with this one.
Comment 2 Sven Dietrich 2008-05-01 07:00:53 UTC
Can you attach a pointer to the kernel packages for this Kernel? I want to check out some of these traces a little more closely.
Comment 3 Sven Dietrich 2008-05-01 08:32:39 UTC
Can you try to reproduce this and capture the SysRq-'d' output?
There are a lot of tasks to weed through in the attached trace dump.
Comment 4 Daniel Gollub 2008-05-01 09:41:25 UTC
(In reply to comment #2 from Sven Dietrich)
> Can you attach a pointer to the kernel packages for this Kernel? I want to
> check out some of these traces a little more closely.
> 

wotan.suse.de:/mounts/users-space/dgollub/SLES-10-SP2-RT-KERNEL-ARCHIVE/SLERT_SP2_BETA6_NOTEIRQ_20080424/sle10-sp2-rt-x86_64/
Comment 5 Ihno Krumreich 2008-05-02 16:01:02 UTC
SLERT status call: reduced to Critical. Felix to reproduce it.
Comment 6 Daniel Gollub 2008-05-05 09:44:45 UTC
(In reply to comment #0 from Daniel Gollub)
> Created an attachment (id=210429)
> 2.6.22.19-20080423_notqirq-rt - hang - SysRq showTasks and showBlockedTasks
> 
> Host: graffin.suse.de
> Kernel: 2.6.22.19-20080423_notqirq-rt
> 
JFYI, the machine was booted with the kernel parameter enable_rt_pcix_apic.
Going to rerun with this parameter as well. The issue didn't appear without the parameter (but "irq nobody cared" showed up...).
Comment 7 Daniel Gollub 2008-05-07 09:32:50 UTC
Machine is now up 24 hours with same kernel (comment 0) and parameter (comment 6) without any hangs. Keep machine running for another 24 hours ...
Comment 8 Daniel Gollub 2008-05-08 12:32:31 UTC
(In reply to comment #7 from Daniel Gollub)
> Machine is now up 24 hours with same kernel (comment 0) and parameter (comment
> 6) without any hangs. Keep machine running for another 24 hours ...

Machine survived another day. No hang...

Upgrade to RC1 kernel-rt?
Should we load the machine with stress?
Comment 9 Daniel Gollub 2008-05-08 15:23:53 UTC
Sven, could you give us a hint what kind of load we should produce to trigger this bug again?
Comment 11 Daniel Gollub 2008-09-02 08:53:48 UTC
We have been unable to reproduce this issue for a long time now. Closing the ticket; will reopen if the issue appears again.