Bugzilla – Full Text Bug Listing

| Summary: | quake110.suse.de: soft system hang, within one day of stress | ||
|---|---|---|---|
| Product: | [SUSE Linux Enterprise Real Time Extension] SUSE Linux Enterprise Real Time 10 SP2 (SLERT 10 SP2) | Reporter: | Daniel Gollub <dgollub> |
| Component: | kernel | Assignee: | Erik Hamera <erik.hamera> |
| Status: | RESOLVED WONTFIX | QA Contact: | Erik Hamera <erik.hamera> |
| Severity: | Major | ||
| Priority: | P5 - None | ||
| Version: | RC1 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | screenlog of quake110.suse.de - SysRq with nmi_watchdog=1; quake110.suse.de - beta7 serial console log of sysrq | | |

Description
Daniel Gollub
2008-05-09 15:01:42 UTC
Created attachment 214707
screenlog of quake110.suse.de - SysRq with nmi_watchdog=1

Host: quake110.suse.de
Kernel: SLERT 10 SP2 RC 1 (mbuild, not from media)
Kernel Build: /mounts/users-space/dgollub/SLES-10-SP2-RT-KERNEL-ARCHIVE/SLERT_SP2_RC1/sle10-sp2-rt-x86_64/kernel-rt-2.6.22.19-0.8.x86_64.rpm

http://w3.suse.de/~dgollub/SLERT/SLERT-testresults/longterm1/

Approx. runtime until soft hang with nmi_watchdog=1: 3 hours
SysRq was triggered approx. 40 hours later.

The serial console log of the SysRq showTasks, showBlockedTasks and other dumps is attached.

A crashdump is also available:
/mounts/users-space/dgollub/crashdump/SLERT/10/SP2/RC1-mbuild/quake110.suse.de/bnc399793#1/2008-05-13-04:51/vmcore

Created attachment 215022
quake110.suse.de - beta7 serial console log of sysrq

Host: quake110.suse.de
Kernel: Beta7 (mbuild, not from media)
Kernel Build: /mounts/users-space/dgollub/SLES-10-SP2-RT-KERNEL-ARCHIVE/SLERT_SP2_BETA7_IGNORESPURIOUSIRQ_20080430/sle10-sp2-rt-x86_64/

Regression check with the Beta 7 kernel.

http://w3.suse.de/~dgollub/SLERT/SLERT-testresults/longterm2.1/

Approx. runtime until soft hang (without nmi_watchdog): 14 hours

The serial console log of the SysRq showTasks, showBlockedTasks and other dumps is attached.

A crashdump is also available:
/mounts/users-space/dgollub/crashdump/SLERT/10/SP2/Beta7-mbuild/quake110.suse.de/bnc399793#2/2008-05-14-01\:49/vmcore

prio-preempt involved:
```
KERNEL: vmlinux-2.6.22.19-0.8-rt
DUMPFILE: ../quake110.suse.de/bnc399793#1/2008-05-13-04:51/vmcore
CPUS: 4
DATE: Tue May 13 10:50:13 2008
UPTIME: 1 days, 19:23:18
LOAD AVERAGE: 49.91, 48.23, 47.46
TASKS: 235
NODENAME: quake110
RELEASE: 2.6.22.19-0.8-rt
VERSION: #1 SMP PREEMPT RT 2008-05-06 20:07:24 +0200
MACHINE: x86_64 (3200 Mhz)
MEMORY: 3.9 GB
PANIC: "SysRq : Trigger a crashdump"
PID: 471
COMMAND: "IRQ-4"
TASK: ffff81011476e040 [THREAD_INFO: ffff81011171c000]
CPU: 1
STATE: TASK_INTERRUPTIBLE (SYSRQ)
crash> ps | grep ">"
> 471 2 1 ffff81011476e040 IN 0.0 0 0 [IRQ-4]
> 11239 9466 2 ffff8100a9239810 RU 0.1 264404 4880 prio-preempt
> 11240 9466 0 ffff8100a9239040 RU 0.1 264404 4880 prio-preempt
> 11241 9466 3 ffff8100a1c0e810 RU 0.1 264404 4880 prio-preempt
crash> task 11239 11240 11241 | grep rt_priori
rt_priority = 81,
rt_priority = 81,
rt_priority = 81,
crash>
```
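The crash session above reads `rt_priority` out of a dump. On a running system the same values are exported in `/proc/<pid>/stat`, where, per proc(5), field 40 is `rt_priority` and field 41 is the scheduling policy. The following is a minimal Python 3 sketch, assuming a Linux `/proc` layout, that lists every real-time task so the prio-preempt hogs (81 here) can be compared against the IRQ threads before the system hangs. `parse_stat` and `realtime_tasks` are hypothetical helpers, not part of CTCS2 or any SUSE tool.

```python
import os

# Numeric scheduling policies as defined in <linux/sched.h>.
POLICY_NAMES = {0: "SCHED_OTHER", 1: "SCHED_FIFO", 2: "SCHED_RR",
                3: "SCHED_BATCH", 5: "SCHED_IDLE", 6: "SCHED_DEADLINE"}


def parse_stat(line):
    """Return (pid, comm, rt_priority, policy name) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    rest = line[rpar + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    rt_priority = int(rest[40 - 3])
    policy = int(rest[41 - 3])
    return pid, comm, rt_priority, POLICY_NAMES.get(policy, str(policy))


def realtime_tasks(proc="/proc"):
    """Yield (pid, comm, rt_priority, policy) for every task with rt_priority > 0."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                info = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or the stat line could not be parsed
        if info[2] > 0:
            yield info


if __name__ == "__main__":
    # Highest real-time priority first, similar to eyeballing `ps` in crash.
    for pid, comm, prio, policy in sorted(realtime_tasks(), key=lambda t: -t[2]):
        print(f"{pid:>7} {policy:<14} {prio:>3} {comm}")
```

Splitting on the last `)` rather than on whitespace matters because a task name may itself contain spaces or parentheses.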
Renaming the bug from hard hang to soft hang, since the initial report may have been a misdiagnosis: the IRQ handler thread for the serial console wasn't raised to SCHED_FIFO priority 99. CPU hogs could starve that IRQ thread, which might have made SysRq unusable in the initial report.

Lowering severity to Major, since this looks like a test-case issue (file write access deadlock with CPU hogs, in combination with CTCS2).

I talked to svollath, who is responsible for the QA-Lab relocation. quake110.suse.de will be available before the end of next week.

Still valid? Please re-confirm the issue against the Update 4 kernel.

No response.
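The diagnosis above hinges on the serial console's IRQ thread running at a lower priority than the SCHED_FIFO 81 hogs. From a shell, `chrt -f -p 99 <pid>` (util-linux) raises a thread to SCHED_FIFO 99 before a stress run. The following is a Python 3.3+ sketch of the same step via `os.sched_setscheduler`. It assumes the serial port's IRQ thread is named `IRQ-4`, as in the crash dump, and that it runs as root. `find_pids_by_name` and `raise_to_fifo` are hypothetical helpers.

```python
import os


def find_pids_by_name(name, proc="/proc"):
    """Return sorted PIDs whose 'Name:' line in /proc/<pid>/status equals name."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        if line.split(":", 1)[1].strip() == name:
                            pids.append(int(entry))
                        break
        except OSError:
            continue  # task exited while we were scanning
    return sorted(pids)


def raise_to_fifo(pid, priority=99):
    """Switch pid to SCHED_FIFO at the given priority; return the priority now in effect."""
    # Requires root / CAP_SYS_NICE; raises PermissionError otherwise.
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    return os.sched_getparam(pid).sched_priority


if __name__ == "__main__":
    # Assumption: IRQ-4 is the serial console's IRQ thread, as in the crash dump.
    for pid in find_pids_by_name("IRQ-4"):
        print(pid, raise_to_fifo(pid))
```

`/proc/<pid>/status` is used instead of `/proc/<pid>/comm` because the latter only appeared in kernels newer than the 2.6.22 RT kernel tested here.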