Bugzilla – Full Text Bug Listing
| Summary: | Kernel segfault in kernel/timer.c - comm: stapio (process) - related to preloadtrace.ko | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.4 | Reporter: | Thomas Renninger <trenn> |
| Component: | Kernel | Assignee: | E-mail List <kernel-maintainers> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | bjorn.helgaas, coolo, forgotten_a525umNONh, jbohac, kent.liu, rcoe, tonyj, youquan.song |
| Version: | Milestone 5 of 6 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Thomas Renninger
2011-01-12 22:32:16 UTC
The reason I posted the backtrace on bug #647029 is that two similar or identical backtraces/segfaults were posted there:
https://bugzilla.novell.com/show_bug.cgi?id=647029#6
https://bugzilla.novell.com/show_bug.cgi?id=647029#9
while the bug itself is about something else.

Jiri: Maybe you have an idea why:
kernel/timer.c:681
BUG_ON(!timer->function);
can happen?

It could/should be HW related; the other bug report is also about a Nehalem machine. I tried clocksource=hpet and nox2apic, but neither helped. Hm, I'll upgrade the BIOS first and will report back.

Moving the preloadtrace.ko driver away so that it does not get loaded works around the issue.

There seem to be at least two machines affected:
- HP Z600, a workstation (compare with bug #647029)
- Boxboro-EX, a huge server
Both have Intel Nehalem CPUs.

Google is full of these, e.g.
http://sourceware.org/bugzilla/show_bug.cgi?id=10651

Thanks for the pointer, Coolo. This sounds rather sane:
commit 3fd1c490 switches to atomic_t flags to the timer callback functions to defeat mod_timer() calls that might overlap a del_timer_sync. commit f2b610b does likewise for the transport layer timers.

That would also explain why the race, triggered by the preloadtrace driver, only shows up on the latest/fastest Nehalem machines.

The problem is: which git tree do the commit ids refer to? It's not the Linus or x86/tip tree. I'll ask the guy and report back.

It's the systemtap git tree. This driver is generated from the systemtap "language", and the systemtap compiler seems to be broken; the driver is not written in C. Can't the kernel driver (preloadtrace.ko) be written in C? Then it could possibly even end up in mainline.

I added the mentioned systemtap git commits from:
http://sourceware.org/bugzilla/show_bug.cgi?id=10651
to our systemtap devel:tools repo and did a submitrequest against openSUSE:Factory. Afaik, it's not verified whether these fix the issue.
Tony: I also cannot judge these patches at all. As they are in the mainline systemtap git, I just added them; hope that's fine.

I am closing the bug already. I will try to reproduce it once the whole build queue has succeeded (and I find the time for it), and will reopen if I am able to run into it again.

(In reply to comment #5)
> I added the mentioned systemtap git commits from:
> http://sourceware.org/bugzilla/show_bug.cgi?id=10651
> to the our systemtap devel:tools repo and did a submitrequest against
> openSUSE:Factory. Afaik, it's not verified whether these fix the issue.
>
> Tony: I also cannot judge these patches at all, as they are in the mainline
> systemtap git I just added them, hope that's fine.

I'd rather you not do this in the future, especially if you "cannot judge the patches". The process, I believe, is: you branch devel:tools/systemtap, make a change, and submit it as a change request back to devel:tools/systemtap.

Also, next time can you add me as a cc: to the bug before it's closed ;-)

I'm confused, as I just got this auto e-mail:

Subject: [ci_new_pac] JFYI systemtap -> stable
Package is "systemtap", Maintainer is "trenn@novell.com"
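
For readers unfamiliar with the fix described in the quoted commit message (an atomic flag that stops the timer callback from calling mod_timer() once teardown has started), here is a rough userspace sketch of the idea. It is not the systemtap code: `sim_timer`, `probe_state`, `probe_timer_tick` and `probe_shutdown` are hypothetical names, the timer is a plain struct rather than the kernel's `struct timer_list`, and it assumes that teardown leaves the timer without a callback, which is the condition `BUG_ON(!timer->function)` checks. It runs single-threaded, so it only shows the ordering the flag enforces, not a real race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for a kernel timer (assumption). */
struct sim_timer {
    void (*function)(void *data);
    bool pending;
};

/* Hypothetical per-probe state; not taken from systemtap. */
struct probe_state {
    struct sim_timer timer;
    atomic_int stopping;   /* set before the timer is torn down */
    int rearm_count;       /* how often the callback re-armed the timer */
};

static void noop_callback(void *data) { (void)data; }

/* Stand-in for mod_timer(): re-arming a timer that has no callback is
 * the case the kernel traps with BUG_ON(!timer->function). Here we
 * return false instead of crashing. */
static bool sim_mod_timer(struct sim_timer *t)
{
    if (t->function == NULL)
        return false;
    t->pending = true;
    return true;
}

static void probe_init(struct probe_state *p)
{
    assert(p != NULL);
    p->timer.function = noop_callback;
    p->timer.pending = false;
    atomic_init(&p->stopping, 0);
    p->rearm_count = 0;
}

/* Body of the periodic callback: re-arm only while not shutting down.
 * Returns false only if an unguarded re-arm would have hit the BUG. */
static bool probe_timer_tick(struct probe_state *p)
{
    if (atomic_load(&p->stopping))
        return true;               /* teardown in progress: do not re-arm */
    if (!sim_mod_timer(&p->timer))
        return false;
    p->rearm_count++;
    return true;
}

/* Teardown: raise the flag first, then remove the timer. */
static void probe_shutdown(struct probe_state *p)
{
    atomic_store(&p->stopping, 1);
    p->timer.pending = false;      /* stands in for del_timer_sync() */
    p->timer.function = NULL;      /* assumption: teardown invalidates it */
}
```

In this sketch, a late `probe_timer_tick()` after `probe_shutdown()` returns without touching the timer, whereas a direct `sim_mod_timer()` call at that point reports the invalid re-arm.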