Bugzilla – Full Text Bug Listing

| Summary: | Kernel Bug in kernel-default-2.6.18.2-34 on x86-64 SMP machine | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Andreas Vetter <vetter> |
| Component: | Kernel | Assignee: | Nick Piggin <npiggin> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | aj, asklein, auxsvr, jeffm |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | hwinfo | ||

Description
Andreas Vetter
2007-01-17 15:51:31 UTC
Created attachment 113411: hwinfo

This is just a "warning" that something bad might have happened, the kernel caught it and continued on. Did the system continue to work just fine, or did other things go wrong? Is it easy to trigger this warning? What were you doing at the time?

The machine works fine after that. I have no idea how to trigger it, since this is a machine in a pool for students and acct was not started by accident. The entry just before the bug is:

    Jan 12 16:13:53 wpyc009 sshd[23810]: Accepted publickey for ferfurth from 132.187.42.39 port 59835 ssh2

Since this seems to be filesystem dependent: The machine has / and /tmp on a reiserfs. Nobody can insert floppies, CDs, USB devices. It has /home and /usr/local on NFS, sometimes the NFS server responds very slowly.

Hmm, we have several machines (same hardware) that freeze when the X-server is killed with CTRL-ALT-Backspace. We have to powercycle them. Unfortunately, it is not reproducible. I hope the new Xorg update helps for this issue.

Can you provide the output of 'hwinfo' attached to this bug?

(In reply to comment #1)
> Created an attachment (id=113411)
> hwinfo

already done

Does the new Xorg update help as you hope in comment #4?

Looks good until now.

Perfect! ;-)

Too early :-( One of the machines was completely frozen again. Nothing in the logs. User says they tried bzflag, and then it was frozen. Unfortunately I can't find it with "lastcomm". Obviously "lastcomm" only logs finished commands. How can I log all commands?

Different machine, similar Bug:

    Feb 1 10:24:29 wpyc007 kernel: BUG: warning at fs/inotify.c:171/set_dentry_child_flags()
    Feb 1 10:24:29 wpyc007 kernel:
    Feb 1 10:24:29 wpyc007 kernel: Call Trace:
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802d54dc>] set_dentry_child_flags+0x66/0x132
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802d560f>] remove_watch_no_event+0x67/0x76
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff88072fdd>] :reiserfs:reiserfs_delete_inode+0x0/0xf6
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802d5a7d>] inotify_destroy+0x92/0xbf
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802d5b9a>] inotify_release+0x1a/0x73
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff80210559>] __fput+0xae/0x182
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff80221b2a>] filp_close+0x5c/0x64
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff80236bdd>] put_files_struct+0x6c/0xc3
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802131c3>] do_exit+0x2b0/0x8fc
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802450b9>] cpuset_exit+0x0/0x6c
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff802295f6>] get_signal_to_deliver+0x46e/0x49d
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff80227fdd>] do_signal+0x55/0x74a
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff80229cab>] sys_recvfrom+0x11d/0x137
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff80258097>] sysret_signal+0x1c/0x27
    Feb 1 10:24:29 wpyc007 kernel: [<ffffffff8025831b>] ptregscall_common+0x67/0xac
    Feb 1 10:24:29 wpyc007 kernel:

Machine from comment #11 is still working correctly without reboot. Maybe the lockups and this bug are two different things.

I've been investigating the inotify problem - actually, it does not seem to be rare (I've found several bugreports with the similar warning). But no one else complains about the hang - so that one is probably unrelated. I'm reassigning to Nick who is trying to track down the inotify problem in the mainline. He may be glad for further debugging input ;)

Sorry, still working on this in the upstream kernel. Andreas: I'm pretty sure it is harmless.
Actually the flag is only used to indicate, without taking a lock, whether there is an inotify watch on the parent directory. The warning just means we've found the flag set when it should not have been, so we'll just have been doing a bit of extra locking in that case.

A similar warning in my system:

    BUG: warning at fs/inotify.c:181/set_dentry_child_flags()
     [<c01872af>] set_dentry_child_flags+0xcf/0x11e
     [<c0187351>] remove_watch_no_event+0x53/0x5f
     [<c0187a68>] inotify_destroy+0x77/0x9f
     [<c0187b52>] inotify_release+0xc/0x57
     [<c016560f>] __fput+0xac/0x16a
     [<c0162f2f>] filp_close+0x52/0x59
     [<c0121efd>] put_files_struct+0x65/0xa7
     [<c0122f34>] do_exit+0x224/0x791
     [<c02a6ed5>] do_page_fault+0x27d/0x507
     [<c0123517>] sys_exit_group+0x0/0xd
     [<c0103d5d>] sysenter_past_esp+0x56/0x79

reiserfs filesystem, no crash, no problem as far as I'm aware; it occurred only once on linux 2.6.18.8-0.5-default.

*** Bug 308585 has been marked as a duplicate of this bug. ***

*** Bug 309752 has been marked as a duplicate of this bug. ***

OK, I have taken another look at this problem (sorry it has taken so long). And come up with one fix to close a real race. Another patch to remove the debugging code -- which actually wasn't so helpful to track down any problem (the race was found by inspection) -- and is itself a bit racy. Posted it to linux-fsdevel for public review, and we will go with that solution if no objections are raised in the meantime.

Thanks,
Nick

*** Bug 352290 has been marked as a duplicate of this bug. ***

I have had patches in -mm for this for a few releases. No problems so far.

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/broken-out/inotify-fix-race.patch
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/broken-out/inotify-remove-debug-code.patch

I'm wondering whether I should put these into the openSUSE kernel, or wait for them to go upstream first?

I suggest submitting this to kernel CVS *HEAD* so that it gets testing in factory - and then moving it to the 10.3 kernel. I also suggest pushing for upstream inclusion. Thanks!

Closing this as wontfix. The warnings are rather rare, and they are false positives by all accounts anyway. We found that KDE4 actually triggers them more often; however, I have fixed the problem in recent kernels, so 10.3 is probably OK to stay unpatched.
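
For readers who want to see why a stale flag is harmless, the explanation given earlier in the thread (the flag only says, without taking a lock, whether the parent directory has an inotify watch) can be modelled in user space. The following is only a rough Python sketch of that idea, not the kernel code: `Directory`, `Entry`, `parent_watched`, `add_watch`, `remove_all_watches` and `notify` are hypothetical names invented for this illustration, and `lock_acquisitions` exists purely to make the extra locking visible.

```python
import threading


class Directory:
    """Hypothetical stand-in for a parent directory that may carry watches."""

    def __init__(self):
        self.lock = threading.Lock()
        self.watches = []            # callbacks; only touched under self.lock
        self.lock_acquisitions = 0   # instrumentation for this example only


class Entry:
    """Hypothetical stand-in for a directory entry (child of a Directory)."""

    def __init__(self, parent):
        self.parent = parent
        self.parent_watched = False  # hint flag, read WITHOUT the lock


def add_watch(directory, children, callback):
    """Register a watch and set the hint flag on every child."""
    with directory.lock:
        directory.watches.append(callback)
        for child in children:
            child.parent_watched = True


def remove_all_watches(directory, children):
    """Drop all watches and clear the hint flag on every child."""
    with directory.lock:
        directory.watches.clear()
        for child in children:
            child.parent_watched = False


def notify(entry, event):
    """Deliver an event to the parent's watches; return how many received it."""
    # Fast path: the flag says "no watch on the parent", so skip the lock.
    if not entry.parent_watched:
        return 0
    # Slow path: the flag is only a hint, so the authoritative check
    # happens under the lock. A stale flag costs one lock round trip
    # but never delivers an event to a watch that is gone.
    parent = entry.parent
    with parent.lock:
        parent.lock_acquisitions += 1
        for callback in parent.watches:
            callback(event)
        return len(parent.watches)


if __name__ == "__main__":
    d = Directory()
    e = Entry(d)
    seen = []
    add_watch(d, [e], seen.append)
    notify(e, "modify")
    remove_all_watches(d, [e])
    e.parent_watched = True          # simulate the stale flag from the warning
    print(notify(e, "modify"), d.lock_acquisitions, seen)
```

In this model a flag that is wrongly left set only costs a lock acquisition that finds nothing to do, which matches the "bit of extra locking" described above. The sketch does not model the opposite case (a flag wrongly cleared while a watch exists), and it says nothing about what the real race fixed by the -mm patches looked like.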