Bugzilla – Bug 369024
martinu.suse.de - kernel BUG at fs/dcache.c:652!
Last modified: 2009-03-09 20:07:43 UTC
Host: martinu.suse.de
Kernel: 2.6.22.19-0.5-rt (SLERT10SP2 Beta4) and 'noapic'

While trying to reproduce a completely different issue (https://bugzilla.novell.com/show_bug.cgi?id=368657#c3) this appeared.

---
kernel BUG at fs/dcache.c:652!
invalid opcode: 0000 [1] PREEMPT SMP
last sysfs file: /devices/system/cpu/cpu0/cache/index2/shared_cpu_map
CPU 2
Modules linked in: ext3 jbd hfs vfat fat joydev st sr_mod ext2 mbcache nfs lockd nfs_acl sunrpc autofs4 af_packet ipv6 button battery ac apparmor loop dm_mod i2c_nforce2 ehci_hcd ohci1394 ohci_hcd usbcore rtc_cmos ide_cd cdrom rtc_core rtc_lib ieee1394 i2c_core forcedeth reiserfs pata_amd edd fan thermal processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
Pid: 22047, comm: umount Tainted: G N 2.6.22.19-0.5-rt #1
RIP: 0010:[<ffffffff802c1861>] [<ffffffff802c1861>] shrink_dcache_for_umount_subtree+0x2b1/0x2c0
RSP: 0018:ffff810165527e08 EFLAGS: 00010296
RAX: 000000000000005a RBX: ffff81010efb9570 RCX: ffffffff805398d8
RDX: ffff810156605810 RSI: 0000000000000001 RDI: ffffffff805398a0
RBP: ffff81010efb9570 R08: 00004514222ace71 R09: 0000000000000000
R10: ffff81010004dc60 R11: 0000000000000001 R12: ffff810166a67400
R13: 0000000000000029 R14: 0000000000000000 R15: ffff810178324000
FS: 00002ac2162d36d0(0000) GS:ffff81017c8be640(0000) knlGS:00000000080c1830
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002af10cf36940 CR3: 0000000110ce1000 CR4: 00000000000006e0
Process umount (pid: 22047, threadinfo ffff810165526000, task ffff810156605810)
Stack: ffff8101783242e0 ffff810178324000 ffffffff88369120 000000000050ef40
 ffff8101231f02c0 ffffffff802c24cd ffff810178324000 ffffffff802af5a9
 ffff810178324000 000000000000001d ffffffff883812a0 ffffffff802af6c9
Call Trace:
 [<ffffffff802c24cd>] shrink_dcache_for_umount+0x2d/0x40
 [<ffffffff802af5a9>] generic_shutdown_super+0x19/0x110
 [<ffffffff802af6c9>] kill_anon_super+0x9/0x40
 [<ffffffff88347c5d>] :nfs:nfs_kill_super+0xd/0x20
 [<ffffffff802af7af>] deactivate_super+0x8f/0xb0
 [<ffffffff802c74db>] sys_umount+0x6b/0x2f0
 [<ffffffff802c09f5>] d_kill+0x55/0x80
 [<ffffffff802c0a41>] dput+0x21/0x140
 [<ffffffff802c6d3f>] mntput_no_expire+0x1f/0xa0
 [<ffffffff802ab3f4>] filp_close+0x54/0x90
 [<ffffffff8020a04e>] system_call+0x7e/0x83
Code: 0f 0b eb fe 0f 0b eb fe 66 66 66 90 66 66 90 48 83 ec 38 48
RIP [<ffffffff802c1861>] shrink_dcache_for_umount_subtree+0x2b1/0x2c0
 RSP <ffff810165527e08>
Created attachment 200131 [details] screenlog of martinu.suse.de running 2.6.22.19-0.5-rt (SLERT10 SP2 Beta4) with 'noapic'
Very likely a duplicate of bug #293351
(In reply to comment #2 from Daniel Gollub)
> Very likely a duplicate of bug #293351

Ignore that bug - it was related to reiserfs.
(In reply to comment #3 from Daniel Gollub)
> (In reply to comment #2 from Daniel Gollub)
> > Very likely a duplicate of bug #293351
>
> Ignore that bug - it was related to reiserfs.

Daniel, was this fixed? Note that 293351 was reported last July. Are we sure that this patch is in the RT tree? Which patch was it? Thanks.
I don't want to lead you down the wrong track, since the other bug looks like it's only related to reiserfs. Maybe Jan can give a quick comment on this. I will check whether this patch is in the RT tree and, in the meantime, try to reproduce the issue.
Just found this:
http://bugzilla.kernel.org/show_bug.cgi?id=9710
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef818a28fac9bd214e676986d8301db0582b92a9
I will review
OK, I reviewed all of this. We should definitely try the suggested NFS patch. It would most likely need to go into other kernels as well (if it is not already there).

A possible test case would be to unlink a bunch of files via the client, then umount the NFS dir from the client. Please try this test before patching the kernel; it should generate the Oops. Make sure the client machine is "cpu busy" at the time you do the unlinks and umount.
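The test case described above could be scripted roughly as follows. This is only a sketch under assumptions, not a verified reproducer: the function names, the file count, and the use of one busy-loop process per CPU are all made up for illustration, `reproduce()` has to be run as root on the NFS client, and the mount point passed to it is whatever NFS directory is being tested.

```python
import multiprocessing
import os
import subprocess
import tempfile


def burn_cpu(stop):
    """Spin until told to stop, to keep one CPU busy."""
    while not stop.is_set():
        pass


def create_and_unlink(directory, count=1000):
    """Create `count` small files in `directory`, then unlink them all."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"victim-{i}")
        with open(path, "w") as handle:
            handle.write("x")
        paths.append(path)
    for path in paths:
        os.unlink(path)
    return len(paths)


def reproduce(mountpoint, count=1000, burners=None):
    """Load the CPUs, unlink files under `mountpoint`, then umount it.

    Hypothetical helper: `mountpoint` must be an NFS mount on the client,
    and the umount call needs root privileges.
    """
    burners = burners or multiprocessing.cpu_count()
    stop = multiprocessing.Event()
    workers = [
        multiprocessing.Process(target=burn_cpu, args=(stop,))
        for _ in range(burners)
    ]
    for worker in workers:
        worker.start()
    try:
        # Do the unlinks and the umount while the CPUs are still busy.
        workdir = tempfile.mkdtemp(dir=mountpoint)
        create_and_unlink(workdir, count)
        os.rmdir(workdir)
        return subprocess.run(["umount", mountpoint]).returncode
    finally:
        stop.set()
        for worker in workers:
            worker.join()
```

Calling `reproduce("/mnt/nfs")` (the path is a placeholder) on an unpatched kernel would be the "before" run; the same call on a patched kernel would be the comparison.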
Peter, thanks for reviewing. Your suggestion is quite similar to the initial mail on LKML about this patch:
http://lkml.org/lkml/2007/11/3/34

First I will try to reproduce this issue reliably, and then test with a patched kernel.
Created attachment 202834 [details] commit a50f7951a31d3b976e829250853f89c9d2da32c0
Created attachment 202835 [details] commit 6f23e3872cff238589f9bf39c71db2ea880c9a26
(In reply to comment #6 from Daniel Gollub)
> Just found this:
> http://bugzilla.kernel.org/show_bug.cgi?id=9710
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef818a28fac9bd214e676986d8301db0582b92a9

This patch requires some serious modification, or a major back-port of NFSv4 fixes, which I have nearly completed in the process.

However, in doing so, I think the two patches that I have just attached may actually resolve the issue without requiring the major back-port.

Nevertheless, the back-port should be considered, since there are apparently a large number of flaws in the 2.6.22 NFS base.
Pete, could you please take a look?
(In reply to comment #12 from Sven Dietrich)
> Created an attachment (id=202834) [details]
> commit a50f7951a31d3b976e829250853f89c9d2da32c0

This patch is already applied.
(In reply to comment #13 from Sven Dietrich)
> Created an attachment (id=202835) [details]
> commit 6f23e3872cff238589f9bf39c71db2ea880c9a26

This patch is also already applied.
(In reply to comment #14 from Sven Dietrich)
> (In reply to comment #6 from Daniel Gollub)
> > Just found this:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9710
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef818a28fac9bd214e676986d8301db0582b92a9
>
> This patch requires some serious modification, or a major back-port of NFSv4
> fixes, which I have nearly completed in the process.
>
> However, in doing so, I think the two patches that I have just attached may
> actually resolve the issue without requiring the major back-port.
>
> Nevertheless, the back-port should be considered, since there are apparently a
> large number of flaws in the 2.6.22 NFS base.

Confused... These patches do not seem so intrusive. It appears they are merely adding a *put_super with a wait mechanism to allow the async ops to complete.

I'm also confused since Viro suggests merely holding the super block (inc ref count) until the async ops complete. Clearly this is cleaner; however, Trond signed off on this patchset. I need to review this some more.
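To make the "put_super with a wait mechanism" idea concrete, here is a toy user-space model of it. It is not the patch and not kernel code: `SuperBlockModel`, `op_start`, `op_done` and `run_demo` are invented names, and Python threads stand in for the async ops. The point is only that teardown blocks until the count of in-flight operations drops to zero.

```python
import threading
import time


class SuperBlockModel:
    """Toy user-space stand-in for a superblock; this is not kernel code."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0        # number of async operations still in flight
        self.torn_down = False
        self.events = []

    def op_start(self):
        """Register an async operation before it is handed off."""
        with self._cond:
            self._active += 1

    def op_done(self, name):
        """Finish an async operation and wake anyone waiting in put_super()."""
        with self._cond:
            self.events.append(("done", name, self.torn_down))
            self._active -= 1
            self._cond.notify_all()

    def put_super(self):
        """Block until no async operation is in flight, then tear down."""
        with self._cond:
            self._cond.wait_for(lambda: self._active == 0)
            self.torn_down = True
            self.events.append(("put_super", None, True))


def run_demo(op_count=4, delay=0.01):
    """Start `op_count` delayed operations, then unmount while they are pending."""
    sb = SuperBlockModel()

    def worker(name):
        time.sleep(delay)   # simulate the operation completing late
        sb.op_done(name)

    threads = []
    for i in range(op_count):
        sb.op_start()       # counted before the "umount" begins
        thread = threading.Thread(target=worker, args=(f"unlink-{i}",))
        thread.start()
        threads.append(thread)

    sb.put_super()          # waits for all pending operations
    for thread in threads:
        thread.join()
    return sb.events
```

In this model every "done" event is recorded with `torn_down` still false, and "put_super" is always the last event. The alternative attributed to Viro above would instead have each operation hold a reference on the superblock, so that teardown simply happens on the last put rather than by waiting.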
This will not get fixed for SP2.
Because the LATER and REMIND resolutions have been removed, the resolution of this bug has changed from LATER to WONTFIX. If this bug needs to be reconsidered, reopen it and set a future "Target Milestone for Fix."