Bug 369024

Summary: martinu.suse.de - kernel BUG at fs/dcache.c:652!
Product: [SUSE Linux Enterprise Real Time Extension] SUSE Linux Enterprise Real Time 10 SP2 (SLERT 10 SP2)
Reporter: Daniel Gollub <dgollub>
Component: kernel
Assignee: Peter Morreale <pmorreale>
Status: RESOLVED WONTFIX
QA Contact: Erik Hamera <erik.hamera>
Severity: Critical
Priority: P5 - None
CC: ihno, mistinie
Version: BETA4
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: screenlog of martinu.suse.de running 2.6.22.19-0.5-rt (SLERT10 SP2 Beta4) with 'noapic'
commit a50f7951a31d3b976e829250853f89c9d2da32c0
commit 6f23e3872cff238589f9bf39c71db2ea880c9a26

Description Daniel Gollub 2008-03-11 09:22:20 UTC
Host: martinu.suse.de
Kernel: 2.6.22.19-0.5-rt (SLERT10SP2 Beta4), booted with 'noapic'

While trying to reproduce a completely different issue (https://bugzilla.novell.com/show_bug.cgi?id=368657#c3), this appeared.

---

kernel BUG at fs/dcache.c:652!
invalid opcode: 0000 [1] PREEMPT SMP
last sysfs file: /devices/system/cpu/cpu0/cache/index2/shared_cpu_map
CPU 2
Modules linked in: ext3 jbd hfs vfat fat joydev st sr_mod ext2 mbcache nfs lockd nfs_acl sunrpc autofs4 af_packet ipv6 button battery ac apparmor loop dm_mod i2c_nforce2 ehci_hcd ohci1394 ohci_hcd usbcore rtc_cmos ide_cd cdrom rtc_core rtc_lib ieee1394 i2c_core forcedeth reiserfs pata_amd edd fan thermal processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
Pid: 22047, comm: umount Tainted: G      N 2.6.22.19-0.5-rt #1
RIP: 0010:[<ffffffff802c1861>]  [<ffffffff802c1861>] shrink_dcache_for_umount_subtree+0x2b1/0x2c0
RSP: 0018:ffff810165527e08  EFLAGS: 00010296
RAX: 000000000000005a RBX: ffff81010efb9570 RCX: ffffffff805398d8
RDX: ffff810156605810 RSI: 0000000000000001 RDI: ffffffff805398a0
RBP: ffff81010efb9570 R08: 00004514222ace71 R09: 0000000000000000
R10: ffff81010004dc60 R11: 0000000000000001 R12: ffff810166a67400
R13: 0000000000000029 R14: 0000000000000000 R15: ffff810178324000
FS:  00002ac2162d36d0(0000) GS:ffff81017c8be640(0000) knlGS:00000000080c1830
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002af10cf36940 CR3: 0000000110ce1000 CR4: 00000000000006e0
Process umount (pid: 22047, threadinfo ffff810165526000, task ffff810156605810)
Stack:  ffff8101783242e0 ffff810178324000 ffffffff88369120 000000000050ef40
 ffff8101231f02c0 ffffffff802c24cd ffff810178324000 ffffffff802af5a9
 ffff810178324000 000000000000001d ffffffff883812a0 ffffffff802af6c9
Call Trace:
 [<ffffffff802c24cd>] shrink_dcache_for_umount+0x2d/0x40
 [<ffffffff802af5a9>] generic_shutdown_super+0x19/0x110
 [<ffffffff802af6c9>] kill_anon_super+0x9/0x40
 [<ffffffff88347c5d>] :nfs:nfs_kill_super+0xd/0x20
 [<ffffffff802af7af>] deactivate_super+0x8f/0xb0
 [<ffffffff802c74db>] sys_umount+0x6b/0x2f0
 [<ffffffff802c09f5>] d_kill+0x55/0x80
 [<ffffffff802c0a41>] dput+0x21/0x140
 [<ffffffff802c6d3f>] mntput_no_expire+0x1f/0xa0
 [<ffffffff802ab3f4>] filp_close+0x54/0x90
 [<ffffffff8020a04e>] system_call+0x7e/0x83


Code: 0f 0b eb fe 0f 0b eb fe 66 66 66 90 66 66 90 48 83 ec 38 48
RIP  [<ffffffff802c1861>] shrink_dcache_for_umount_subtree+0x2b1/0x2c0
 RSP <ffff810165527e08>
Comment 1 Daniel Gollub 2008-03-11 09:28:06 UTC
Created attachment 200131 [details]
screenlog of martinu.suse.de running 2.6.22.19-0.5-rt (SLERT10 SP2 Beta4) with 'noapic'
Comment 2 Daniel Gollub 2008-03-11 09:36:03 UTC
Very likely a duplicate of bug #293351
Comment 3 Daniel Gollub 2008-03-11 12:00:30 UTC
(In reply to comment #2 from Daniel Gollub)
> Very likely a duplicate of bug #293351
Ignore this bug - this was related to reiserfs.
Comment 4 Peter Morreale 2008-03-11 18:55:24 UTC
(In reply to comment #3 from Daniel Gollub)
> (In reply to comment #2 from Daniel Gollub)
> > Very likely a duplicate of bug #293351
> Ignore this bug - this was related to reiserfs.
> 

Daniel,
This was fixed?  Note that 293351 was reported last July.  Are we sure that this patch is in the rt tree?  Which patch was it?
thx
Comment 5 Daniel Gollub 2008-03-11 21:21:02 UTC
I don't want to lead you down the wrong track, since the other bug looks like it's only related to reiserfs. Maybe Jan can give a quick comment on this.


I will check whether this patch is in the RT tree and, in the meantime, try to reproduce the issue.
Comment 7 Sven Dietrich 2008-03-12 17:43:32 UTC
I will review
Comment 8 Peter Morreale 2008-03-13 17:22:30 UTC
k, reviewed all this.  We should definitely try the suggested NFS patch.  This would most likely need to get into other kernels as well (if not already there).

A possible test case would be to unlink a bunch of files via the client, then umount the NFS dir from the client.  

Please try the above test prior to patching the kernel; it should generate the Oops.  Make sure the client machine is "cpu busy" at the time you do the unlinks and umount.
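
The test case described above could be sketched roughly as follows. This is an untested outline, not a confirmed reproducer: the helper names, the file count, and the `/mnt/nfs-test` mount point mentioned below are all hypothetical, and the script assumes it runs as root on the NFS client with the export already mounted.

```python
"""Sketch: unlink many files on an NFS mount under CPU load, then umount."""
import multiprocessing
import os
import subprocess


def create_files(directory, count):
    """Create `count` small files in `directory` and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"victim-{i:06d}")
        with open(path, "w") as handle:
            handle.write("x")
        paths.append(path)
    return paths


def unlink_files(paths):
    """Unlink every path and return how many were removed."""
    removed = 0
    for path in paths:
        os.unlink(path)
        removed += 1
    return removed


def burn_cpu(stop):
    """Spin until `stop` is set, keeping one CPU busy."""
    while not stop.is_set():
        pass


def run(mount_point, count=10000):
    """Keep all CPUs busy, unlink the files, then umount immediately."""
    stop = multiprocessing.Event()
    burners = [
        multiprocessing.Process(target=burn_cpu, args=(stop,))
        for _ in range(os.cpu_count() or 1)
    ]
    for proc in burners:
        proc.start()
    try:
        paths = create_files(mount_point, count)
        unlink_files(paths)
        # umount right after the unlinks, while the client is still loaded
        return subprocess.call(["umount", mount_point])
    finally:
        stop.set()
        for proc in burners:
            proc.join()
```

Calling `run("/mnt/nfs-test")` (a made-up path) returns the exit status of `umount`. Whether this load pattern is enough to hit the BUG on an unpatched kernel is an assumption; the file count and the number of busy processes may need tuning.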
Comment 9 Daniel Gollub 2008-03-14 11:17:02 UTC
Peter, thanks for reviewing. Your suggestion is quite similar to the initial mail on LKML about this patch.
http://lkml.org/lkml/2007/11/3/34

First I will try to reproduce this issue reliably, then test with a patched kernel.
Comment 12 Sven Dietrich 2008-03-19 03:57:34 UTC
Created attachment 202834 [details]
commit a50f7951a31d3b976e829250853f89c9d2da32c0
Comment 13 Sven Dietrich 2008-03-19 03:58:31 UTC
Created attachment 202835 [details]
commit 6f23e3872cff238589f9bf39c71db2ea880c9a26
Comment 14 Sven Dietrich 2008-03-19 04:01:27 UTC
(In reply to comment #6 from Daniel Gollub)
> Just found this:
> http://bugzilla.kernel.org/show_bug.cgi?id=9710
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef818a28fac9bd214e676986d8301db0582b92a9
> 

This patch requires some serious modification, or a major back-port of NFSv4 fixes, which I have nearly completed in the process.

However, in doing so, I think the two patches that I have just attached may actually resolve the issue without requiring the major back-port.

Nevertheless, the back-port should be considered, since there are apparently a large number of flaws in the 2.6.22 NFS base.
Comment 15 Sven Dietrich 2008-03-19 04:09:33 UTC
Pete, could you please take a look?
Comment 16 Peter Morreale 2008-03-26 21:12:48 UTC
(In reply to comment #12 from Sven Dietrich)
> Created an attachment (id=202834) [details]
> commit a50f7951a31d3b976e829250853f89c9d2da32c0
> 

This patch is already applied.
Comment 17 Peter Morreale 2008-03-26 21:13:49 UTC
(In reply to comment #13 from Sven Dietrich)
> Created an attachment (id=202835) [details]
> commit 6f23e3872cff238589f9bf39c71db2ea880c9a26
> 

This patch is also already applied.
Comment 18 Peter Morreale 2008-03-26 21:19:55 UTC
(In reply to comment #14 from Sven Dietrich)
> (In reply to comment #6 from Daniel Gollub)
> > Just found this:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9710
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef818a28fac9bd214e676986d8301db0582b92a9
> > 
> 
> This patch requires some serious modification, or a major back-port of NFSv4
> fixes, which I have nearly completed in the process.
> 
> However, in doing so, I think the two patches that I have just attached may
> actually resolve the issue without requiring the major back-port.
> 
> Nevertheless, the back-port should be considered, since there are apparently a
> large number of flaws in the 2.6.22 NFS base.
> 

Confused...  These patches do not seem so intrusive.  It appears they are merely adding a *put_super with a wait mechanism to allow the async ops to complete. 

I'm also confused since Viro suggests merely holding the super block (incrementing its ref count) until the async ops complete.  Clearly this is cleaner; however, Trond signed off on this patchset.

I need to review this some more.
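
The two approaches contrasted in this comment can be illustrated with a user-space analogy. This is only a sketch in Python; it is not kernel code and not the content of either attached patch. The class and method names (`WaitingSuper`, `RefCountedSuper`, `op_start`, `op_done`, `demo`) are invented for illustration.

```python
import threading


class WaitingSuper:
    """Approach 1: teardown blocks until in-flight async ops drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inflight = 0
        self.torn_down = False

    def op_start(self):
        with self._cond:
            self._inflight += 1

    def op_done(self):
        with self._cond:
            self._inflight -= 1
            if self._inflight == 0:
                self._cond.notify_all()

    def put_super(self):
        # Analogue of a put_super hook that waits for async work to finish.
        with self._cond:
            self._cond.wait_for(lambda: self._inflight == 0)
            self.torn_down = True


class RefCountedSuper:
    """Approach 2: each async op pins the object; the last put tears down."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1  # reference held by the mount itself
        self.torn_down = False

    def get(self):
        with self._lock:
            self._refs += 1

    def put(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.torn_down = True


def demo():
    waiting = WaitingSuper()
    waiting.op_start()
    worker = threading.Timer(0.05, waiting.op_done)
    worker.start()
    waiting.put_super()  # blocks until the pending op completes
    worker.join()

    counted = RefCountedSuper()
    counted.get()  # async op takes a reference
    counted.put()  # umount drops the mount's reference; no teardown yet
    early = counted.torn_down
    counted.put()  # async op completes; final put tears down
    return waiting.torn_down, early, counted.torn_down
```

In the first variant the unmounting caller stalls until outstanding work drains; in the second, umount returns immediately and teardown is deferred to whichever party drops the last reference. That difference is presumably why the reference-holding approach reads as cleaner.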
Comment 23 Ihno Krumreich 2008-05-20 15:17:49 UTC
This will not get fixed for SP2.
Comment 24 Bugzilla Account Maintenance 2008-09-02 18:07:50 UTC
Because the LATER and REMIND resolutions have been removed, the resolution of this bug has changed from LATER to WONTFIX. If this bug needs to be reconsidered, reopen it and set a future "Target Milestone for Fix."