Bug 707765 - NFS I/O leads to CPU soft lockup and eventual hard lockup
Status: RESOLVED WONTFIX
Alias: None
Product: openSUSE 11.3
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: x86-64 openSUSE 11.3
Priority: P5 - None; Severity: Major (3 votes)
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
 
Reported: 2011-07-22 19:06 UTC by Jeffrey Katcher
Modified: 2012-03-05 23:29 UTC
Description Jeffrey Katcher 2011-07-22 19:06:24 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0

I reported a similar problem as Bug 689414, with a pointer to a Red Hat/Fedora patch that was incorporated into openSUSE 11.3 and 11.4 as the fix.

Unfortunately this does not seem to fix the problem (similar hangs repeatedly occur on patched 11.3 systems).  I notice that a flood of Fedora bug fixes has been made by the NFS committers, especially Trond Myklebust.  The long and winding road starts at their bug 672305:
https://bugzilla.redhat.com/show_bug.cgi?id=672305

Several fixes are mentioned there, and that report also chains to their bug 692315.

Could the openSUSE NFS maintainer please check whether these fixes have been committed yet and, if not, consider committing them?

I'm not sure if this should be critical.  It doesn't occur that often, but when it does, there's evidence that it can lock up the connected file server too. 

Reproducible: Sometimes

Comment 1 Greg Kroah-Hartman 2011-08-31 21:06:01 UTC
Is this still an issue with 11.4?
Comment 2 Jeffrey Katcher 2011-08-31 21:55:19 UTC
We haven't upgraded our computer servers/cluster to 11.4; I have an 11.4 desktop, but that's about it.  The previous fix for bug 689414 definitely did not fix 11.3, but I'm waiting for the next reboot to pick up all cumulative 11.3 fixes (as of 2.6.34.10-0.2) on the most prevalent machines.
Comment 3 Jeffrey Katcher 2011-09-07 17:23:13 UTC
The previous fix is not sufficient as we're seeing the same failures (cited below) in a fully patched 11.3.

A search of the kernel.org Bugzilla also reveals:
https://bugzilla.kernel.org/show_bug.cgi?id=16494

The fix for this is a superset of the 689414 fix and appears to be in 11.4 but not 11.3.

Is there any sort of status update on the integration of these fixes, especially into 11.3?

[947673.797966] Call Trace:
[947673.797972]  [<ffffffff8145cce7>] out_of_line_wait_on_bit_lock+0x77/0x90
[947673.797983]  [<ffffffffa0341919>] nfs_commit_inode+0xb9/0x2a0 [nfs]
[947673.798002]  [<ffffffffa0341d05>] nfs_wb_page+0x65/0xd0 [nfs]
[947673.798016]  [<ffffffff810ec4c0>] invalidate_inode_pages2_range+0x1e0/0x2d0
[947673.798021]  [<ffffffffa0334118>] nfs_revalidate_mapping+0x118/0x170 [nfs]
[947673.798029]  [<ffffffffa03311f3>] nfs_file_read+0x73/0x130 [nfs]
[947673.798034]  [<ffffffff8112f15f>] do_sync_read+0xbf/0x100
[947673.798037]  [<ffffffff8112f943>] vfs_read+0xb3/0x190
[947673.798039]  [<ffffffff8112fa6e>] sys_read+0x4e/0x90
[947673.798043]  [<ffffffff81002efb>] system_call_fastpath+0x16/0x1b
[947673.798046]  [<00007fd9e5e0b4d0>] 0x7fd9e5e0b4d0
[947673.798046] Code: 89 f3 48 83 ec 10 eb 0c 66 90 f0 0f ab 07 19 c0 85 c0 74 36 48 89 ef 44 89 ea 4c 89 e6 e8 78 5c c1 ff 8b 43 08 48 8b 3b 0f a3 07 <19> d2 85 d2 74 d9 41 ff d6 85 c0 75 32 48 8b 3b 8b 43 08 eb ca
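
For anyone comparing this trace against the ones in the linked reports, it can help to reduce it to a list of symbols and owning modules. The following is a minimal sketch, assuming the frame format shown above; the names `FRAME_RE` and `parse_frames` are hypothetical helpers, not part of any kernel tooling, and the sample is a subset of the frames quoted in this comment.

```python
import re

# Matches frame lines of the form seen in the trace above, e.g.
# [947673.797983]  [<ffffffffa0341919>] nfs_commit_inode+0xb9/0x2a0 [nfs]
# The trailing "[module]" part is optional (absent for core kernel symbols).
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (symbol, module) pairs; module is None for core kernel frames."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:  # lines without a symbol+offset (e.g. raw addresses) are skipped
            frames.append((m.group("sym"), m.group("module")))
    return frames


# Subset of the frames quoted in this report.
SAMPLE = """\
[947673.797972]  [<ffffffff8145cce7>] out_of_line_wait_on_bit_lock+0x77/0x90
[947673.797983]  [<ffffffffa0341919>] nfs_commit_inode+0xb9/0x2a0 [nfs]
[947673.798002]  [<ffffffffa0341d05>] nfs_wb_page+0x65/0xd0 [nfs]
[947673.798039]  [<ffffffff8112fa6e>] sys_read+0x4e/0x90
"""

if __name__ == "__main__":
    for sym, module in parse_frames(SAMPLE):
        print(f"{sym:32s} {module or 'kernel'}")
```

Read innermost frame first, the list shows a read() call that ends up waiting on a bit lock inside nfs_commit_inode.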
Comment 4 Jeff Mahoney 2012-03-05 23:29:45 UTC
openSUSE 11.3 is out of maintenance. If you're able to reproduce this issue with 11.4 or 12.1, please re-open with an updated Product field.