Bug 751691

Summary: NFS server stops when delivering files larger than 35 MB
Product: [openSUSE] openSUSE 12.1
Reporter: andreas krumnow <andreas>
Component: Basesystem
Assignee: Will Stephenson <wstephenson>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: andreas, info, Mathias.Homann
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 12.1
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description andreas krumnow 2012-03-12 01:01:28 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2

All machines involved run openSUSE 12.1; the clients are 64-bit and the server is 32-bit.
When I try to copy or stream a file larger than 35 MB from an NFS mount to my local hard disk, the transfer stops after 35 MB, and on the server I can see 3 or 4 nfsd processes together using 99% CPU.

I stopped it after 4 minutes with ^C on the client and "rcnfsserver restart" on the server.

I've tested with NFSv4 and NFSv3, and with both the sync and async options in /etc/exports and fstab.

Please let me know what information I should provide to help solve this problem.

Reproducible: Always

Steps to Reproduce:
1. On the server: export a directory containing one or more large files via NFS. Any directory structure will do.

2. On the client: mount the NFS export anywhere. Make sure there is enough local disk space to store the large file.

3. On the client: copy a large file from the NFS mount to the local disk.
Actual Results:  
It transfers and stores at most ~35 MB on the local disk.
The client waits for more data and freezes (under KDE, only the terminal or the file manager freezes).
The server uses all of its CPU with 3-4 nfsd processes.

Expected Results:  
The data should be copied at full network speed, in 1 minute.
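
Step 3 can be scripted so that a stalled transfer is cut off automatically instead of being interrupted by hand. This is only a minimal sketch, assuming GNU coreutils (timeout, stat) on the client; the function name copy_with_deadline and the default 60-second limit are illustrative and not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST, give up after LIMIT seconds
# (default 60), and report how many bytes arrived at the destination.
# Prints "ok <bytes>" if cp finished, "stalled <bytes>" if cp failed
# or was killed by the timeout.
copy_with_deadline() {
    src=$1; dst=$2; limit=${3:-60}
    if timeout "$limit" cp -- "$src" "$dst"; then
        status=ok
    else
        status=stalled
    fi
    # stat -c %s prints the file size in bytes (GNU coreutils syntax);
    # fall back to 0 when the destination was never created.
    bytes=$(stat -c %s -- "$dst" 2>/dev/null || echo 0)
    echo "$status $bytes"
}
```

For example, "copy_with_deadline /path/on/nfs-mount/large.bin /tmp/large.bin 60", where both paths are placeholders. With the behaviour described under Actual Results, one would expect a "stalled" line with a byte count of roughly 35 MB.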

I found no relevant entries in /var/log/messages pointing to NFS errors.

The same copy done with "scp user@server:/dir/file ." on the command line succeeded (1.5 GB at full speed), so the network is OK.
Something must have changed in the NFS server since openSUSE 11.3, where it works with the 12.1 clients accessing the 11.3 server.

Curious: the server has 2 NICs, eth0 and eth2. It is connected via eth0, but when I run nfswatch without parameters, it displays the stats for eth2.
Comment 1 andreas krumnow 2012-03-13 16:33:18 UTC
Maybe some more network settings info will help ...

------------
On server:

:~> uname -r
3.1.9-1.4-default

:~> cat /etc/hosts
# special IPv6 addresses
127.0.0.1       localhost
127.0.0.1       localhost.athome.local
192.168.2.43    ws1.athome.local ws1
192.168.2.44    srv1.athome.local srv1
192.168.2.44    srvalias.athome.local srvalias

:~> cat /etc/exports |grep srv
/srv/stuff/    192.168.2.0/255.255.255.0(rw,insecure,no_subtree_check,async,anongid=100,anonuid=1000)
...

------------
On client: 

:~> uname -r
3.1.9-1.4-desktop

:~> cat /etc/hosts
127.0.0.1       localhost
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.2.43    ws1.athome.local ws1
192.168.2.44    srv1.athome.local srv1
192.168.2.44    srvalias.athome.local srvalias

:~> cat /etc/fstab |grep srv
srv1:/srv/stuff /mnt/nfs-srv    nfs     noauto,rw,hard,intr,user,async,noexec,_netdev 0 0 

------------

I'm not using IPv6 in my network, although it is listed in /etc/hosts by default.
As I said above, I've also tested with NFSv4, set via YaST. All other options were the same.
Comment 2 Andrei Volkov 2012-05-16 10:54:31 UTC
I have the same problem:
openSUSE 12.1

localhost:~ # uname -a
Linux localhost 3.3.0-0-default #1 SMP Fri Mar 30 15:21:05 FET 2012 (028c29f) i686 i686 i386 GNU/Linux
localhost:~ # rpm -qa|grep nfs
nfsidmap-0.24-12.1.2.i586
yast2-nfs-common-2.21.1-2.1.1.noarch
quota-nfs-4.00-18.1.3.i586
yast2-nfs-client-2.21.1-2.1.1.noarch
limal-nfs-server-1.6.3-2.1.3.i586
nfs-client-1.2.5-4.3.1.i586
nfs-kernel-server-1.2.5-4.3.1.i586
limal-nfs-server-perl-1.6.3-2.1.3.i586
localhost:~ #

Clients:
Centos 6.2
Debian Wheezy
Comment 3 Mathias Homann 2012-08-27 07:59:25 UTC
Is this reproducible with 12.2?
Comment 4 Mathias Homann 2012-08-27 08:01:28 UTC
Is this still reproducible if you use the latest official kernel update for 12.1?


lemmy@sai:~> uname -a
Linux sai 3.1.10-1.16-desktop #1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078) x86_64 x86_64 x86_64 GNU/Linux

lemmy@sai:~> rpm -qfi /boot/vmlinuz-3.1.10-1.16-desktop
Name        : kernel-desktop
Version     : 3.1.10
Release     : 1.16.1
Architecture: x86_64
Install Date: Di 03 Jul 2012 18:44:16 CEST
Group       : System/Kernel
Size        : 151209308
License     : GPL-2.0
Signature   : RSA/SHA256, Di 03 Jul 2012 13:27:47 CEST, Key ID b88b2fd43dbdc284
Source RPM  : kernel-desktop-3.1.10-1.16.1.nosrc.rpm
Build Date  : Do 28 Jun 2012 00:10:52 CEST
Build Host  : build12
Relocations : (not relocatable)
Packager    : http://bugs.opensuse.org
Vendor      : openSUSE
URL         : http://www.kernel.org/
Summary     : Kernel optimized for the desktop
Description :
This kernel is optimized for the desktop. It is configured for lower latency
and has many of the features that aren't usually used on desktop machines
disabled.

Source Timestamp: 2012-06-27 07:21:40 +0200
GIT Revision: d0160785c29429447c2e479c977d5e07dab7b40a
GIT Branch: openSUSE-12.1
Distribution: openSUSE 12.1
lemmy@sai:~>
Comment 5 andreas krumnow 2012-08-27 17:37:49 UTC
It seems the bug is fixed. I hadn't tested it since the end of April and used Samba (CIFS) instead, but NFS is more convenient in my LAN because user permissions are much easier to handle.

p43:~ # uname -a
Linux p43 3.1.10-1.16-desktop #1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078) x86_64 x86_64 x86_64 GNU/Linux

Now I've tested it with a 2 GB file over Gigabit NICs and switches, and the transfer finishes in less than a minute. I've checked the file on both sides with md5sum and the sums are equal.
All OK, the bug report can be closed.
Thank you very much for this fix and the hard work to puzzle out the cause.