Bugzilla – Bug 578646
NFS gets disrupted when transferring files
Last modified: 2010-09-29 14:43:38 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.1.7) Gecko/20091222 SUSE/3.5.7-1.1.1 Firefox/3.5.7
Nearly every time I copy a file on my Asus eeePC 1005HA over NFS, I get this message in /var/log/messages:
    kernel: RPC: multiple fragments per record not supported
After this message no NFS transfer is possible until I restart the computer. The kernel is kernel-default 2.6.31.12.
Reproducible: Always
I ran into something similar, where NFS would hang, but I updated recently to 11.3 M1 and lost the logs.
NFS is a packet oriented protocol. When these packets are sent over a TCP connection (which is stream oriented), each packet is prefixed with a short header which gives the length of the packet. The header also contains a flag which says whether the current packet is the last part of the NFS packet; if it is not, the receiver should gather packets until it receives one that is marked as the last. No known NFS client ever splits a request this way. They all send the whole NFS request in a single RPC packet.
The message you are getting, "... multiple fragments per record not supported", means that the NFS server has received an RPC packet whose header does not mark it as the last part. The most likely explanation for this is that the stream has been corrupted somehow and the bytes that are being interpreted as a header are meant to be something else entirely. There was a client bug some time ago that could cause that, but I believe it has been fixed.
I assume that the kernel version you gave (2.6.31.12) is the kernel that is running on the NFS server. Please also report the OS and kernel version that is running on your eeePC. Also, if it is possible to get a tcpdump trace when the error occurs, that could be helpful. On the server, run:
    tcpdump -s 0 -w /tmp/tcpdump host address-of-client
and let that run while you copy a file on your eeePC.
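For readers following along, here is a minimal sketch of how the length header described above can be decoded. It assumes the standard ONC RPC record marking layout (a 4-byte big-endian word whose top bit marks the last fragment and whose remaining 31 bits give the fragment length). The function name `decode_record_marker` and the sample byte values are made up for illustration and are not taken from the trace in this bug.

```python
import struct

LAST_FRAGMENT_BIT = 0x80000000  # top bit of the 4-byte record marker
LENGTH_MASK = 0x7FFFFFFF        # remaining 31 bits: fragment length in bytes


def decode_record_marker(header: bytes):
    """Split a 4-byte RPC record marker into (is_last_fragment, length)."""
    if len(header) != 4:
        raise ValueError("record marker must be exactly 4 bytes")
    (word,) = struct.unpack("!I", header)  # network byte order, unsigned 32-bit
    return bool(word & LAST_FRAGMENT_BIT), word & LENGTH_MASK


# A well-formed marker for a single-fragment record of 100 bytes.
print(decode_record_marker(bytes.fromhex("80000064")))

# Arbitrary bytes misread as a marker (illustrative value only): the
# last-fragment bit is clear, which is the case the server complains about.
print(decode_record_marker(b"\x00\x00\x00\x42"))
```

If the stream is corrupted, whatever four bytes happen to sit where the server expects a marker get decoded this way, which would explain both an unexpected flag and an absurd length.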
Thank you for helping. I have really big problems because of this bug, because I manage nearly everything over NFS.
No, kernel 2.6.31.12 is on the client (eeePC). I thought that the error should be in the client, because 5 other clients work with this NFS server. The server has kernel 2.6.27.42.
I first started the command:
    # tcpdump -s 0 -w /tmp/tcpdump host 192.168.0.44 | tee /home/user/Documents/Tmp/tcpdump.log
Then I copied 3 files, and after the second file I again had the error
    kernel: RPC: fragment too large: 0x3f525d39
in the server logs. But the file /home/user/Documents/Tmp/tcpdump.log only shows these entries:
    tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    0 packets captured
    1 packets received by filter
    0 packets dropped by kernel
Is there something wrong with the command? Thanks!
The command looks right. However, if the host that you ran the command on has multiple network interfaces, and the NFS traffic goes over an interface other than 'eth0', then it won't have collected anything useful. The captured traffic goes into the file '/tmp/tcpdump'; you wouldn't get much going to stdout, but you would expect a larger number than '0 packets captured'. Maybe you need to specify the interface with "-i", e.g.
    tcpdump -s 0 -i wlan0 -w /tmp/tcpdump host 192.168.0.44
> Maybe you need to specify the interface with "-i"
My fault :-/ Thanks, it was the wrong interface. Now it worked. I copied one big file (init 3 with cp) and suddenly the error appeared and NFS was not working any more: http://dl.dropbox.com/u/2393448/tcpdump
I also think that I did not make the following clear: NFS is only corrupted on the client. Other clients can access the NFS on the server without problems.
I also wanted to ask: could this be a hardware problem with my client's network card?
Thanks for the tcpdump trace - it is very helpful. Everything looks fine up to packet 21. Then it goes horribly wrong.
Packet 21 should be an NFS write request, or at least the beginning of one. It appears that the 'wsize' is 128K, so the whole write request would be slightly more than 128K in length, and so would span several packets. The first 0x42 bytes of packet 21 are the TCP/IP headers, exactly as you would expect. After that should come the RPC header, then the NFS header, then the WRITE data. Instead, the second 0x42 bytes are an exact duplicate of the first 0x42 bytes. After that I can see the correct RPC header, only it is in the wrong place. So something is duplicating the TCP/IP headers.
I think it is very likely that this is related to the particular network card, either a hardware fault in the card or an error in the driver. Also, I think it is very likely to be related to some aspect of 'offload', probably TCP segmentation offload.
Could you please use "ethtool --show-offload" to see what offload features are enabled, then use e.g. "ethtool --offload tso off" to disable any offload features, and then see if the error recurs. If that does remove the NFS errors, then you can either accept that as a work-around, or refile this bug against the driver for the particular hardware.
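The symptom described above (the first 0x42 bytes of a frame immediately repeated) can be checked mechanically. The following is only a sketch under stated assumptions: the frame is already available as raw bytes, the helper name `has_duplicated_prefix` is hypothetical, and the sample frames are synthetic data, not bytes from the attached capture.

```python
def has_duplicated_prefix(frame: bytes, header_len: int = 0x42) -> bool:
    """Return True if the first header_len bytes are immediately repeated."""
    if len(frame) < 2 * header_len:
        return False  # too short to contain the headers twice
    return frame[:header_len] == frame[header_len:2 * header_len]


# Synthetic stand-ins: 0x42 bytes playing the role of the link/IP/TCP headers,
# followed by a placeholder for the RPC header and WRITE data.
headers = bytes(range(0x42))
payload = b"RPC header and WRITE data would follow here"

good_frame = headers + payload            # headers appear once
bad_frame = headers + headers + payload   # headers appear twice, as in packet 21

print(has_duplicated_prefix(good_frame), has_duplicated_prefix(bad_frame))
```

The default of 0x42 only matches this trace; a different mix of TCP options would change the header length, so the value would need adjusting for other captures.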
I hope I did not do something stupid again, but I got no result with the command:
    # ethtool --show-offload
    Offload parameters for --show-offload:
    Cannot get device rx csum settings: No such device
    Cannot get device tx csum settings: No such device
    Cannot get device scatter-gather settings: No such device
    Cannot get device tcp segmentation offload settings: No such device
    Cannot get device udp large send offload settings: No such device
    Cannot get device generic segmentation offload settings: No such device
    Cannot get device flags: No such device
I did this on the client. Is this correct? To be sure, I also tried it on the server, with the same result:
    # ethtool --show-offload
    Offload parameters for --show-offload:
    Cannot get device rx csum settings: No such device
    Cannot get device tx csum settings: No such device
    Cannot get device scatter-gather settings: No such device
    Cannot get device tcp segmentation offload settings: No such device
    Cannot get device udp large send offload settings: No such device
    Cannot get device generic segmentation offload settings: No such device
    Cannot get device flags: No such device
    no offload info available
"man ethtool" for the exact usage. I think you need to give the name of the interface. e.g. ethtool --show-offload eth0
Arg, ok, thanks. Here is the output from the client:
    # ethtool --show-offload eth0
    Offload parameters for eth0:
    Cannot get device rx csum settings: Operation not supported
    Cannot get device flags: Operation not supported
    rx-checksumming: off
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: on
    udp fragmentation offload: off
    generic segmentation offload: on
    large receive offload: off
Which of these options should I deactivate, and how? Thanks for your help!
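As a small aid for reading output like the above, here is a sketch that picks out the features reported as "on". It assumes the "name: on/off" line format shown in this comment; the function name `enabled_offloads` is hypothetical, and other ethtool versions may print the list differently.

```python
def enabled_offloads(ethtool_output: str):
    """Return the feature names reported as 'on', in order of appearance."""
    enabled = []
    for line in ethtool_output.splitlines():
        # Split on the last colon so the feature name stays intact.
        name, sep, state = line.strip().rpartition(":")
        if sep and state.strip() == "on":
            enabled.append(name.strip())
    return enabled


# Output copied from the comment above.
SAMPLE = """\
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off
"""

print(enabled_offloads(SAMPLE))
```

For the sample, this lists tx-checksumming, scatter-gather, tcp segmentation offload and generic segmentation offload, which are the features worth trying to switch off first.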
First try turning off everything that is on:
    ethtool --offload eth0 tx off sg off tso off gso off lro off
Then, if that fixes the problem, you might like to try turning them back on one by one until the problem returns.
Thank you, but this is a never-ending story :-/ For every setting I get this answer:
    # ethtool --offload eth0 tso off
    Cannot set device tcp segmentation offload settings: Operation not supported
I think we've just about exhausted my expertise here. I'll need to find someone who knows about network cards. Please report details of your network hardware, e.g. the output of
    ethtool -i eth2
    lspci
Then we'll try to re-assign to someone who knows about that stuff.
Thank you very much for trying everything. This is a big problem for me, because at the moment this computer is useless to me. I thought that it would be a good idea to check whether it is really eth0 that causes the problem. So I copied a big file over wlan0 and everything worked. So the problem should really be eth0.
    # ethtool -i eth0
    driver: atl1c
    version: 1.0.0.1-NAPI
    firmware-version: N/A
    bus-info: 0000:01:00.0
    # lspci
    01:00.0 Ethernet controller: Attansic Technology Corp. Atheros AR8132 / L1c Gigabit Ethernet Adapter (rev c0)
I am reassigning this bug to the default assignee for 'kernel' so it can be reassigned to someone who knows about the driver for the Atheros Gigabit Ethernet Adapter (see previous comment). There is strong evidence that this controller is sending NFS/TCP packets badly; possibly TSO is not working correctly.
Did something go wrong with reassigning the bug? I ask because I have really big problems because of this.
Can you try the kernel of the day to see if the issue is already fixed upstream?
1) Add and enable the following URL in your repository list using YaST: http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_Factory/
2) Install the latest kernel-default package.
Can you also attach the output of hwinfo?
Cool, thank you. With 2.6.33-29-default it works! In case you need it, here is the hwinfo output:
    60: None 00.0: 10701 Ethernet
      [Created at net.124]
      Unique ID: usDW.ndpeucax6V1
      Parent ID: rBUF.Qk9ZRmN_Ab8
      SysFS ID: /class/net/eth0
      SysFS Device Link: /devices/pci0000:00/0000:00:1c.3/0000:01:00.0
      Hardware Class: network interface
      Model: "Ethernet network interface"
      Driver: "atl1c"
      Driver Modules: "atl1c"
      Device File: eth0
      HW Address: 90:e6:ba:6b:28:ed
      Link detected: yes
      Config Status: cfg=no, avail=yes, need=no, active=unknown
      Attached to: #28 (Ethernet controller)
A note in case others switch to this kernel: after the update I was not able to load eeepc_laptop. To load it, you must pass the option acpi_osi=Linux in Grub at startup.
I am guessing this patch is what fixed the issue: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=678b77e265f6d66f1e68f3d095841c44ba5ab112
I am building a kernel for you to test that includes this fix. I will post the RPMs once they are built. Thanks for testing.
Can you please test the kernel-default rpm for your platform: http://beta.suse.com/private/bphilips//578646/
Can you help me? How can I install the package? My way didn't work:
    # rpm -i kernel-default-2.6.31.12-bnc578646.0.i586.rpm
    package kernel-default-2.6.33-29.1.i586 (which is newer than kernel-default-2.6.31.12-bnc578646.0.i586) is already installed
Hi Brandon, one of our issues, https://bugzilla.novell.com/show_bug.cgi?id=589071, needs a similar fix. So instead of openSUSE, can we get a SLES 11 style kernel package which would include this fix?
(In reply to comment #20)
> Can you help me? How can i install the package. My way didn't work:
>
> # rpm -i kernel-default-2.6.31.12-bnc578646.0.i586.rpm
> package kernel-default-2.6.33-29.1.i586 (which is newer than
> kernel-default-2.6.31.12-bnc578646.0.i586) is already installed
    rpm -i --force kernel-default-2.6.31.12-bnc578646.0.i586.rpm
Sorry for the delay. Missed this message.
Closing this bug as NORESPONSE as Gruber didn't test the Kernel in Comment #22.