Bug 965542

Summary: NFS: directory share/man contains a readdir loop. Please contact your server vendor. The file: eo has duplicate cookie 15
Product: [openSUSE] openSUSE Distribution
Reporter: Per Jessen <per>
Component: Basesystem
Assignee: Neil Brown <nfbrown>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: nfbrown, per, ro
Version: Leap 42.1
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Per Jessen 2016-02-07 19:52:49 UTC
On a fresh install of Leap 42 on an NFS root, I see this message in dmesg output quite a few times.
Comment 1 Per Jessen 2016-02-07 19:54:53 UTC
Is this likely to affect my reading of man pages too?

stork2:~ # l /usr/share/man/man8/iscsiuio.8.gz
-rw-r--r-- 1 root root 1015 Oct 25 14:10 /usr/share/man/man8/iscsiuio.8.gz
stork2:~ # man iscsiuio
No manual entry for iscsiuio
stork2:~ # man 8 iscsiuio
No manual entry for iscsiuio in section 8
Comment 2 Per Jessen 2016-02-07 20:08:01 UTC
Something odd's going on:

stork2:/usr/share/man # l
ls: reading directory .: Too many levels of symbolic links
total 4996
drwxr-xr-x 34 root root  4096 Feb  7 14:38 ./
drwxr-xr-x 72 root root  4096 Feb  7 18:41 ../
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 ca/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 cs/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 da/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 de/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 el/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 eo/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Feb  7 14:27 es/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Oct 25 13:47 fr/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x 12 root root    80 Feb  7 14:27 hu/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x  5 root root    24 Feb  7 14:38 id/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 it/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x 12 root root    80 Oct 25 13:47 ja/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root 20480 Feb  7 18:47 man1/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root  8192 Sep 30 12:59 man2/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root 32768 Feb  7 18:45 man3/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Sep 30 12:59 man4/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man5/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root    16 Sep 30 12:59 man6/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
drwxr-xr-x  2 root root  4096 Feb  7 18:47 man7/
Comment 3 Per Jessen 2016-02-08 16:14:25 UTC
This appears to be dependent on whether I use NFSv3 or NFSv4 - with NFSv3 there is no problem.
Comment 4 Neil Brown 2016-02-17 01:03:00 UTC
Is the filesystem on the server ext3 or ext4?

If you use
  tune2fs -O ^dir_index /dev/sdWHATEVER

to turn off dir_index on that filesystem, does the problem go away?

If "yes" to both of those, then you are hitting a design bug in the ext3/4 directory indexing.

Is the server running a 3.3 or older kernel?  If so then an upgrade
will turn the bug from a 32bit-collision possibility into a 64bit collision possibility, which makes it much less likely to hit (but doesn't really solve it).
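
To put rough numbers on the 32-bit versus 64-bit remark, here is a minimal sketch using the standard birthday approximation. It assumes names hash uniformly at random, which is an idealisation and not a statement about the actual ext3/4 hash; the function name and the 100,000-entry directory size are made up for illustration.

```python
import math

def collision_probability(n_entries, hash_bits):
    """Approximate chance that at least two of n_entries names share a hash.

    Birthday approximation: p ~= 1 - exp(-pairs / 2**bits), assuming a
    uniformly random hash (an idealisation, not the real ext3/4 hash).
    """
    pairs = n_entries * (n_entries - 1) / 2
    return -math.expm1(-pairs / 2.0 ** hash_bits)

# Hypothetical directory with 100,000 entries.
p32 = collision_probability(100_000, 32)
p64 = collision_probability(100_000, 64)
print(f"32-bit: {p32:.3f}   64-bit: {p64:.3e}")
```

Under these assumptions a collision somewhere in the directory is more likely than not with a 32-bit hash and vanishingly unlikely with a 64-bit one, which matches "much less likely to hit (but doesn't really solve it)".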

If the above doesn't seem to explain it, then I would need kernel version on server, filesystem details on server, and a tcpdump trace (-s 0) for the "ls -l" attempt.

The NFSv3/NFSv4 difference is probably because they fit different numbers of entries into a reply.  The problem particularly occurs if the last name in one reply and the first name in the next reply hash to the same value (or something like that).
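
The dependence on reply size can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's NFS code: the server is assumed to resume a listing just after the first entry whose cookie matches, the cookie values are invented (with "da" and "eo" deliberately sharing cookie 15), and the function names are hypothetical.

```python
def server_readdir(entries, resume_cookie, batch):
    """Toy server: return up to `batch` entries following the FIRST entry
    whose cookie equals resume_cookie (None means start of directory)."""
    start = 0
    if resume_cookie is not None:
        for i, (_, cookie) in enumerate(entries):
            if cookie == resume_cookie:
                start = i + 1
                break
    return entries[start:start + batch]

def list_directory(entries, batch, max_calls=10):
    """Toy client: keep asking for the next batch, resuming from the cookie
    of the last entry received. Returns (names, finished)."""
    names, cookie = [], None
    for _ in range(max_calls):
        reply = server_readdir(entries, cookie, batch)
        if not reply:
            return names, True      # reached the end of the directory
        names.extend(name for name, _ in reply)
        cookie = reply[-1][1]
    return names, False             # gave up: the listing never ended

# Invented cookies; "da" and "eo" collide on purpose.
directory = [("ca", 3), ("cs", 7), ("da", 15), ("eo", 15), ("es", 21)]

ok_names, ok_done = list_directory(directory, batch=3)
bad_names, bad_done = list_directory(directory, batch=1)
print(ok_names, ok_done)
print(bad_names, bad_done)
```

With three entries per reply the colliding pair never lands on a resume point and the listing completes. With one entry per reply the client resumes from cookie 15 after "eo", the toy server finds "da" first and hands back "eo" again, and the listing repeats until the call limit is reached, which resembles the repeated names in the `ls` output above.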
Comment 5 Per Jessen 2016-02-18 10:19:53 UTC
(In reply to Neil Brown from comment #4)
> Is the filesystem on the server ext3 or ext4?
> 
> If you use
>   tune2fs -O ^dir_index /dev/sdWHATEVER
> 
> to turn off dir_index on that filesystem, does the problem go away?
> 
> If "yes" to both of those, then you are hitting a design bug in the ext3/4
> directory indexing.

Hi Neil

the server is serving from JFS.  ISTR an issue wrt jfs and nfs - I think I have a comment in a pxe config that specifically says "to use nfsv3 because jfs has a problem" - I'll see if I can find it again.

> Is the server running a 3.3 or older kernel?  If so then an upgrade
> will turn the bug from a 32bit-collision possibility into a 64bit collision
> possibility, which makes it much less likely to hit (but doesn't really
> solve it).

The server is running 3.7.10-1.45-default, 32bit. 

> If the above doesn't seem to explain it, then I would need kernel version on
> server, filesystem details on server, and a tcpdump trace (-s 0) for the "ls
> -l" attempt.

I'll be happy to get you the trace, but there is something about the jfs/nfs combination that keeps nagging at me.
Comment 6 Per Jessen 2016-02-18 11:07:55 UTC
(In reply to Per Jessen from comment #5)
> the server is serving from JFS.  ISTR an issue wrt jfs and nfs - I think I
> have a comment in a pxe config that specifically says "to use nfsv3 because
> jfs has a problem" - I'll see if I can find it again.

I have a number of systems with root on NFS; they all use NFSv3 and the backing filesystem is JFS. I can't find the comment, but I'm pretty certain it goes back 2-3 years and read "use nfsv3 until the jfs/nfs problem is solved in mainline". 

I've googled it (jfs readdir cookie incompatibility NFSv4) quite a bit:

https://bugzilla.kernel.org/show_bug.cgi?id=60737
https://bugzilla.kernel.org/show_bug.cgi?id=94741
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=714974
Comment 7 Neil Brown 2016-02-19 00:16:42 UTC
Looks like the jfs/NFSv4 problem was fixed in 3.11.

Commit: 44512449c0ab ("jfs: fix readdir cookie incompatibility with NFSv4")

So it is a server-side problem.
Are you running openSUSE on the server?
Comment 8 Per Jessen 2016-02-19 10:43:45 UTC
(In reply to Neil Brown from comment #7)
> Looks like the jfs/NFSv4 problem was fixed in 3.11.
> 
> Commit: 44512449c0ab ("jfs: fix readdir cookie incompatibility with NFSv4")
> 
> So it is a server-side problem.
> Are you running openSUSE on the server?

Yes, the server is running 12.3, kernel 3.7.10-1.45-default.  Well, if it's fixed in 3.11, we'll just wait until we can upgrade that server, or perhaps just upgrade to a newer kernel. openSUSE 13.1 has 3.11.  I guess we can close this.
Comment 9 Neil Brown 2016-02-23 22:13:34 UTC
I chose to close this as "FIXED", as the problem is fixed in the current release when that is run on the server.
Thanks.
Comment 10 Per Jessen 2016-02-28 16:06:00 UTC
(In reply to Neil Brown from comment #9)
> I chose to close as "FIXED" as the problem is fixed in current release if
> run on the server.
> thanks.

Just for completeness: I have upgraded the NFS server to 3.16.7-32-default, and the problem has been solved.  Many thanks for your help.