Bugzilla – Bug 1093877
NFSD: client 192.168.6.102 testing state ID with incorrect client ID
Last modified: 2019-11-25 10:29:34 UTC
I am installing a number of identical systems. I installed one, then I cloned the filesystem for the others. They all have root on NFS. In the clones I amended /etc/fstab, /etc/udev/rules.d/70-persistent-net.rules and /var/lib/wicked/duid.xml, then I rebuilt the initrds. I had no problems booting system#1, but as soon as system#2 or #3 would come up, the nfs server started spewing out: kernel: [16754897.516786] NFSD: client 192.168.6.101 testing state ID with incorrect client ID kernel: [16754897.504556] nfsd4_validate_stateid: 365 callbacks suppressed kernel: [16754852.428142] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt c86c0800 xid 5fa768c4 On the client I saw: NFS: nfs4_reclaim_open_state: Lock reclaim failed! Loads of them, on client and server both. It took quite a bit of debugging, but in the end the problem was that /etc/hostname was the same on all systems. When I made it unique per system, and rebuilt the initrd, no problems. I guess NFS somehow uses /etc/hostname, but shouldn't it be using /proc/sys/kernel/hostname instead? The systems are given hostnames over dhcp.
nfs does not use /etc/hostname, it uses the gethostname() system call. Providing dhcp sets the hostname before the root filesystem is mounted, there should be no problem. Obviously the root filesystem cannot be mounted before dhcp sets the IP address. It all seems rather odd. Maybe the mount attempt starts before the dhcp attempt completes? That seems unlikely but might explain the wrong host name being used. Are you able to capture a tcpdump network trace of the wrong clinet id being used? I just want to be absolutely sure I understand what is happening before digging into too much obscure code (dracut isn't bad, but it takes a bit of getting used to).
(In reply to Neil Brown from comment #1) > nfs does not use /etc/hostname, it uses the gethostname() system call. > Providing dhcp sets the hostname before the root filesystem is mounted, > there should be no problem. Obviously the root filesystem cannot be mounted > before dhcp sets the IP address. > It all seems rather odd. The systems are handed kernel+initrd over pxe, then they boot that. I presume the initrd sets up the network, mounts root, then does a pivot. Something like that. That process works fine, provided /etc/hostname is unique. > Are you able to capture a tcpdump network trace of the wrong clinet id being > used? That should be easy to do - the problem is easily reproduced.
Hi Neil I've put a capture here: https://files.jessen.ch/rainbow-45.pcap.xz There are two newly cloned machines, discompi4 and discompi5, with identical /etc/hostname (linux-m8z1).
I ran a diff on two initrds - one that works (good) and one that doesn't (bad). Files the differ: initrd.bad/etc/cmdline.d/95nfs.conf initrd.bad/etc/group (differences seem unrelated). initrd.bad/etc/hostname initrd.bad/lib/dracut/hostonly-files (differences seem unrelated). # cat bad/etc/cmdline.d/95nfs.conf root=nfs4:rainbow:/srv/nfsroot/discompi1/root:rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.6.101,local_lock=none,addr=192.168.2.128 ifname=eth0:00:30:48:96:eb:8e ip=eth0:static # cat good/etc/cmdline.d/95nfs.conf root=nfs4:192.168.2.128:/srv/nfsroot/discompi1/root:rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.6.101,local_lock=none,addr=192.168.2.128 ifname=eth0:00:30:48:96:eb:8e ip=eth0:static
> I ran a diff on two initrds - one that works (good) and one that doesn't (bad). I do like it when the person who reports the bug also find the cause - I just need to sit and wait :-) > # cat bad/etc/cmdline.d/95nfs.conf This implies that the initramfs was created by dracut when run with the --hostonly (or -H) option. To quote the man page: > This has the drawback, that you can’t > put the disk on another controller or machine, and that you can’t switch to another root filesystem, > without recreating the initramfs image. and > The usage of the --hostonly option is only for experts and you > will have to keep the broken pieces. Did you explicitly give --hostonly? Are you using the one initramfs for all nfs-boot systems?
(In reply to Neil Brown from comment #5) > > I ran a diff on two initrds - one that works (good) and one that doesn't (bad). > > I do like it when the person who reports the bug also find the cause - I > just need to sit and wait :-) Hehe, two pairs of eyes .... [snip] > Did you explicitly give --hostonly? > Are you using the one initramfs for all nfs-boot systems? I wasn't even aware of that option. Each system has it's own initrd, due to fstab, NIC naming and duid.xml. (though I probably don't need it for the fstab). I did not knowingly build with --hostonly, just a plain "mkinitrd -k vmlinuz -i initrd". After a system has failed with the (apparent) hostname issue, I pxe boot the installer, mount the nfs root, bind-mount prod/dev/sys, then chroot. Amend /etc/hostname, then rebuild the initrd and copy it to the tftp server. How do you know --hostonly was used? All I see is one 95nfs.conf using the fileserver name, the other using its address.
> How do you know --hostonly was used? The file 95nfs.conf is only created if $hostonly_cmdline == "yes". But you didn't set that yourself I now see. SUSE makes it the default via /usr/lib/dracut/dracut.conf.d/01-dist.conf Are you using the same initramfs file for all your clones? I guess that isn't supported with the default mkinitrd option on SUSE. If you add the "-A" flag to mkinitrd, that will over-ride the default. See mention of this in "mkinitrd". This doesn't explain the difference between "good" and "bad" though. Having looked more deeply, I'm surprised they aren't the other way around. The hostname that is used in the client-id is determined inside the kernel. The first time any NFS request is initiated, the current hostname is copied and that used ever more, even if the hostname is changed again. So if there is a mount attempt before dhcp completes, the value from /etc/hostname will be used. If the first attempt is after dhcp completes, the value it sets will be used. In the case where the root filesystem is mounted using a hostname, you presumably need to have the network up before a DNS lookup can succeed. By the time it succeeds, the new hostname should be set. If, however, the mount is performed with an IP address, that could start trying before the network is up. So I do wonder if you got those backwards. Though maybe the host name 'rainbox' is in /etc/hosts. I think the short answer is that dracut needs to make sure dhcp completes before it even tries to mount an NFS root filesystem. I'm surprised it doesn't already, but I'm not very familiar with dracut. Maybe Daniel Molkentin can help... Hi Daniel, your name seems to be all over dracut.changes. Do you know if there is anything to force mounting of a root filesystem over NFS to wait for dhcp to complete, and whether it is easy to make that happen?
(In reply to Neil Brown from comment #7) > Are you using the same initramfs file for all your clones? No, I build a separate one on each clone. Years ago it was possible to have just one for all, there was logic in the linuxrc code to pick up the MAC address of the active interface, from <somewhere>. I'm not sure that works anymore. > This doesn't explain the difference between "good" and "bad" though. Having > looked more deeply, I'm surprised they aren't the other way around. > > The hostname that is used in the client-id is determined inside the kernel. > The first time any NFS request is initiated, the current hostname is copied > and that used ever more, even if the hostname is changed again. > So if there is a mount attempt before dhcp completes, the value from > /etc/hostname will be used. If the first attempt is after dhcp completes, > the value it sets will be used. Right, that sounds plausible, although why would there be a mount attempt before dhcp has completed? > In the case where the root filesystem is mounted using a hostname, you > presumably need to have the network up before a DNS lookup can succeed. I don't know why one uses ip address, the other hostname. Maybe I need to take a closer look at that. > By the time it succeeds, the new hostname should be set. If, however, the > mount is performed with an IP address, that could start trying before the > network is up. > So I do wonder if you got those backwards. Though maybe the host name > 'rainbox' is in /etc/hosts. No, nothing in /etc/hosts. Well, I might have mixed up those initrds, but it's easily reproduced. > I think the short answer is that dracut needs to make sure dhcp completes > before it even tries to mount an NFS root filesystem. I'm surprised it > doesn't already, Me too, that is really odd.
(In reply to Neil Brown from comment #7) > I think the short answer is that dracut needs to make sure dhcp completes > before it even tries to mount an NFS root filesystem. I'm surprised it > doesn't already, but I'm not very familiar with dracut. > Maybe Daniel Molkentin can help... > Hi Daniel, your name seems to be all over dracut.changes. Do you know if > there is anything to force mounting of a root filesystem over NFS to wait > for dhcp to complete, and whether it is easy to make that happen? I would also be curious to see how dracut (or rather: wicked) does not wait for a lease before carrying on. Please attach the complete output of the boot sequence with added "rd.debug" to the kernel command line. Make also sure to remove "quiet" and "splash" if listed.
Created attachment 771832 [details] discompi2 boot output with rd.debug
Thanks for the debug file. Two significant lines: May 30 15:20:26 linux-m8z2 dracut-initqueue[300]: + mount_nfs nfs:192.168.2.128:/srv/nfsroot/discompi2/root: /sysroot eth0 May 30 15:20:28 linux-0002 systemd[1]: Set hostname to <linux-0002>. These two are happening in the "wrong" order. over to you Daniel.
(In reply to Neil Brown from comment #11) > Thanks for the debug file. > Two significant lines: > > May 30 15:20:26 linux-m8z2 dracut-initqueue[300]: + mount_nfs > nfs:192.168.2.128:/srv/nfsroot/discompi2/root: /sysroot eth0 > > May 30 15:20:28 linux-0002 systemd[1]: Set hostname to <linux-0002>. > Well spotted, I though I went over that log yesterday, not finding anything of significance. Maybe just to clarify: linux-m8z2 - hostname given at installation time, stored in the initrd. linux-0002 - my changed hostname, from /etc/hostmname on the NFS root. discompi2 - name being handed out by dhcp.
I believe I am also seeing the same issue on tumbleweed but without the hostname file being in the initrd as I am using a non-hostonly initrd. A temporary solution I have found is to mount all nfs filesystems with option vers=3. This has stopped all the errors and allowed all my systems to mount root over nfs.
(In reply to Per Jessen from comment #12) > (In reply to Neil Brown from comment #11) > > Thanks for the debug file. > > Two significant lines: > > > > May 30 15:20:26 linux-m8z2 dracut-initqueue[300]: + mount_nfs > > nfs:192.168.2.128:/srv/nfsroot/discompi2/root: /sysroot eth0 > > > > May 30 15:20:28 linux-0002 systemd[1]: Set hostname to <linux-0002>. > > > > Well spotted, I though I went over that log yesterday, not finding anything > of significance. Actually, the hostname setup is done only after systemd is restarted on the real root device. The logs show: --> May 30 15:20:26 linux-m8z2 systemd[1]: Switching root. [...] May 30 15:20:28 linux-0002 systemd[1]: systemd 234 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN default-hierarchy=hybrid) May 30 15:20:28 linux-0002 systemd[1]: Detected architecture x86-64. May 30 15:20:28 linux-0002 systemd[1]: Set hostname to <linux-0002>. --< So that is probably not the culprit as the nfs mount is already done at this stage.
(In reply to Mark Alston from comment #13) > I believe I am also seeing the same issue on tumbleweed but without the > hostname file being in the initrd as I am using a non-hostonly initrd. > > A temporary solution I have found is to mount all nfs filesystems with > option vers=3. This has stopped all the errors and allowed all my systems > to mount root over nfs. I could reproduce the issue on a current tumbleweed. Updating the bug accordingly.
Actually, the dhcp setup is finished before mount_nfs, which is triggered via the nfsroot script: --< [ 8.343997] dracut-initqueue[355]: + ip -4 addr add 192.168.57.104/24 broadcast 192.168.57.255 dev ens3 [ 8.350774] dracut-initqueue[355]: + route=() [ 8.350834] dracut-initqueue[355]: + local r route [ 8.350862] dracut-initqueue[355]: + '[' -n '' ']' [ 8.350884] dracut-initqueue[355]: + local g [ 8.350911] dracut-initqueue[355]: + '[' -n 192.168.57.1 ']' [ 8.350933] dracut-initqueue[355]: + for g in ${GATEWAYS} [ 8.350970] dracut-initqueue[355]: + ip -4 route add default via 192.168.57.1 dev ens3 [ 8.354793] dracut-initqueue[355]: + break [ 8.354844] dracut-initqueue[355]: + '[' -n '' ']' [ 8.354869] dracut-initqueue[355]: + '[' -n kvm104 ']' [ 8.354892] dracut-initqueue[355]: + hostname kvm104 [ 8.357926] dracut-initqueue[355]: /sbin/ifup: line 73: hostname: command not found [ 8.358369] dracut-initqueue[355]: + '[' '!' -s /tmp/net.ens3.resolv.conf.ipv4 ']' [ 8.358415] dracut-initqueue[355]: + '[' -n virt.net ']' [ 8.358440] dracut-initqueue[355]: + echo domain virt.net [ 8.358462] dracut-initqueue[355]: + '[' -n virt.net ']' [ 8.358483] dracut-initqueue[355]: + echo search virt.net [ 8.358504] dracut-initqueue[355]: + '[' -e /tmp/net.ens3.resolv.conf.ipv4 ']' [ 8.358525] dracut-initqueue[355]: + '[' '!' -e /etc/resolv.conf ']' [ 8.358546] dracut-initqueue[355]: + cp -f /tmp/net.ens3.resolv.conf.ipv4 /etc/resolv.conf [ 8.363375] dracut-initqueue[355]: + info 'DHCP is finished successfully' [ 8.363444] dracut-initqueue[355]: + echo 'DHCP is finished successfully' [...] [ 8.410470] dracut-initqueue[355]: + /sbin/nfsroot ens3 nfs4:192.168.57.254:/nfsroot:rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.57.141,local_lock=none,addr=192.168.57.254 /sysroot --> You can see that a hostname was gathered via dhcp but then there is an error in the ifup script. That looks suspicious.
This is actually a duplicate of bug#1146661 *** This bug has been marked as a duplicate of bug 1146661 ***