Bug 679059

Summary: Boot hangs when NFS is enabled
Product: [openSUSE] openSUSE 11.4 Reporter: noel carneiro <noelcarlos.carneiro>
Component: Kernel Assignee: Neil Brown <nfbrown>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P5 - None CC: forgotten_87oBmiUsGW, jeffm, jslaby, noelcarlos.carneiro
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: SUSE Other   
Whiteboard: maint:released:11.4:44783
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on: 623307    
Bug Blocks:    

Description noel carneiro 2011-03-11 22:52:22 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; pt-BR; rv:1.9.2.13) Gecko/20101203 SUSE/3.6.13-0.2.1 Firefox/3.6.13

Fresh installation of openSUSE 11.4 (final version). After adding NFS directory mappings using YaST and rebooting, the system hangs forever.

The messages during boot are:
            ...
            Starting NFS client services: Starting auditd
            Initalizing random number generator
            mount.nfs: Connection timed out
            mount.nfs: Connection timed out
            mount.nfs: Connection timed out
            mount.nfs: Connection timed out
            mount.nfs: Connection timed out
            sm-notify idmapd
            Mounting network file systems ...mount.nfs: Connection timed out
            mount.nfs: Connection timed out
            ...

It's the same bug as reported in Bug #623307. As in that case, applying the fix described in the earlier bug solves the problem.

Regards,

Noel Carneiro


Reproducible: Always

Steps to Reproduce:
1. Install system (fresh installation)
2. Add NFS directory mappings using YaST
3. Reboot

Actual Results:  
The system just waits for the NFS mounts and never finishes booting.


Expected Results:  
Normal reboot, system working properly, with NFS directories mapped.
Comment 1 Neil Brown 2011-03-14 23:15:07 UTC
Just to make sure I understand correctly:

The fix that solves the problem is to set
   Defaultvers=3
in /etc/nfsmount.conf

and the root problem is that you are using network manager, so the network doesn't get configured until after the init scripts run; they won't complete until
an NFS mount succeeds, and the NFS mount cannot succeed until network manager starts.

If I've got that right, the correct solution might be to tell yast to not allow
NFS mounts to be configured if network manager is in use.  They really are
incompatible.
If you use network manager, then you should use automounts for NFS.

Please confirm this agrees with your understanding.
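For anyone applying this workaround, here is a minimal shell sketch for checking which default NFS version a config file sets. The helper name `nfs_default_vers` is hypothetical (it is not part of nfs-utils), and the sketch assumes the setting appears as an uncommented `Defaultvers=N` line as described above; it does not interpret the file's sections.

```shell
# Hypothetical helper: print the value of the last uncommented
# "Defaultvers=" line in a config file (default: /etc/nfsmount.conf).
# Prints nothing if the option is absent or commented out.
nfs_default_vers() {
    conf="${1:-/etc/nfsmount.conf}"
    [ -r "$conf" ] || return 1
    # Lines starting with '#' do not match, so commented defaults are ignored.
    sed -n 's/^[[:space:]]*Defaultvers[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" | tail -n 1
}
```

Calling `nfs_default_vers` with no argument reads /etc/nfsmount.conf; passing a path checks a copy of the file instead.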
Comment 2 noel carneiro 2011-03-15 00:10:40 UTC
Hi, Neil.

Thanks for the reply.

(In reply to comment #1)
> Just to make sure I understand correctly:
> 
> The fix that solves the problem is to set
>    Defaultvers=3
> in /etc/nfsmount.conf
Yes, that is what I've done. I've also changed /etc/init.d/nfs from
...
mount_usr()
...
        mount -o nolock $where || {
            # maybe network device hasn't appeared yet.
            udevadm settle
            mount -o nolock $where
...

to
...
mount_usr()
...
        mount -o nolock,vers=3 $where || {
            # maybe network device hasn't appeared yet.
            udevadm settle
            mount -o nolock,vers=3 $where
...

according to your suggestions in Bug#623307. It's been working without problems since.

> and the root problem is that you are using network manager so the network
> doesn't get configured until after the init scripts run, and they won't
> complete until an NFS mount succeeds, and the NFS mount cannot succeed until network manager
> starts.
> 
> If I've got that right, the correct solution might be to tell yast to not allow
> NFS mounts to be configured if network manager is in use.  They really are
> incompatible.

I really don't know how to configure YaST to do that.

> If you use network manager, then you should use automounts for NFS.

I have very little experience with automount.

> Please confirm this agrees with your understanding.
Comment 3 Neil Brown 2011-03-23 06:07:51 UTC
OK, I remember now.
I really should chase this upstream...
Comment 4 Forgotten User 87oBmiUsGW 2011-09-26 00:28:02 UTC
Neil, has any progress been made on this issue? NetworkManager is much superior to ifup but having the boot hang on trying to load any NFS files really sucks, even when I've set the version to 3. Thanks, Bob.
Comment 5 Neil Brown 2011-10-10 01:16:08 UTC
Sorry, no progress yet.
Comment 6 Neil Brown 2011-11-07 04:23:34 UTC
I had another look at this, worked out the best way to fix it (I hope) and
have committed a fix for 11.4 and maybe 12.1 - I might have missed a 12.1 deadline, I'm not sure.

So the kernel-of-the-day for 11.4 should have this fixed in a day or so.

If anyone is interested in testing, please do so and let me know if it makes a difference.

I have sent the patch upstream and will await a reply.
Comment 7 noel carneiro 2011-11-13 17:39:18 UTC
Hi, Neil.

I tried it out but I don't see any difference.

I updated the kernel, removed the ",vers=3" from /etc/init.d/nfs and commented out Defaultvers=3 in /etc/nfsmount.conf. I rebooted and the system hangs at boot, with the message
      Starting NFS client services: mount.nfs: Connection timed out.

Regards,

Noel Carneiro
Comment 8 Neil Brown 2011-11-13 22:15:04 UTC
Thanks for testing.
Maybe there is another problem.

What kernel exactly are you using?

Does it hang indefinitely?  If not, how long does it hang for?
How long between "Starting NFS client services:" and "mount.nfs: Connection timed out.", and then how long until it continues?

Thanks,
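One rough way to collect such timings once the system is up is to time a manual mount attempt. The sketch below is only an illustration: `elapsed_seconds` is a hypothetical helper, it relies on `date +%s` (available in GNU date), and it measures whole seconds for any command rather than reproducing what the init script does at boot.

```shell
# Hypothetical helper: run a command, discard its output, and print how
# many whole seconds it took. Returns the command's own exit status.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo $((end - start))
    return $status
}
```

For example, `elapsed_seconds mount /some/nfs/mountpoint` (the mount point is a placeholder) would print how long the mount attempt took before succeeding or timing out.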
Comment 9 noel carneiro 2011-11-15 05:01:31 UTC
Hi, Neil.

Right now, I'm using kernel 2.

I'm going to get back to you on the delays in a couple of days. I didn't record them during the test.

Regards,

Noel Carneiro
Comment 10 noel carneiro 2011-11-15 05:03:12 UTC
Sorry, kernel version 2.6.37.6-09.
Comment 11 Neil Brown 2011-11-15 05:44:53 UTC
I suspect you mean 2.6.37.6-0.9 (which I mistyped at least once just then!)
That is about a month old and doesn't have the required patch.
You need a non-released kernel-of-the-day which you can find in

http://download.opensuse.org/repositories/Kernel:/openSUSE-11.4/openSUSE_11.4/x86_64/

You might want to tell zypper to allow multiple versions of the kernel as described in
   http://en.opensuse.org/openSUSE:Kernel_of_the_day

(add "multiversion = provides:multiversion(kernel)" to zypp.conf).
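As a quick check that the setting is in place, the following shell sketch tests a config file for an uncommented multiversion line. The helper name `has_kernel_multiversion` is hypothetical, and the default path /etc/zypp/zypp.conf is an assumption about where zypp.conf lives on the system.

```shell
# Hypothetical helper: succeed if the file contains an uncommented
# "multiversion = ..." line that mentions provides:multiversion(kernel).
has_kernel_multiversion() {
    conf="${1:-/etc/zypp/zypp.conf}"
    grep -Eq '^[[:space:]]*multiversion[[:space:]]*=.*provides:multiversion\(kernel\)' "$conf"
}
```

A typical use would be `has_kernel_multiversion || echo "multiversion not enabled"` before installing a kernel-of-the-day, so the released kernel stays available as a fallback.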
Comment 12 noel carneiro 2011-11-18 05:35:59 UTC
Hi, Neil.

I'm glad to report that your fix for the 11.4 kernel works. I tested it with the KOTD 2.6.37.6-69 and it mounted all the NFS mounts during boot without any problem.

It also works with the kernel shipped with the new openSUSE 12.1. It takes a little longer to boot, maybe 30 or 60 seconds more, but eventually it boots with all the NFS mounts.

Thanks very much for all your work.

Best regards,

Noel Carneiro
Comment 13 Neil Brown 2011-11-18 05:51:10 UTC
Great, thanks for the confirmation.
Now I just have to get upstream to accept it (they are just a bit busy I think).
Comment 14 Swamp Workflow Management 2012-01-17 11:05:21 UTC
Update released for: kernel-debug, kernel-debug-base, kernel-debug-base-debuginfo, kernel-debug-debuginfo, kernel-debug-debugsource, kernel-debug-devel, kernel-debug-devel-debuginfo, kernel-default, kernel-default-base, kernel-default-base-debuginfo, kernel-default-debuginfo, kernel-default-debugsource, kernel-default-devel, kernel-default-devel-debuginfo, kernel-desktop, kernel-desktop-base, kernel-desktop-base-debuginfo, kernel-desktop-debuginfo, kernel-desktop-debugsource, kernel-desktop-devel, kernel-desktop-devel-debuginfo, kernel-devel, kernel-docs, kernel-ec2, kernel-ec2-base, kernel-ec2-base-debuginfo, kernel-ec2-debuginfo, kernel-ec2-debugsource, kernel-ec2-devel, kernel-ec2-devel-debuginfo, kernel-ec2-extra, kernel-ec2-extra-debuginfo, kernel-pae, kernel-pae-base, kernel-pae-base-debuginfo, kernel-pae-debuginfo, kernel-pae-debugsource, kernel-pae-devel, kernel-pae-devel-debuginfo, kernel-source, kernel-source-vanilla, kernel-syms, kernel-trace, kernel-trace-base, kernel-trace-base-debuginfo, kernel-trace-debuginfo, kernel-trace-debugsource, kernel-trace-devel, kernel-trace-devel-debuginfo, kernel-vanilla, kernel-vanilla-base, kernel-vanilla-base-debuginfo, kernel-vanilla-debuginfo, kernel-vanilla-debugsource, kernel-vanilla-devel, kernel-vanilla-devel-debuginfo, kernel-vmi, kernel-vmi-base, kernel-vmi-base-debuginfo, kernel-vmi-debuginfo, kernel-vmi-debugsource, kernel-vmi-devel, kernel-vmi-devel-debuginfo, kernel-xen, kernel-xen-base, kernel-xen-base-debuginfo, kernel-xen-debuginfo, kernel-xen-debugsource, kernel-xen-devel, kernel-xen-devel-debuginfo, preload-kmp-default, preload-kmp-desktop
Products:
openSUSE 11.4 (debug, i586, x86_64)
Comment 15 Jiri Slaby 2013-05-07 13:36:36 UTC
I'm trying to get some patches from the master branch to upstream. This patch from master:
  patches.fixes/nfs-connect-timeout
seems to have a bug in it, I think. Over time, PF_FSTRANS was added, but the hunk in the master branch in that patch doesn't unset that flag. Am I correct?

Also, I would appreciate it if you resent the patch rebased on top of current upstream, where the switch-case labels have been moved. Or is the patch still needed when this commit is in:
commit 3ed5e2a2c394df4e03a680842c2d07a8680f133b
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon Mar 4 17:29:33 2013 -0500

    SUNRPC: Report network/connection errors correctly for SOFTCONN rpc tasks
    
?
Comment 16 Neil Brown 2013-05-08 05:22:10 UTC
Hi Jiri,
 thanks for looking into this.
The bug is indeed fixed by the commit that you found, so the nfs-connect-timeout patch isn't needed any more.
I have removed it from master.