Bug 722304

Summary: set NM_ONLINE_TIMEOUT=0 by default
Product: [openSUSE] openSUSE 12.1 Reporter: Ludwig Nussel <lnussel>
Component: Network Assignee: Marius Tomaschewski <mt>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: aj, coolo, dvaleev, fcrozat, mt, rwooninck, varkoly
Version: Factory   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: NM_ONLINE_TIMEOUT default setting, updated migration hook
NM_ONLINE_TIMEOUT 0 default setting, removed old migration hook

Description Ludwig Nussel 2011-10-05 14:54:39 UTC
sysv boot waits for NM_ONLINE_TIMEOUT, which is 30 seconds by default. Such a timeout is only needed for legacy services that can't deal with dynamic networking. Since such services don't make sense with NM, having the timeout by default doesn't make sense either.
Therefore NM_ONLINE_TIMEOUT should be set to zero by default.
Comment 1 Bin Li 2011-10-10 08:47:24 UTC
I thought it's okay to set it to zero.
Comment 2 Marius Tomaschewski 2011-10-10 11:01:09 UTC
See also fate#307610, where it was requested to enable it and to improve
nm-online itself to wait only for wired (ethernet) interfaces with cable
connected and skip any waiting on e.g. wireless interfaces.
If I understand Ludwig correctly, it seems that the second part basically
does not work correctly (or did not find its way to newer versions) and
there actually are useless timeouts...

(In reply to comment #1)
> I thought it's okay to set it to zero.

I'm going to change it for 12.1 back to 0 (only the default setting!!)
unless somebody stops me soon, and commit/submit it tomorrow.
Comment 3 Marius Tomaschewski 2011-10-10 11:15:56 UTC
Created attachment 455354 [details]
NM_ONLINE_TIMEOUT default setting, updated migration hook

Not yet tested...
Comment 4 Ludwig Nussel 2011-10-10 15:27:58 UTC
I'm not sure it's necessary to change on update. I'd just leave the default empty for new installs and assume 0 in that case.
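A minimal sketch of the "empty means 0" idea, assuming the init script sources the sysconfig file first and then normalises the value. The helper name effective_online_timeout is hypothetical and not taken from the sysconfig package.

```shell
#!/bin/sh
# Hypothetical helper: treat an unset or empty NM_ONLINE_TIMEOUT as 0.
effective_online_timeout() {
    # ${VAR:-0} expands to 0 when VAR is unset or empty.
    printf '%s\n' "${NM_ONLINE_TIMEOUT:-0}"
}
```

With this, a fresh install could ship an empty NM_ONLINE_TIMEOUT while updated systems keep whatever value they already had.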
Comment 5 Marius Tomaschewski 2011-10-11 09:00:26 UTC
(In reply to comment #4)
> I'm not sure it's necessary to change on update. I'd just leave the default
> empty for new installs and assume 0 in that case.

I was thinking about this, but it would introduce a difference between updated
and fresh systems for people that were using the old default before / never adjusted this setting... also not good.

And there was already a hook migrating the setting from 0 to 30 (from old to
new default) because of fate#307610. Now, I've changed it to migrate from old
to new defaults without checking any particular value. But maybe you're right
and it is better to remove the hook completely.
Comment 6 Marius Tomaschewski 2011-10-19 06:53:07 UTC
Frederic,
this change affects also NetworkManager-wait-online.service under systemd.
Is this change OK from your PoV?
Comment 7 Frederic Crozat 2011-10-19 08:00:03 UTC
well, it might break stuff like remotefs which are behind NM connection (wifi), since it might not be instantaneous to get connection.

But it is not related to systemd, so I'm ok for the change (we want to be consistent between systemd and sysvinit).
Comment 8 Ludwig Nussel 2011-10-19 08:13:50 UTC
if anything depends on a 30 seconds timeout by default it's just plain broken. NM may never come online. If anyone encounters such a case please let me know. There needs to be another way to fix this, like e.g. hooking into ifup.d.
Comment 9 Marius Tomaschewski 2011-10-19 08:45:27 UTC
From comment #2:
> See also fate#307610, where it was requested to enable it and to improve
> nm-online itself to wait only for wired (ethernet) interfaces with cable
> connected and skip any waiting on e.g. wireless interfaces.

I've tested it a bit and AFAIS the second part of fate#307610 does
not work: there is always a delay of 30 sec, even when there is no
cable connected to the ethernet card (and no other configurations defined
or ever used before).

(In reply to comment #7)
> well, it might break stuff like remotefs which are behind NM connection (wifi),
> since it might not be instantaneous to get connection.

remotefs over wifi is not a good idea :-) and not a good example.

NM is a /usr [==remotefs] depending service itself, so it can't support
/usr on a remotefs at all.

If somebody needs another remotefs (e.g. /home) with NM, they
should use NM dispatcher.d scripts for that anyway. [In the ifup case there
are also ifservices(5) to do such things.]


But YES: timeout 0 definitely breaks all services that are behind the
network-remotefs, even if the connection can be established.

On my notebook it takes about 10 sec to get dhcp leases, ....
When the timeout is set to 30, for example the ntp timesync works at boot
time. With timeout 0 it never works.

[in the case of ntp the time sync happens as soon as the network is up,
 because ntp monitors the link itself, but other services don't].
Comment 10 Marius Tomaschewski 2011-10-19 08:49:01 UTC
(In reply to comment #8)
> if anything depends on a 30 seconds timeout by default it's just plain broken.
> NM may never come online. If anyone encounters such a case please let me know.
> There needs to be another way to fix this, like e.g. hooking into ifup.d.

Yes, either if-up.d or ifservices(5) in case of ifup, dispatcher.d scripts
in case of NM.

The problem is, we _are_ starting init services at boot time by default and
the timeout is the only thing that "protects them" at least in the case the
connection works. With timeout=0 they'll always fail.
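As a rough illustration of the dispatcher.d route, here is a sketch of a script that mounts a remote /home once an interface comes up. It assumes /home has a noauto entry in fstab; the file name 50-remotefs and the MOUNT_CMD override are hypothetical and exist only so the logic can be tried without root. NetworkManager passes the interface name as $1 and the action as $2.

```shell
#!/bin/sh
# Hypothetical dispatcher script, e.g. /etc/NetworkManager/dispatcher.d/50-remotefs
# Assumption: /home is listed in /etc/fstab with the "noauto" option.

handle_event() {
    iface=${1:-}    # interface name passed by NetworkManager (unused here)
    action=${2:-}   # dispatcher action, e.g. "up" or "down"
    case "$action" in
        up)
            # MOUNT_CMD is a test hook; it defaults to the real mount command.
            ${MOUNT_CMD:-mount} /home
            ;;
        *)
            # Ignore every other action in this sketch.
            :
            ;;
    esac
}

handle_event "$@"
```

Services that need the remote filesystem would then be started from the same hook rather than relying on a boot-time timeout.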
Comment 11 Marius Tomaschewski 2011-10-19 08:52:16 UTC
Don't get me wrong: it is OK for me to set it to 0 as _I_ don't
expect the network to be up at boot time on my notebook where I use NM.
I just want everybody here to know what happens ...
Comment 12 Marius Tomaschewski 2011-10-19 08:52:52 UTC
Stephan,

what do you think about this?
Comment 13 Frederic Crozat 2011-10-19 08:59:16 UTC
<systemd subliminal advertising>
for remotefs (!= /usr), people should just add comment=systemd.automount to their fstab for remote mount point and get automount for free, which would prevent this kind of issue when using NM
</systemd subliminal advertising> ;)
Comment 14 Marius Tomaschewski 2011-10-19 09:06:50 UTC
Created attachment 457474 [details]
NM_ONLINE_TIMEOUT 0 default setting, removed old migration hook

Just a variant that simply removes the (fate#307610) migration hook,
so on update there will be a default of 30, new installations will
get the new default of 0. See also comment 4 and comment 5.
Comment 15 Stephan Kulow 2011-10-21 07:59:22 UTC
I added ntp timesync to ifup scripts myself because they always fail as NM dispatcher btw.

But I wouldn't put too much effort into the update case - NM_ONLINE_TIMEOUT should work as it's documented or support for it should go away (leaving 0 for everyone).
Comment 16 Marius Tomaschewski 2011-10-21 11:24:43 UTC
(In reply to comment #15)
> But I wouldn't put too much effort into the update case - NM_ONLINE_TIMEOUT
> should work as it's documented or support for it should go away (leaving 0
> for everyone).

I interpret this as an ACK for patch (removed old migration hook) from
comment 14 changing the default to 0 -- going to apply and submit it
to factory today.
Comment 17 Marius Tomaschewski 2011-10-21 12:32:09 UTC
Applied patch variant from comment 14 to git:

http://gitorious.org/opensuse/sysconfig/commit/ec9e9b09f9c53a965fbd35cd00ca8b7f5c793654/diffs/9c61279efc7787953b2fdfe38be9802e0310292c

Now in Base:System/sysconfig + in openSUSE:Factory request id 88965.
Comment 18 Bernhard Wiedemann 2011-10-21 13:00:09 UTC
This is an autogenerated message for OBS integration:
This bug (722304) was mentioned in
https://build.opensuse.org/request/show/88965 Factory / sysconfig
Comment 19 Raymond Wooninck 2011-11-16 05:48:44 UTC
Setting the NM_ONLINE_TIMEOUT=0 introduces another issue with systemd. 

In the sysvinit script network, it is checked if this timeout is 0. If so, then the check to see if a connection exists (through calling nm-online) is omitted. 

In systemd, however, this particular validation is not done, and nm-online is called with the configured timeout in the systemd service file /lib/systemd/system/NetworkManager-wait-online.service. As a result, if NM is not connected, the system hangs in its boot process. On my system I have noticed that in this case the default service timeout of systemd does not kick in and the system really hangs forever (I waited 15 minutes before doing a hard reset). I also received confirmation from a friend that he had exactly the same issue.

Checking the tool nm-online directly shows that the parameter --timeout=0 is interpreted as "do not use any timeout and wait until there is a connection". This can be tested by issuing the command "nm-online --timeout=0" in a konsole while there is no network connection.

I discussed this initially with Stephan Kulow and we both believe that systemd should be adjusted to have the same validation for the zero timeout as the sysvinit script.

Based on a discussion with Frederic, I have reopened this bug.
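A sketch of how the zero check could be mirrored for the systemd case, assuming the service file called a small wrapper instead of nm-online directly. The function name wait_online and the NM_ONLINE override are hypothetical; only nm-online and its --timeout option come from the behaviour described above.

```shell
#!/bin/sh
# Hypothetical wrapper mirroring the sysvinit check for NM_ONLINE_TIMEOUT=0.
wait_online() {
    timeout=${NM_ONLINE_TIMEOUT:-0}
    if [ "$timeout" = "0" ]; then
        # 0 means "do not wait": never call nm-online, because
        # nm-online --timeout=0 was observed to wait indefinitely.
        return 0
    fi
    # NM_ONLINE is a test hook; it defaults to the real nm-online binary.
    ${NM_ONLINE:-nm-online} --timeout="$timeout"
}
```

The point of the design is that the "0 means skip" interpretation lives in one place, so sysvinit and systemd boots behave the same.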
Comment 20 Ludwig Nussel 2011-11-16 07:47:16 UTC
iow rcnetwork interprets 0 as no timeout whereas nm-online interprets it as 'infinite' *sigh*.
Comment 21 Raymond Wooninck 2011-11-16 07:53:47 UTC
Well, kinda. 

rcnetwork contains the following statements: 


# skip the online check entirely when the timeout is 0
if [ "$NM_ONLINE_TIMEOUT" = "0" ]
then
    return 0
else
    nm-online --timeout="$NM_ONLINE_TIMEOUT"
fi

This way nm-online is never called, which prevents the bug from appearing.

In the service file from systemd, nm-online is called regardless.

Found this one out while being in a plane and trying to boot my laptop.
Comment 22 Ludwig Nussel 2011-11-16 07:54:55 UTC
I've opened bug 730628 for the inconsistency of the .service file. The original request remains fixed.
Comment 23 Frederic Crozat 2012-02-03 10:24:33 UTC
*** Bug 738727 has been marked as a duplicate of this bug. ***