Bugzilla – Full Text Bug Listing
| Summary: | NIS/autofs not starting properly | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | Karl Eichwalder <ke> |
| Component: | Network | Assignee: | Marius Tomaschewski <mt> |
| Status: | RESOLVED FIXED | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | coolo, kukuk, mt, radmanic, varkoly |
| Version: | Beta 3 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | proposed patch for sysconfig | | |

Description
Karl Eichwalder
2008-05-15 10:06:42 UTC
Waiting for more info ;)

:) Booting again, everything was initialized properly. It is probably not a new bug introduced with Beta 3. Maybe it isn't even related to the update procedure. It also happened in the past, and also to Tanja during 11.0 alpha/beta test installations. It is rather annoying, though ;)

Then please provide the information. But I guess this is because of the massive changes to the installation workflow => something for the responsible PrjMgr, but nothing for me.

Thorsten, there were no changes in the update workflow, and Karl did an update (he says).

There were no changes to NIS either, and according to Karl it only happens with the update workflow. But we really need more information from Karl; the report as it stands does not contain any usable information. If the home directory does not exist, does this mean Karl could log in? So NIS was working?

Why does Bugzilla reset NEEDINFO by itself??? Strange. I clicked "Throw away my changes, and revisit bug 390676" after #c6. This is the answer I wrote: Yes, at least user 'ke' was known. I didn't try to log in, because I was in quite a hurry at noon today, and I assumed rebooting would probably help. And actually, it did. If you think it helps, I can attach all of /var/log or just /var/log/messages. As said, over the last weeks Tanja noticed something similar after new installations. It usually helps to stop/start rcnetwork/rcypbind/rcautofs. All of them, or only a subset, I'm not sure right now.

All these systems seem to use DHCP. And on all systems where I'm able to reproduce it, the reason was always the same: network initialisation and DHCP take longer than the boot process waits for them.

This is a general problem with our init scripts/setup & network.
The autofs script/service, and many other services too, need a
working network or they usually just fail.
The LSB $network dependency in autofs just says "start network
before autofs", but does not provide a more detailed dependency (and,
if I'm not wrong, does not even require a successful exit of the network service).
This is the reason why the ifservices(5) functionality exists.
It allows you to define that a service depends on an interface.
Using ifservices, you'll get as usual:
```
[...]
br0 Ports: [eth0]
br0 forwarddelay (see man ifcfg-bridge) ... ready
br0 (DHCP) . . . . . no IP address yet... backgrounding. waiting
Setting up service network . . . . . . . . . . . . . . done.
```
And as soon as DHCP has obtained the IP address and completed the interface setup,
the services from /etc/sysconfig/network/ifservices[-br0] will be
started.
So I consider the bug a configuration problem and resolve it as
WONTFIX. If you like, create a feature request to find a better
solution and change it to FEATURE then.
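To make the difference between plain `$network` ordering and per-interface ifservices gating concrete, here is a minimal toy timeline model. It is not the actual init script or ifup code: the `BootTimeline` type, the `start_times` function and the example timings are hypothetical, and the service names are only borrowed from the rcypbind/rcautofs scripts mentioned in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BootTimeline:
    """Hypothetical boot timings, in seconds since boot."""
    network_script_done: float  # "Setting up service network ... done" is printed
    ip_acquired: float          # the backgrounded DHCP client finally configures br0


def start_times(tl: BootTimeline, services: List[str],
                use_ifservices: bool) -> Dict[str, Tuple[float, bool]]:
    """Map each service to (start time, whether the interface had an IP by then).

    Ordered only after $network, a service starts as soon as the network
    script returns; gated by ifservices, it starts once the IP is acquired.
    """
    start = tl.ip_acquired if use_ifservices else tl.network_script_done
    return {name: (start, start >= tl.ip_acquired) for name in services}


# Example: DHCP is backgrounded at t=5 s and only gets an address at t=12 s.
timeline = BootTimeline(network_script_done=5.0, ip_acquired=12.0)
print(start_times(timeline, ["ypbind", "autofs"], use_ifservices=False))
# {'ypbind': (5.0, False), 'autofs': (5.0, False)}
print(start_times(timeline, ["ypbind", "autofs"], use_ifservices=True))
# {'ypbind': (12.0, True), 'autofs': (12.0, True)}
```

In the first case both services start while br0 still has no address, which matches the "starts but fails" symptom described in the report; in the second they wait for the interface.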
This is no new feature; this is a clear regression. We changed something in the system so that DHCP suddenly no longer gets the IP early enough. This was no problem (or only very seldom) with previous releases, but it is now reported by a lot of people. And this has nothing to do with NIS and/or autofs. These two services are only the ones people notice prominently later on (they cannot log in), since they have this "stupid" splash screen hiding the huge number of init scripts failing during boot.

IMO this is a very old and common problem when the DHCP server is
slow and does not answer within 5 seconds. What I can do is increase
the default time we wait for dhcpcd to complete:
```
## Type: integer
## Default: 5
#
# When the DHCP client is started at boot time, the boot process will stop
# until the interface is successfully configured, but at most for
# DHCLIENT_WAIT_AT_BOOT seconds.
#
DHCLIENT_WAIT_AT_BOOT="5"
```
(the default of 5 seconds is many years old). With a larger value, the network script
waits longer and usually you'll get an IP:
```
br0 (DHCP) . . . . . . . . . . . . . . . . . . . . . . IP/Network: '192.168.110.1' / '255.255.255.0'
Setting up service network . . . . . . . . . . . . . . done
```
Well... but because you think that this is a regression, I'm reassigning it
to the dhcpcd maintainer.
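The DHCLIENT_WAIT_AT_BOOT comment quoted in this report (stop the boot until the interface is configured, but at most for N seconds, otherwise continue with the client in the background) describes a bounded polling wait. A minimal sketch, assuming a hypothetical `has_address` callback and polling interval; the real ifup/dhcpcd logic is not shown in this report and may work differently:

```python
import time
from typing import Callable


def wait_at_boot(has_address: Callable[[], bool], wait_seconds: float,
                 poll_interval: float = 0.1) -> str:
    """Block until has_address() is true, but at most about wait_seconds.

    Hypothetical model of the DHCLIENT_WAIT_AT_BOOT behaviour, not the
    actual network script.
    """
    deadline = time.monotonic() + wait_seconds
    while not has_address():
        if time.monotonic() >= deadline:
            # Give up waiting; in the real boot the DHCP client keeps running.
            return "no IP address yet... backgrounding"
        time.sleep(poll_interval)
    return "configured"
```

For example, `wait_at_boot(lambda: False, 0.2, 0.05)` gives up after roughly 0.2 seconds, while a callback that becomes true within the limit returns `"configured"`. Raising `wait_seconds` only lengthens the worst-case boot delay; a fast DHCP answer still returns immediately.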
In /mirror/SuSE/ftp.suse.com/pub/people/varkoly/dhcpcd I have a new version of dhcpcd. Please test it.

I installed it, but I do not have the time (nor the skill) to do detailed testing.

Created attachment 216440
proposed patch for sysconfig
Please let me know when I should submit the above patch to 11.0.

Yes, please submit; I hope it will fix the issues.

I have done a few tests on 10.3 and 11.0 Beta 3 to evaluate the issue. My results show that on 10.3 it took no more than 5 seconds to receive an IP address from the DHCP server. On 11.0 Beta 3 it was even within 2 seconds, and reliably so. I even did this test on Mac OS X (Leopard), and it obtained an address within 3 seconds repeatedly. I think this is neither a specific issue of the DHCP client nor of our local DHCP server setup. On 10.3, e.g., I sometimes can't log in although networking was set up successfully; only a restart of KDM resolves the issue. I think this is because KDM starts up earlier than the LDAP service. The DHCP client may add to the reported issue, but it is not the sole cause of it. I'm reassigning this to the Project Manager to process this bug further.

The "KDM starts early" behaviour is a feature; so far I got only a few reports from NIS users that there is a problem. Technically we could disable early start for NIS completely; so far we only do it for autologin. But you can't compare DHCP speed under normal system load with DHCP speed during booting. During booting, 5 s are over pretty quickly. So I think the sysconfig patch will help; the only other way I can think of is giving the DHCP client more priority.

I think #14 will make the problem unlikely enough.

Submitted patch from comment #14 to stable:
- Increased DHCLIENT_WAIT_AT_BOOT to 15 and added a comment noting that RFC 2131 specifies that the DHCP client should wait a random time between one and ten seconds to desynchronize the use of DHCP at startup (bnc#390676).

See also bug #393801 (may be a duplicate).

Primarily that is an infrastructure problem. In this case, increasing DHCLIENT_WAIT_AT_BOOT may help.
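A rough worked example of why a 5 second limit can be too short once a client follows RFC 2131's advice to wait a random one to ten seconds before starting. It assumes the startup delay is uniformly distributed over [1, 10] seconds and that the server answers instantly; both are simplifying assumptions, not measured dhcpcd behaviour, and `fraction_late` is a hypothetical helper:

```python
def fraction_late(wait_at_boot: float, min_delay: float = 1.0,
                  max_delay: float = 10.0) -> float:
    """Fraction of boots whose initial random delay alone exceeds wait_at_boot.

    Assumes the startup delay is uniform on [min_delay, max_delay] and that
    the server replies instantly (both simplifications).
    """
    if wait_at_boot <= min_delay:
        return 1.0
    if wait_at_boot >= max_delay:
        return 0.0
    return (max_delay - wait_at_boot) / (max_delay - min_delay)


for limit in (5, 15):
    print(f"DHCLIENT_WAIT_AT_BOOT={limit}: {fraction_late(limit):.0%} of boots time out")
# DHCLIENT_WAIT_AT_BOOT=5: 56% of boots time out
# DHCLIENT_WAIT_AT_BOOT=15: 0% of boots time out
```

Under these assumptions the old default of 5 seconds expires before the first request in (10 - 5) / (10 - 1) = 5/9 of boots, while 15 seconds covers the whole 1 to 10 second window with at least 5 seconds left for the server to answer.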