Bug 304535

Summary: Install fails due to LAN card not setup requiring reboot
Product: [openSUSE] openSUSE 10.3
Reporter: Mario Guzman <mario_bz>
Component: Installation
Assignee: Michal Zugec <mzugec>
Status: RESOLVED NORESPONSE
QA Contact: Jiri Srain <jsrain>
Severity: Major
Priority: P5 - None
CC: locilka
Version: Beta 3
Target Milestone: ---
Hardware: i586
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Yast logs
varlogmsgs
2 sets of various logs

Description Mario Guzman 2007-08-24 19:05:24 UTC
I installed and allowed the repositories to be added during install. This worked fine, but when the lan configuration appeared, the device was listed as "unknown" and shown as configured. If I let this continue, the next steps (Saving Remote Config/Config display manager) would hang and loop trying to download items. There was no way out except to hit reset and reboot several times while trying to figure out what was going on; the ABORT button did not work. Since no lan activity took place, I figured it might be the lan interface. I then rebooted, and during the lan config I deleted the unknown device; the correct device showed up immediately. I configured it and continued, but the same problem occurred. Another reboot was required to activate the device correctly. Installation then proceeded ok, but there were other problems. They may be due to the fact that xfsprogs was missing and the filesystem did not get checked (another problem number).

So the first problem could be that during the initial lan setup used to load repositories, the correct device is not set, even though the downloads worked. The second problem is that even when I correct the lan by deleting/adding the device, a reboot is still required. I think the device should be activated at that time. A reboot seems dangerous since there is no soft shutdown allowed during installation.

The device is a 88E8052 and the sky2 driver was correctly used for the unknown device but would not work until I deleted and added using sky2. Also, I wonder if this is related to bug 286106. HW info can be found in doc for bug 263065.
Comment 1 Mario Guzman 2007-08-26 00:15:54 UTC
Update: I did a reinstall to see where the problem is. If you uncheck "setup online repositories before install", the problem does not occur. Instead, the lan device shows up later as "Ethernet Device" rather than "unknown" and a reboot is not required. So this has to do with the fact that the online setup (which works at the time) causes the later problem of the unknown device, which must be deleted/rebooted to continue the install.
Comment 2 Andreas Jaeger 2007-08-27 19:19:32 UTC
Please provide the YaST logfiles as explained at http://bugs.opensuse.org
Comment 3 Mario Guzman 2007-08-27 20:14:38 UTC
Created attachment 160104 [details]
Yast logs

Keep in mind that the system had to be rebooted to continue in the middle of installation so I don't know if log2 is the oldest (first) log with the info to help. If the data is not here I don't know how to get the log in the middle of an install that loops and requires a reboot.
Comment 4 Michal Zugec 2007-09-04 08:17:49 UTC
This is probably a sky2 kernel module problem.
Kernel-maintainer, can you handle this?
Comment 5 Karsten Keil 2007-09-04 09:41:40 UTC
Why do you think that this is a sky2 module problem?
Switching between showing up as Ethernet device and Unknown only because you chose to install some remote repos does not sound like a module bug. sky2 is the correct driver and contains the PCI IDs for this device.
Comment 6 Michal Zugec 2007-09-04 11:47:31 UTC
Ah, I see - it's the network configuration in 1st stage
Comment 7 Lukas Ocilka 2007-09-04 13:50:37 UTC
See these parts of YaST logs:

Found network device: 'eth0' ASUSTeK Marvell 88E8052 Gigabit Ethernet...
Only one network inteface, selecting eth0
Running /sbin/dhcpcd 'eth0' returned $["exit":0, "stderr":"", "stdout":""]

Running function: <YCPRef:boolean Action_WriteInstallInf ()
Writing Netdevice=eth0
Writing NetConfig=dhcp
Writing Alias=sky2
Writing NetUniqueID=rBUF.Myu8c0mh9g5
Writing HWAddr=00:18:f3:6c:45:6e

Running function: <YCPRef:boolean Action_TestInternetConnection ()
Running curl --silent --show-error --max-time 45 --connect-timeout 30 'http://www.novell.com' 1 >/dev/null returned $["exit":0, "stderr":"", "stdout":""]

These logs say that NetSetup configures the network card and successfully tests the connection. That's all.

According to comment #5, sky2 is the correct module name; Network setup in First Stage doesn't invent anything...

Well, maybe the problem is that no NetName="ASUSTeK Marvell 88E8052 Gigabit Ethernet Controller" is written into install.inf, but that's not something anyone could expect from FS NetSetup. I can't find any specification or request for doing that. Moreover, network setup should handle a missing name by matching the NetUniqueID or HWAddr against hwinfo probing, as NetSetup in FS does.
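
A minimal sketch of the matching described above, written in Python purely for illustration (YaST itself is not implemented this way). The key names come from the log excerpt; the probed-device list and its field names (`unique_id`, `hwaddr`, `name`) are hypothetical stand-ins for hwinfo output.

```python
# Sketch only: recover a device name when install.inf carries no NetName.
# The probed-device structure below is hypothetical, not real hwinfo output.

def parse_install_inf(text):
    """Parse 'Key=value' lines (install.inf style) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_device_name(settings, probed_devices):
    """Match by unique ID first, then by MAC address; None if nothing matches."""
    unique_id = settings.get("NetUniqueID")
    hwaddr = settings.get("HWAddr", "").lower()
    for dev in probed_devices:
        if unique_id and dev.get("unique_id") == unique_id:
            return dev["name"]
    for dev in probed_devices:
        if hwaddr and dev.get("hwaddr", "").lower() == hwaddr:
            return dev["name"]
    return None


# Values as written by Action_WriteInstallInf in the log excerpt.
INSTALL_INF = """\
Netdevice=eth0
NetConfig=dhcp
Alias=sky2
NetUniqueID=rBUF.Myu8c0mh9g5
HWAddr=00:18:f3:6c:45:6e
"""

# Hypothetical probe result for the same card.
PROBED = [
    {
        "unique_id": "rBUF.Myu8c0mh9g5",
        "hwaddr": "00:18:F3:6C:45:6E",
        "name": "ASUSTeK Marvell 88E8052 Gigabit Ethernet Controller",
    },
]

print(find_device_name(parse_install_inf(INSTALL_INF), PROBED))
```

The unique ID is tried before the MAC address here only as a design choice for the sketch; either key alone would identify the card in this report.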

Sorry, there doesn't seem to be anything for me to fix.
Comment 8 Michal Zugec 2007-09-04 16:46:48 UTC
In fact I don't know where the problem is:

- in 1st stage the network is already working

- udev rules are not generated/copied into the installed system (2007-08-24 10:54:36 <3> linux(3436) [bash] ShellCommand.cc(shellcommand):78 /bin/cp: cannot stat `/etc/udev/rules.d/70-net_persistent_names.rules': No such file or directory
) - already fixed in Beta3

- configuration file is created (2007-08-24 10:54:37 <5> linux(3436) [YCP] clients/save_network.ycp:211 Network Configuration:
BOOTPROTO='dhcp'
STARTMODE='onboot'
NAME='Unknown Network Device'

ifcfg file: ifcfg-eth0
) - 'unknown' is fixed for Beta3

- in 2nd stage the configuration is found and the network is started:
Stage [2]: Starting S07-medium...
Stage [2]: ======================
        |-- Checking kernel commandline...
        |-- Got kernel parameter <NoShell> -> start shell on tty2
        |-- network configuration found -> activate network
        |-- Summary for commandline checks:
        |-- Y2_NETWORK_ACTIVE = 1
        |-- Y2_SSH_ACTIVE = 0
        |-- USE_SSH = 0
        |-- VNC = 0

- YaST found that network is configured and will not touch it (2007-08-24 10:56:27 <1> linux(4804) [YCP] Lan.ycp:714 Something already configured: don't propose.
2007-08-24 10:57:01 <1> linux(4804) [YCP] NetworkDevices.ycp:651 No changes to netcard devices -> nothing to write
)

- but at this point the network is no longer running and must be restarted (if it were running, it would only be reloaded) (2007-08-24 10:57:02 <3> linux(4804) [bash] ShellCommand.cc(shellcommand):78 ..dead
2007-08-24 10:57:02 <1> linux(4804) [YCP] NetworkService.ycp:73 rcnetwork restart
)

- "dead" means that the device didn't obtain an IP address or the hardware wasn't up

This is why I thought it was a driver problem, but comments #1, #2 and #5 say it's not a driver problem. Can you attach the /var/log/messages file?

Please grep just the entries for 2007-08-24. BTW, after reboot your time changed from 17:54:37 to 10:55:52, but this is not related to this bug.
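
One way to pull out a single day's entries is sketched below in Python. It assumes /var/log/messages uses the traditional syslog timestamp (month abbreviation, then a space-padded day of month); the helper names are made up for this illustration.

```python
import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def syslog_prefix(day):
    """Build the traditional syslog date prefix, e.g. 'Aug 24' or 'Sep  6'."""
    # The day of month is space-padded to two characters in this format.
    return "%s %2d" % (MONTHS[day.month - 1], day.day)


def lines_for_day(lines, day):
    """Keep only the lines whose timestamp starts with the given day."""
    prefix = syslog_prefix(day) + " "
    return [line for line in lines if line.startswith(prefix)]


def print_day(path, day):
    """Print the matching lines of a log file (the path is an assumption)."""
    with open(path, errors="replace") as log:
        for line in lines_for_day(log, day):
            print(line, end="")


# Example call on the installed system (not run here):
# print_day("/var/log/messages", datetime.date(2007, 8, 24))
```

Timestamps in this format carry no year, so the filter would also match Aug 24 entries from other years if the file spans more than one.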
Comment 9 Mario Guzman 2007-09-04 16:59:45 UTC
I will try again in beta 3 since it is only a couple of days away and, according to the comment above, it seems 1 or 2 items were fixed that may affect this problem. The logs will take another install since I got it working, so I hope you don't mind waiting for beta 3. I will test beta 3 as soon as I download it Thur/Fri. Also, it seems you will need the logs at different points since I have to reboot to get things working.
Comment 10 Mario Guzman 2007-09-07 16:11:12 UTC
Created attachment 162759 [details]
varlogmsgs

This is still a problem in beta3, but the "unknown device" issue is no longer there. Here is what I did: installed using defaults/KDE, except that I added GNOME. I also enabled network manager and disabled ipv6 and the firewall. In the next phase (loading more repositories online) no lan activity took place. Although ABORT does not work, I was able to let it time out so I could continue. Then the network test (loading release notes) also failed. Again, ABORT does not work, but after several minutes it timed out and I could continue. The install completed, but a reboot was required to get the lan to work. The attached log is from after the install and 1 reboot. BTW, during beta2 I did not change network mgr/ipv6/fw and it did not make a difference.
Comment 11 Michal Zugec 2007-09-07 16:38:18 UTC
Sep  6 18:38:33 linux kernel: NET: Registered protocol family 17
Sep  6 18:38:35 linux dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Sep  6 18:38:35 linux dhclient: DHCPOFFER from 10.246.1.1
Sep  6 18:38:40 linux dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Sep  6 18:38:40 linux dhclient: DHCPACK from 10.246.1.1
Sep  6 18:38:40 linux dhclient: bound to 10.246.1.10 -- renewal in 1457 seconds.
Sep  6 18:38:41 linux kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Sep  6 18:38:41 linux SuSEfirewall2: Warning: ip6tables does not support state matching. Extended IPv6 support disabled.
Sep  6 18:38:41 linux kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Sep  6 18:38:41 linux SuSEfirewall2: SuSEfirewall2 not active
Sep  6 18:43:02 linux shutdown[3980]: shutting down for system reboot

It seems that dhcpcd works fine.
Can you check resolv.conf and the default route? (ctrl+alt+shift+"x" is the secret keystroke for xterm)
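
A rough Python sketch of both checks follows. It assumes a Linux system with /etc/resolv.conf and /proc/net/route, a little-endian machine (such as the i586 in this report) for decoding the gateway field, and that a Python interpreter is available in that environment, which may not hold inside the installer. The function names are invented for this example.

```python
import socket
import struct

RTF_GATEWAY = 0x2  # kernel route flag: the route goes through a gateway


def nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def default_gateways(proc_net_route_text):
    """Return (iface, gateway) pairs for default routes in /proc/net/route text."""
    gateways = []
    for line in proc_net_route_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, flags, mask = (
            fields[0], fields[1], fields[2], fields[3], fields[7])
        if dest == "00000000" and mask == "00000000" and int(flags, 16) & RTF_GATEWAY:
            # Assumption: little-endian host, so the hex value is byte-reversed.
            addr = socket.inet_ntoa(struct.pack("<I", int(gateway, 16)))
            gateways.append((iface, addr))
    return gateways


def report():
    """Read the live files and print what was found (not called here)."""
    with open("/etc/resolv.conf") as f:
        print("nameservers:", nameservers(f.read()))
    with open("/proc/net/route") as f:
        print("default gateways:", default_gateways(f.read()))
```

An empty gateway list from `default_gateways` would mean no default route is installed, even if the interface itself has an address.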
Comment 12 Mario Guzman 2007-09-07 16:49:50 UTC
Please tell me exactly what steps you need performed and at what points. A new install? How do I check in the middle of installation? Now that the install is done and works, aren't other logs useless?
Comment 13 Mario Guzman 2007-09-07 16:53:40 UTC
This is resolv.conf as it is now. But I don't know of any way to view it in the middle of install when lan is hung up.

### BEGIN INFO
#
# Modified_by:  NetworkManager
# Process:      /usr/bin/NetworkManager
# Process_id:   2699
#
### END INFO

search mgtech.com


nameserver 206.13.28.12
nameserver 206.13.31.12
Comment 14 Mario Guzman 2007-09-10 22:04:10 UTC
Created attachment 163088 [details]
2 sets of various logs

After several clean installs, here is what I found:

1. Bug 286106 affects testing of this bug. It was found in early alphas, then disappeared. It is now in beta 3 (may have been in 1 & 2). If install online repositories is NOT checked, this bug occurs every time. For this bug, I wanted to do an install without install online repos checked to make sure the install was ok as in previous tests; it was not, so I reopened this problem.

2. With beta 3:
If I default to install online repos and DO NOT CHANGE any defaults in Network Setup, the install finishes (even loads more online stuff), but at the end I cannot get to the lan and have to reboot due to problem 286106.

If I default to install repos and CHANGE Network Setup to remove IPV6 (one install), or change FireWall to no and add network mgr, then the next step that loads repos/test network fails.

I installed twice for each case to verify. Using "netstat -r" during the hangup (via the secret keystroke above), I get:

Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.246.1.0      *               255.255.255.0   U         0 0          0 eth0
link-local      *               255.255.0.0     U         0 0          0 eth0
loopback        *               255.0.0.0       U         0 0          0 lo

But I should get:
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.246.1.0      *               255.255.255.0   U         0 0          0 eth0
link-local      *               255.255.0.0     U         0 0          0 eth0
loopback        *               255.0.0.0       U         0 0          0 lo
default         10.246.1.1      0.0.0.0         UG        0 0          0 eth0

So for both problems the default route is missing and netstat hangs. Keep in mind the attached logs are for bug 286106 and are not from systems with install online repos checked. However, I can't help but think these are related.
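
The difference between the two tables can also be computed mechanically. The following Python sketch compares two captured "netstat -r" outputs and reports rows present in the expected table but absent from the observed one; the function names are made up for illustration, and the parsing assumes the eight-column layout shown above.

```python
def route_rows(netstat_output):
    """Map destination -> (gateway, genmask, iface) for each data row."""
    rows = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have eight columns; skip the header and anything else.
        if len(fields) == 8 and fields[0] != "Destination":
            rows[fields[0]] = (fields[1], fields[2], fields[7])
    return rows


def missing_routes(observed, expected):
    """Return the routes that appear in 'expected' but not in 'observed'."""
    seen = route_rows(observed)
    return {dest: row for dest, row in route_rows(expected).items()
            if dest not in seen}


OBSERVED = """\
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.246.1.0      *               255.255.255.0   U         0 0          0 eth0
link-local      *               255.255.0.0     U         0 0          0 eth0
loopback        *               255.0.0.0       U         0 0          0 lo
"""

EXPECTED = OBSERVED + (
    "default         10.246.1.1      0.0.0.0         UG        0 0          0 eth0\n"
)

print(missing_routes(OBSERVED, EXPECTED))
# prints {'default': ('10.246.1.1', '0.0.0.0', 'eth0')}
```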
Comment 15 Mario Guzman 2007-09-10 22:10:50 UTC
Since this is so close to GA, I consider this a serious installation problem and am willing to provide more info if you are specific as to how, what, and when to produce the info/logs for you.
Comment 16 Michal Zugec 2007-10-26 13:53:36 UTC
default route:
Is this a YaST-related problem? In the logs I can see you're using dhcp for configuration. What happens when you stop your network and run "dhcpcd eth0" manually?
Comment 17 Stephan Kulow 2007-11-25 08:42:04 UTC
does not seem so problematic after all, closing