Bugzilla – Full Text Bug Listing
Description
Thomas Renninger
2008-08-06 13:03:17 UTC
Created attachment 232040 [details]
Yast2 logs after the end of the second stage. The Yast2 network module shows both links as down (wrong)
Created attachment 232041 [details]
Yast2 logs after running "ethtool eth0" and "ethtool eth1"; entering the yast network module then showed that one device has a link (correct)
I don't have this build; what is the yast2-network version there? Just try: this was tested and fixed in yast2-network-2.17.11: http://mzugec.blogspot.com/2008/07/autoyast-network-device-names.html
yast2-network-2.17.14-2
Something still seems not to work.
Trying to reproduce.
No, it works fine with Alpha1 (except for some known bootloader problems). When the 2nd stage finishes, you are in the running system and network devices are up/down according to your configuration (rcnetwork status). Please attach the <networking> section and the content of /etc/sysconfig/network/ifcfg-*.
Decreased severity to Major.
This was not Alpha1, but a test build afterwards that coolo asked us to test. Let's see if things change in Alpha2; if not, we have to increase the severity again, as we have to set up every machine by hand. The tested build is: /mounts/dist/machcd2/CDs/openSUSE-11.1-Alpha1plus-DVD-x86_64-Build0016/DVD1/
I'll attach some screenshots. You may still want to log into *adalid*. It is freshly installed, so its network is broken and you have to log in via the serial console: ssh root@sconsole1, run cscreen, then choose adalid.
Created attachment 233488 [details]
Unrelated to this bug, but shortly after this error message things break, because these packages cannot be installed while the network is broken
Created attachment 233489 [details]
Extra packages needed for installation cannot be installed
Created attachment 233490 [details]
The next error window
Created attachment 233491 [details]
And here we have to break our automated installation...
Please test with the coming Alpha2.
This can still be reproduced on Alpha2. Adalid's network has now been set up by hand. This might not be a Yast problem but an ethtool problem; this should be found out first... How does Yast find out whether a network cable is plugged in and the link is active?
>> How does Yast find out whether a network cable is plugged in and the link is active?
It uses /sys/class/net/*/carrier information.
Hi Karsten, this is the bug where I suspect that network link detection does not work. Machines that should be affected for now: *adalid*, *field*. According to Karsten, one cannot really trust /sys/class/net/*/carrier, or there may be delay issues (maybe the network driver needs more time to detect a link?). Bjorn is playing a bit with it on adalid (go ahead and double check on *field*: same driver? same problem? ...).
BTW: We also saw problems with ethtool and are therefore now activating network devices via ifup and then using ethtool to detect the link. I could imagine that ethtool and /sys/../carrier link detection are rather similar or the same? Then Yast probably also needs a workaround, like activating network cards first?
From dmesg:
NET: Registered protocol family 17
tg3: eth1: Link is up at 100 Mbps, full duplex.
tg3: eth1: Flow control is on for TX and on for RX.
NET: Registered protocol family 10
linux:~ # cat /sys/devices/pci0000:00/0000:00:02.0/0000:02:03.1/net/eth1/carrier
cat: /sys/devices/pci0000:00/0000:00:02.0/0000:02:03.1/net/eth1/carrier: Invalid argument
This output came from adalid.
This seems to be normal as long as the interface has not been brought up:
gw:/usr/src/linux # ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:08:54:53:FD:03
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:221 Base address:0xa000
gw:/usr/src/linux # cat /sys/class/net/eth0/carrier
cat: /sys/class/net/eth0/carrier: Invalid argument
gw:/usr/src/linux # ifconfig eth0 up
gw:/usr/src/linux # cat /sys/class/net/eth0/carrier
0
ifconfig eth0 down
gw:/usr/src/linux # cat /sys/class/net/eth0/carrier
cat: /sys/class/net/eth0/carrier: Invalid argument
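The transcript shows three states: carrier is unreadable (EINVAL) while the interface is down, and reads 0 or 1 once it is up. A minimal bash sketch of a link check that respects this, assuming it checks bit 0 of the flags file (IFF_UP) before touching carrier. `carrier_state` and the `SYSFS_NET` override are hypothetical names for illustration, not part of YaST.

```shell
#!/bin/bash
# Hypothetical helper, not YaST code. SYSFS_NET is an assumed override so the
# function can be pointed at a fake directory; it defaults to the real sysfs.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# carrier_state DEV: prints one of if-down, link, no-link, unknown.
carrier_state() {
    local dev=$1 flags carrier
    # Interface flags are exposed in hex (e.g. 0x1003); bit 0 is IFF_UP.
    if ! flags=$(cat "$SYSFS_NET/$dev/flags" 2>/dev/null); then
        echo unknown
        return
    fi
    if (( (flags & 0x1) == 0 )); then
        # While the interface is down, reading carrier fails with EINVAL,
        # so do not even try.
        echo if-down
        return
    fi
    # Interface is up: carrier reads 1 (link) or 0 (no link detected yet).
    if ! carrier=$(cat "$SYSFS_NET/$dev/carrier" 2>/dev/null); then
        echo unknown
    elif [[ $carrier == 1 ]]; then
        echo link
    else
        echo no-link
    fi
}
```

Note that "no-link" right after "ifconfig up" does not mean no cable is plugged in; link detection may still be in progress.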
> This seems to be normal as long the interface was not brought in up state
Right, carrier never reports a link on any machine while the interface is down, so I expect Yast has to bring the network interface up first. Would it work to increase the waiting time so the network driver can settle down a bit?
On adalid this test script shows that it takes rather long until the link
is detected (more than 1 sec, under high load possibly longer):
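#!/bin/bash
# Take eth1 down and up again, wait x seconds, then read its carrier state.
# Each iteration waits one second longer than the previous one.
for ((x=0;x<5;x++)); do
    ifconfig eth1 down
    ifconfig eth1 up
    sleep $x
    cat /sys/devices/pci0000:00/0000:00:02.0/0000:02:03.1/net/eth1/carrier
done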
/tmp/network_test.sh
0
0
1
1
1
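Because "ifconfig down" restarts link detection, the loop above begins a fresh detection on every iteration. A bash sketch of an alternative measurement, assuming the interface is brought up once beforehand and carrier is then polled every 0.1 s. `wait_for_carrier` and `SYSFS_NET` are hypothetical names; the 0.1 s step is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper. SYSFS_NET is an assumed override for testing.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# wait_for_carrier DEV MAX_TENTHS
# Polls DEV's carrier file roughly every 0.1 s for at most MAX_TENTHS tenths
# of a second. Prints the tenths waited; returns 0 once carrier reads 1 and
# 1 on timeout. Read errors (EINVAL while down) count as "no link yet".
wait_for_carrier() {
    local dev=$1 max=$2 t=0 carrier
    while (( t <= max )); do
        carrier=$(cat "$SYSFS_NET/$dev/carrier" 2>/dev/null)
        if [[ $carrier == 1 ]]; then
            echo "$t"
            return 0
        fi
        if (( t < max )); then
            sleep 0.1
        fi
        t=$((t + 1))
    done
    echo "$max"
    return 1
}
```

Usage, for example: run `ifconfig eth1 up` once, then `wait_for_carrier eth1 100` to wait up to about 10 s. Polling does not make detection faster; it only avoids waiting longer than necessary once the link appears.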
Can the timeout Yast uses to let the network settle down before checking the link be increased, please?
Hmm, I mean how long should Yast wait...
IMO this should still be solved properly in the kernel.
IMO reading the carrier sysfs file should block while link detection is in progress. Caveats:
1. I do not know anything about the network layer.
2. I did not search for the real carrier sysfs method.
But could something like this be the real solution?:
static ssize_t carrier_show(struct class *cls, char *buf)
{
	unsigned long timeout = jiffies + HZ * 5; /* 5s */

	/*
	 * I found netif_carrier_ok, netif_carrier_on and
	 * netif_carrier_off... netif_carrier_check() does not exist;
	 * it stands for "NIC link detection in progress".
	 * xy->dev is a placeholder for the device's struct net_device.
	 */
	while (netif_carrier_check(xy->dev) &&
	       time_before(jiffies, timeout))
		cond_resched();

	return sprintf(buf, "%d\n", netif_carrier_ok(xy->dev));
}
Two things I am not sure about:
- Is this allowed in a sysfs read at all, Kay?
- netif_carrier_check(xy->dev) is probably hard to implement?
Maybe it could be done for tg3 only for now?
Just an idea, but solving or working around this in Yast is probably really ugly.
Michal just confirmed: waiting longer would block the whole application, so it is not a real solution.
(In reply to comment #20 from Thomas Renninger)
> while (netif_carrier_check(cxy->dev)) &&
> timeout > jiffies) {
Does the jiffies magic really still work with 'nohz' (since 'jiffies' might not be updated)?
No, that would be a bad idea. The driver cannot distinguish between "no connection" and "carrier detection in progress", so it would wait forever if no cable is connected. Also, the test loop from comment #19 is wrong: it restarts carrier detection in every iteration. Note that "ifconfig down" is the same as pulling the cable.
The important points are: YaST should not access /sys/class/net/ethX/carrier before "ifconfig up" was done; this would cause an error. Note that on some devices "ifconfig up" does not take effect in the driver immediately; it may be delayed until the driver thread runs again. You can examine /sys/class/net/ethX/flags: bit 0 shows the up/down status. If carrier reads 0, YaST should retry for at least 3 seconds, but some cards (and switches) may need more time. One idea would be to do "ifconfig up" on all detected interfaces as early as possible, do something else, and then test the carrier state.
That does not work. Same script, extended to read flags and carrier:
/tmp/network_test.sh
Waiting for 0 seconds...
carrier: 0
flags: 0x1003
Waiting for 1 seconds...
carrier: 0
flags: 0x1003
Waiting for 2 seconds...
carrier: 1
flags: 0x1003
Waiting for 3 seconds...
carrier: 1
flags: 0x1003
Waiting for 4 seconds...
carrier: 1
flags: 0x1003
> You can examine /sys/class/net/ethX/flags, Bit0 shows up/down status
One second... it says connected :)
cat /sys/class/net/eth1/flags
0x1003
linux:~ # cat /sys/class/net/eth0/flags
0x1002
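For reference, these flag values can be decoded bit by bit. A small bash sketch, assuming the bit values from <linux/if.h> (IFF_UP 0x1, IFF_BROADCAST 0x2, IFF_LOOPBACK 0x8, IFF_POINTOPOINT 0x10, IFF_PROMISC 0x100, IFF_MULTICAST 0x1000); `decode_flags` is a hypothetical helper and only covers this subset of flags. It shows that 0x1003 is UP BROADCAST MULTICAST and 0x1002 is BROADCAST MULTICAST, so the two interfaces differ only in the administrative up bit and the value says nothing about the link.

```shell
#!/bin/bash
# decode_flags HEX: prints the names of a subset of <linux/if.h> flags set
# in the value read from /sys/class/net/ethX/flags (hypothetical helper).
decode_flags() {
    local flags=$1 names=()
    (( flags & 0x1 ))    && names+=(UP)
    (( flags & 0x2 ))    && names+=(BROADCAST)
    (( flags & 0x8 ))    && names+=(LOOPBACK)
    (( flags & 0x10 ))   && names+=(POINTOPOINT)
    (( flags & 0x100 ))  && names+=(PROMISC)
    (( flags & 0x1000 )) && names+=(MULTICAST)
    echo "${names[*]}"
}
```

Usage: `decode_flags "$(cat /sys/class/net/eth1/flags)"`.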
So they should use /sys/class/net/ethX/flags instead of carrier?
No. Bit 0 of /sys/class/net/ethX/flags only gives the up/down status of the interface; you could check it to verify that an "ifconfig up" was issued and executed by the driver. Carrier detection takes time after "ifconfig up": some cards/switches are quick (< 2 sec), some need > 10 sec, and you cannot do anything about that. The issue is that newer HW powers down the PHY interface until it is enabled with "ifconfig up". Some other devices enable the PHY interface at driver load (I think these are the "quick" ones), but that is not acceptable as a general solution because of power saving.
Some workaround ideas:
1) ifup all the network interfaces from another program earlier in the 2nd stage
install/setup boot.
This is not nice, because the yast lan module would still be broken standalone.
2) Wait the same amount as currently if one or more carrier files show 1
-> link detected. It is then expected that:
a) We have something to use for installation -> not that bad if another
link is not detected. Also, cards of the same type should be ready.
b) If no link is detected at all, wait at least 5 secs (this should not
happen that often).
3) Re-evaluate the carrier link after Yast lan is fully started and update
what is shown to the user
-> Probably very hard to implement in Yast?
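Workaround 2 can be sketched as a polling loop: keep the usual wait, stop as soon as any interface reports a link after that, and otherwise keep polling up to 5 s in total. A bash sketch under stated assumptions: the current Yast wait time is not given in this report, so it is a parameter (BASE_TENTHS); `any_carrier`, `wait_for_any_link` and `SYSFS_NET` are hypothetical names; times are counted in tenths of a second and are approximate.

```shell
#!/bin/bash
# Hypothetical helpers, not YaST code. SYSFS_NET is an assumed override.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# any_carrier DEV...: succeeds if at least one device's carrier reads 1.
any_carrier() {
    local dev
    for dev in "$@"; do
        [[ $(cat "$SYSFS_NET/$dev/carrier" 2>/dev/null) == 1 ]] && return 0
    done
    return 1
}

# wait_for_any_link BASE_TENTHS MAX_TENTHS DEV...
# Waits at least BASE_TENTHS (the "current" wait). After that, returns 0 as
# soon as any device has a link; if none does, keeps polling every 0.1 s
# until MAX_TENTHS (e.g. 50 for 5 s) and then returns 1.
wait_for_any_link() {
    local base=$1 max=$2 t=0
    shift 2
    while (( t < max )); do
        if (( t >= base )) && any_carrier "$@"; then
            return 0
        fi
        sleep 0.1
        t=$((t + 1))
    done
    any_carrier "$@"
}
```

Usage, assuming all interfaces were already brought up: `wait_for_any_link 20 50 eth0 eth1`, where 20 is only a placeholder for whatever the current wait is. A link that shows up later on another interface is still missed, which is what idea 3 would address.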
Best would be 2+3, which would be fully satisfactory, but 3 may not be possible, as Yast could make assumptions when displaying things based on the result of the detection?
I am stepping out of this discussion; I do not know enough about this area. These are just some ideas.
But it seems that, besides our auto-installation not working on several systems, we hit a severe bug here (especially Karsten's power saving assumptions make me nervous: they would mean that link detection could take even longer with future cards that take more care about power consumption?)
It turned out that Yast and everything works rather well... The additional AutoYaST network configuration option <keep_install_network config:type="boolean">true</keep_install_network> seems to do the trick. Thanks a lot to Michal Zugec for tracking it down and to Uwe Gansert for pointing to the above parameter. Is this documented in some prominent place in the AutoYaST documentation?