Bugzilla – Bug 702205
RTL8111/8168B hard locking and rebooting machines when under heavy load
Last modified: 2012-03-22 19:00:45 UTC
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

When this network card (this should apply to all cards from this family) is under heavy load (>20MB), it causes hard lockups and/or hard machine reboots.

This issue is extremely hard to localize. Either there are no errors at all, or the machine reboots too fast to capture them. On top of that, the errors that do appear are not very informative ("eth0 link up" is one of the "fatal errors").

Upstream reference: https://bugzilla.kernel.org/show_bug.cgi?id=32962
Ubuntu forum reference: http://ubuntuforums.org/showpost.php?p=10774353&postcount=18

The solution is to use the driver provided by Realtek. I spent two days hunting this problem.

Reproducible: Always

Steps to Reproduce:
1. Do some network-heavy stuff
2. Watch your machine crash

Actual Results:
Machine reboots or hard locks.

Expected Results:
No problems should appear.

Please, even if you won't fix this soon, at least disable the problematic module so that users have some clue where to look, because this is a really, really, really annoying problem that is extremely hard to diagnose.
Whoops, forgot, the other characteristic error is: "NOHZ local_softirq_pending 08"
I can reproduce this problem on several independent machines. My solution was to switch to a driver as distributed by www.realtek.com. Why is this bug not addressed?
I'm having the same problem on an HP Pavilion DV6 notebook with this chip onboard. openSUSE 11.4 is UNUSABLE on this machine in this state! Please fix this problem if possible!
Simon, Sven, Joschi:

Many different rtl8111/8168 chip versions share the same PCI IDs and are driven by the r8169 module. These chip versions are distinguished by their so-called XID. Upon encountering an unknown XID, the r8169 driver tries one of a few fallbacks. With older kernels, such as the one found in openSUSE 11.4, these unknown XIDs often belong to chip versions newer than the ones the driver supports. This can lead to half-working devices like what you describe (I'm speaking from experience here ;).

In particular, openSUSE 11.4 runs a 2.6.37 kernel, and support for three new chip versions has been added to r8169 since then:

    01dc7fe net/r8169: support RTL8168E             v3.0-rc1
    7009042 r8169: support RTL8111E-VL.             v3.1-rc1
    c221892 r8169: support new chips of RTL8111F    v3.2-rc1

I would recommend upgrading to openSUSE 12.1, which runs a 3.1 kernel. That will get you support for the E and EVL chips. If you'd rather stay on 11.4 (why?), you can install just the kernel package from 12.1:

    zypper ar obs://Kernel:openSUSE-12.1/standard kotd12.1
    vi /etc/zypp/zypp.conf
    # uncomment "multiversion = provides:multiversion(kernel)"
    zypper dup -r kotd12.1

If you still experience issues with this network card after upgrading, please attach your dmesg output. It should contain a line like the following, which will help identify which chip revision you are running:

    eth0: RTL8168evl/8111evl at 0xf9320000, 10:1f:74:ce:b0:17, XID 0c900800 IRQ 28

Let me know how things go, thank you.
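For anyone who wants to pull that line out of a saved dmesg log before attaching it, here is a minimal sketch. It assumes the line has exactly the shape of the example in this comment (interface, chip name, base address, MAC, XID, IRQ); other kernel versions may format it differently. The names `NicInfo` and `find_xid_lines` are made up for this illustration.

```python
import re
from typing import List, NamedTuple


class NicInfo(NamedTuple):
    interface: str
    chip: str
    base_address: int
    mac: str
    xid: int
    irq: int


# Pattern modelled on the sample line quoted in this bug report.
# Assumption: the log line keeps this exact field order and punctuation.
XID_LINE = re.compile(
    r"(?P<iface>\w+): (?P<chip>\S+) at 0x(?P<base>[0-9a-fA-F]+), "
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}), "
    r"XID (?P<xid>[0-9a-fA-F]{8}) IRQ (?P<irq>\d+)"
)


def find_xid_lines(log_text: str) -> List[NicInfo]:
    """Return one NicInfo per log line that matches the XID pattern."""
    found = []
    for line in log_text.splitlines():
        match = XID_LINE.search(line)
        if match:
            found.append(
                NicInfo(
                    interface=match.group("iface"),
                    chip=match.group("chip"),
                    base_address=int(match.group("base"), 16),
                    mac=match.group("mac"),
                    xid=int(match.group("xid"), 16),
                    irq=int(match.group("irq")),
                )
            )
    return found


if __name__ == "__main__":
    sample = (
        "eth0: RTL8168evl/8111evl at 0xf9320000, "
        "10:1f:74:ce:b0:17, XID 0c900800 IRQ 28"
    )
    for nic in find_xid_lines(sample):
        print(f"{nic.interface}: {nic.chip}, XID {nic.xid:08x}")
```

Feeding it the output of `dmesg` saved to a file should list every matching interface; lines that do not match are silently skipped.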
*** Bug 709886 has been marked as a duplicate of this bug. ***
Compare:

1) In/against openSUSE 12.1:
[opensuse] Install help for Network driver
(Date: Wed, 14 Mar 2012 15:27:02 -0400)
http://lists.opensuse.org/opensuse/2012-03/msg00676.html
especially:
http://lists.opensuse.org/opensuse/2012-03/msg00765.html

2) Still in/against openSUSE 11.4:
http://forums.opensuse.org/deutsch-german/hilfe-und-helfen/netzwerk/473306-opensuse-11-4-x86_64-realtek-onboard-nic-schafft-keinen-link-nach-dem-booten.html
(11-Mar-2012, German)
(In reply to comment #6)
> Compare:
>
> 1) In/against openSUSE 12.1:
> [opensuse] Install help for Network driver
> (Date: Wed, 14 Mar 2012 15:27:02 -0400)
> http://lists.opensuse.org/opensuse/2012-03/msg00676.html
> especially:
> http://lists.opensuse.org/opensuse/2012-03/msg00765.html

There is some confusion in that thread:

> As suspected your software system uses a r816*9* kernel module (...9)
> but your hardware is a Realtek [...] RTL8111/816*8*B [10ec:816*8*] (...8).

Despite its name, the r8169 module is meant to drive cards based on the Realtek 8168/8111 chips. The difference between the two modules is that:
r8168 is a binary-only driver provided by Realtek
r8169 is a community-developed and community-supported driver

While r8168 usually supports newer chips first, the version of r8169 currently in openSUSE 12.1 supports all the chips I've seen in circulation so far. IMO, steering users towards r8168 is ill-advised, as it will be extremely difficult to find developers willing and able to provide support for it.

Secondly, lspci output is insufficient to determine the chip version, as many versions share a small set of PCI IDs. A first step in identifying the chip version is the (masked) XID line found in the kernel logs, as pointed out at the end of comment 4. The driver identifies the chip version from the (unmasked) XID in rtl8169_get_mac_version():
http://lxr.linux.no/#linux+v3.1.10/drivers/net/r8169.c#L1724
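To illustrate why an XID can pin down a chip version where PCI IDs cannot, here is a rough sketch of a first-match lookup over (mask, value) pairs. This is an assumption about the general shape of such a lookup, not a copy of rtl8169_get_mac_version(): the table entries and chip names below are invented for the example and do not correspond to any real Realtek XID.

```python
from typing import Optional, Sequence, Tuple

# HYPOTHETICAL table: (mask, expected value, name). These numbers are
# made up for illustration and are NOT taken from the r8169 driver.
# More specific masks come first so they win over broader family matches.
EXAMPLE_TABLE: Sequence[Tuple[int, int, str]] = (
    (0xFFF00000, 0xABC00000, "example-chip-rev-b"),
    (0xFF000000, 0xAB000000, "example-chip-family"),
    (0xF0000000, 0xC0000000, "other-example-chip"),
)


def identify_chip(
    xid: int, table: Sequence[Tuple[int, int, str]] = EXAMPLE_TABLE
) -> Optional[str]:
    """Return the name of the first entry whose masked bits match xid."""
    for mask, value, name in table:
        if (xid & mask) == value:
            return name
    # No entry matched; a real driver would apply a fallback here.
    return None


if __name__ == "__main__":
    for xid in (0xABC12345, 0xAB012345, 0x12345678):
        print(f"{xid:08x} -> {identify_chip(xid)}")
```

The ordering matters: an XID that matches a specific revision also matches its broader family entry, so the specific entry has to be checked first. An XID that matches nothing is the "unknown XID" case described in comment 4.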
(In reply to comment #7)
> The difference between the two modules is that:
> r8168 is a binary-only driver provided by realtek

Correction: the source for r8168 is in fact available. But it is an out-of-tree driver, it is not supported by the kernel community, and it is not supported by SUSE (afaik).

Thank you Martin for pointing this out.
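Since the two module names are easy to mix up, it can help to check which one is actually bound to an interface. A minimal sketch, assuming the usual Linux sysfs layout where /sys/class/net/<iface>/device/driver is a symlink to the driver's directory; the function name `bound_driver` and its `sysfs_root` parameter are made up for this example.

```python
import os
from typing import Optional


def bound_driver(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to iface, or None if unknown.

    Assumption: sysfs exposes <root>/class/net/<iface>/device/driver as a
    symlink whose target directory is named after the driver module.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        # Interface missing, or a virtual device with no backing driver.
        return None
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    print(bound_driver("eth0"))
```

On a machine from this report one would expect this to print either r8169 or r8168, depending on which module is loaded for the card.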
Since the release of openSUSE 12.1, openSUSE 11.4 is now getting important security and bug fixes only. From what I can tell, the problems reported here are related to support for new hardware on 11.4 so I'll go ahead and close this bug entry. I'm sorry we could not address the reporter's problem earlier but now 12.1 has been out for a few months and it has support for newer realtek chip versions. If you experience kernel crashes related to the in-kernel r8169 module on 12.1 please do open a new bugzilla entry (leave a comment here pointing to the new one if you wish) and make sure to include the XID line from dmesg in that new entry (as well as OOPS output or other relevant info).