Bug 159136 - Broadcom NIC (tg3): losing carrier detect on suspend cycle
Summary: Broadcom NIC (tg3): losing carrier detect on suspend cycle
Status: RESOLVED FIXED
Duplicates: 160507 178280 183225 195102 222820
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: Other Other
Priority: P5 - None, Severity: Critical (5 votes)
Target Milestone: ---
Assignee: Karsten Keil
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 178280
Reported: 2006-03-17 15:55 UTC by Danny Al-Gaaf
Modified: 2007-03-12 12:48 UTC (History)
CC List: 7 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
log from /var/log/NetworkManager after suspend2ram (910.70 KB, text/plain)
2006-03-17 17:42 UTC, Danny Al-Gaaf
output of lspci -vn (4.95 KB, text/plain)
2006-03-25 17:32 UTC, Danny Al-Gaaf
dmesg of STR cycle with msglvl 0xffff (6.71 KB, text/plain)
2006-03-29 11:44 UTC, Timo Hoenig

Description Danny Al-Gaaf 2006-03-17 15:55:52 UTC
While testing suspend-to-ram on GNOME and KDE on the FSC Stylistic ST502x, I noticed that there is no network connection after resume: no hostname, no IP, no NIS/automount. Suspend-to-disk is not affected.

I'm not completely sure what causes this bug, but IMO NetworkManager is a good starting point. With the traditional ifup network setup, everything works perfectly.

@thoenig: could you check this bug and if necessary reassign to the correct person (rlove?).
Comment 1 Timo Hoenig 2006-03-17 16:03:11 UTC
I've seen something similar with my Dell D600 having no network connection after resume.  As Powersave sends the sleep/wake signals it might turn out not to be a NM problem.  I'll take care.
Comment 2 Timo Hoenig 2006-03-17 16:10:13 UTC
By the way, I've seen this with STD rather than STR, which unfortunately does not work on my system any longer.  Hi, Seife ;-)
Comment 3 Timo Hoenig 2006-03-17 16:42:58 UTC
Could not reproduce a second time.

Danny, please attach /var/log/NetworkManager.
Comment 4 Robert Love 2006-03-17 16:45:30 UTC
There is a possible fix for this in the latest NM builds.  So if you have only reproduced in the past, please retry with current build from STABLE!
Comment 6 Danny Al-Gaaf 2006-03-17 17:42:52 UTC
Created attachment 73732
log from /var/log/NetworkManager after suspend2ram
Comment 7 Danny Al-Gaaf 2006-03-17 17:51:21 UTC
(In reply to comment #4)
> There is a possible fix for this in the latest NM builds.  So if you have only
> reproduced in the past, please retry with current build from STABLE!

I tried this with 0.6.1-5 and the problem is still present.
Comment 8 Timo Hoenig 2006-03-17 18:52:21 UTC
Danny's system (Broadcom NIC, tg3) is losing the link beat.  Please follow Seife's instructions from IRC and -- if applicable -- assign to the kernel guys.
Comment 10 Timo Hoenig 2006-03-17 18:56:16 UTC
Whoops, I didn't mean to change the priority.  Reverting.
Comment 11 Danny Al-Gaaf 2006-03-21 16:08:03 UTC
Yes, this looks like a kernel bug in the tg3 module after s2ram. It also happens with init=/bin/bash, as I could see on the machine. Reassigning to kernel.

Btw. I think this is a blocker bug, but I set this to critical. @aj could you check the severity?
Comment 12 Andreas Jaeger 2006-03-21 19:05:03 UTC
this is not a blocker.
Comment 13 Olaf Kirch 2006-03-24 11:38:36 UTC
Comment on attachment 73732
log from /var/log/NetworkManager after suspend2ram

text/plain is text/plain is text/plain is text/plain and NOT application/octet-stream :-)
Comment 14 Olaf Kirch 2006-03-24 11:42:43 UTC
Please set the NIC's debug level to 65535 shortly before suspending
(using ethtool -s eth0 msglvl 65535), then suspend, resume and attach
dmesg output to this bug. Thanks!

Karsten, would you take this one, please?
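
For reference, the msglvl value 65535 requested above is the bitmask 0xffff, i.e. every driver message category switched on. A minimal sketch of how such a mask breaks down, assuming the NETIF_MSG_* bit values from the kernel's include/linux/netdevice.h; the helper name decode_msglvl is purely illustrative and not part of ethtool or the driver:

```python
# NETIF_MSG_* bits as defined in the Linux kernel's include/linux/netdevice.h.
# The short names below are illustrative labels, not ethtool output.
NETIF_MSG_BITS = {
    0x0001: "drv",
    0x0002: "probe",
    0x0004: "link",
    0x0008: "timer",
    0x0010: "ifdown",
    0x0020: "ifup",
    0x0040: "rx_err",
    0x0080: "tx_err",
    0x0100: "tx_queued",
    0x0200: "intr",
    0x0400: "tx_done",
    0x0800: "rx_status",
    0x1000: "pktdata",
    0x2000: "hw",
    0x4000: "wol",
}


def decode_msglvl(level):
    """Return the names of all message categories enabled in `level`."""
    return [name for bit, name in sorted(NETIF_MSG_BITS.items()) if level & bit]


if __name__ == "__main__":
    # 65535 is the value passed to "ethtool -s eth0 msglvl" in this comment.
    print(hex(65535), decode_msglvl(65535))
```

For a link-beat problem like this one, the "link", "ifup" and "ifdown" categories are probably the most relevant, but enabling everything avoids having to guess.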
Comment 15 Olaf Kirch 2006-03-24 14:42:25 UTC
Karsten, please look at today's thread on netdev with
subject "tg3 breakage this morning" - it seems one of the
power mgmt related patches that went into tg3 recently was bad.
Maybe it would help to back out one or more of these:

> [TG3]: Bump driver version and reldate.
> [TG3]: Skip phy power down on some devices
> [TG3]: Fix SRAM access during tg3_init_one()
> [TG3]: Don't mark tg3_test_registers() as returning ...
> [TG3]: make drivers/net/tg3.c:tg3_request_irq() static
> [TG3]: netif_carrier_off runs too early; could still ..

Specifically one poster mentioned that the "Skip phy power down" patch
was negatively affecting his machine.
Comment 16 Danny Al-Gaaf 2006-03-24 15:55:31 UTC
(In reply to comment #13)
> (From update of attachment 73732 [details] [edit])
> text/plain is text/plain is text/plain is text/plain and NOT
> application/octet-stream :-)

File a bug against Bugzilla's automatic detection if this is a problem for you.
Comment 17 Robert Love 2006-03-24 15:59:07 UTC
*** Bug 160507 has been marked as a duplicate of this bug. ***
Comment 18 Karsten Keil 2006-03-24 16:25:00 UTC
Which kernel on which arch do you use (e.g. kernel-default on i386) ?
Comment 19 Karsten Keil 2006-03-24 19:10:03 UTC
OK, some other basic information about the hardware is missing, at least
lspci -vn for the network controller.

regarding comment #15: 
These patches are not in our kernel yet; they were for 2.6.17. But depending on the exact HW, "[TG3]: Fix SRAM access during tg3_init_one()"
may be a candidate to fix this issue.
Comment 20 Danny Al-Gaaf 2006-03-25 17:32:03 UTC
Created attachment 75017
output of lspci -vn
Comment 21 Karsten Keil 2006-03-27 08:40:02 UTC
Please also answer comment 18.
Comment 22 Kirk Penrose 2006-03-27 20:52:49 UTC
Also reporting this issue is beta customer Luke Watson (SR 10254046167).
Comment 23 Timo Hoenig 2006-03-29 11:23:59 UTC
Danny?

Anyway, I've got another system with tg3 suffering from this bug.

Comment #18: kernel-default-2.6.16-7

Comment #19:

06:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet (rev 03)

06:06.0 Class 0200: 14e4:169c (rev 03)
  Subsystem: 1025:0067
  Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 177
  Memory at b0100000 (32-bit, non-prefetchable) [size=64K]
  Capabilities: [48] Power Management version 2
  Capabilities: [50] Vital Product Data
  Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
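
The numeric line of the lspci -vn output above carries the IDs that matter for matching the driver: vendor 14e4 (Broadcom) and device 169c. A small sketch of pulling those fields out of such a line; the function name parse_lspci_vn_line and the regular expression are assumptions made for illustration, not part of pciutils:

```python
import re

# Matches the numeric summary line printed by "lspci -vn", e.g.
#   06:06.0 Class 0200: 14e4:169c (rev 03)
# Newer pciutils versions omit the word "Class", so it is optional here.
_LSPCI_VN = re.compile(
    r"^(?P<slot>\S+)\s+(?:Class\s+)?(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-fA-F]{2})\))?"
)


def parse_lspci_vn_line(line):
    """Return a dict with slot, cls, vendor, device and rev, or None if no match."""
    match = _LSPCI_VN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_lspci_vn_line("06:06.0 Class 0200: 14e4:169c (rev 03)"))
```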
Comment 24 Timo Hoenig 2006-03-29 11:44:31 UTC
Created attachment 75529
dmesg of STR cycle with msglvl 0xffff
Comment 25 Timo Hoenig 2006-03-29 11:47:49 UTC
Adjusting summary.
Comment 26 Christoph Thiel 2006-04-20 20:19:22 UTC
There hasn't been any progress on this bug for a couple of weeks -> what's the status?
Comment 27 Kirk Penrose 2006-05-15 22:34:30 UTC
Beta Customer Dan Elder also having trouble.  Adding his comments. Please advise if this does not appear to be a duplicate of this bug. Dan can collect logs or any other info needed:

With rc1, when I resumed from a suspend on an x86_64 laptop using ndiswrapper for a Broadcom card, NetworkManager correctly identified which network to automatically join and listed other available networks in nm-applet, but would not automatically join the correct network (GOAWAY).  nm-applet showed that it was in stage 1 of joining (the 3 bar progress bar had no bars filled in) and said that it was waiting for the wireless key for the network.  There wasn't any dialog box present asking for the key, though, and it already had the key from previous connections.  After waiting several minutes I clicked on nm-applet and manually chose the network to join from the list of available networks, and it successfully joined up without any other intervention.
Comment 28 Forgotten User ZhJd0F0L3x 2006-05-16 05:40:45 UTC
(In reply to comment #27)
> Beta Customer Dan Elder also having trouble.  Adding his comments. Please
> advise if this does not appear to be a duplicate of this bug. Dan can collect
> logs or any other info needed:
> 
> With rc1 when I resumed from a suspend on an x86_64 laptop using ndiswrapper
> for a Broadcom card, NetworkManager correctly identified which network to

This is a different bug, since tg3 is for wired networks. Please file a bug for the NM/ndiswrapper/suspend problem, it might be a problem in the interaction of powersaved / module unloading / NM.
Comment 29 Forgotten User ZhJd0F0L3x 2006-05-24 06:20:55 UTC
this is also reported on the suspend-devel list and other mailing lists.
This will also hurt us in SLED10.

Will we do anything about it?
Comment 30 Forgotten User ZhJd0F0L3x 2006-05-24 06:23:06 UTC
argh! my bad, nobody can read it if it is SLED :-(
Comment 31 Marcel Hilzinger 2006-05-24 11:30:18 UTC
I also have a system affected by this bug: a Shuttle XPC SD11G5 with a Broadcom BCM5789 Gigabit Ethernet controller. Suspend-to-disk and standby work, but with suspend-to-ram there is no network connection after resume.

Write to me if you need any log files.
Comment 32 Greg Kroah-Hartman 2006-05-24 18:46:27 UTC
*** Bug 178280 has been marked as a duplicate of this bug. ***
Comment 33 Greg Kroah-Hartman 2006-05-24 18:52:03 UTC
Is there anyone on this bug that can reproduce the problem that has the ability
to test kernel patches out to see if we can fix the issue or not?
Comment 34 Forgotten User ZhJd0F0L3x 2006-05-25 09:31:06 UTC
at least Danny has a D600 that shows this problem. He will probably not be back in the office before Monday.
Comment 36 Marcel Hilzinger 2006-05-26 10:16:15 UTC
To #33: Feel free to contact me. My testmachine stays for another week.
Comment 37 Greg Kroah-Hartman 2006-06-01 03:59:47 UTC
suspend to ram isn't "critical"
Comment 39 Karsten Keil 2006-06-01 10:22:55 UTC
Will have a look at this.
Comment 40 Karsten Keil 2006-06-01 21:30:24 UTC
I have built 2 test kernels; please try them. They are available (after sync) from: ftp://ftp.suse.com/pub/people/kkeil/testing/code10/[i586|x86_64]

kernel-[default|smp]-2.6.16.18-3.<arch>.rpm          with a resume patch
kernel-[default|smp]-2.6.16.18-tg3_3.58.<arch>.rpm   version 3.58 from 2.6.17

Comment 41 Dennis Sieben 2006-06-07 09:33:48 UTC
As I have the same problem here on an Acer TravelMate C300, I tested the two kernels from above, and they don't change anything, at least not for my problem. After the suspend you aren't able to activate the network card. Even rmmod/modprobe of the tg3 driver doesn't help. This problem occurs here with the final version of SL 10.1, not Beta 8 as stated in the header. This was working fine on my machine with SL 10.0, and it only affects STR, not STD, as said by Danny in the description.
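
One way to confirm the symptom independently of NetworkManager is to read the link state straight from sysfs after resume. This is only a sketch, assuming the running kernel exposes /sys/class/net/<iface>/carrier; the helper name read_carrier and the default interface name are illustrative:

```python
import os


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the link is up, False if it is down, None if unknown.

    Assumes the kernel exposes <sysfs_root>/<iface>/carrier. Reading that
    file can fail (for example while the interface is administratively
    down), in which case None is returned.
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as handle:
            value = handle.read().strip()
    except OSError:
        return None
    if value == "1":
        return True
    if value == "0":
        return False
    return None


if __name__ == "__main__":
    # "eth0" is the interface name used elsewhere in this report.
    print("eth0 carrier:", read_carrier("eth0"))
```

If this reports False after resume while the cable is plugged in and the link partner is up, the driver has lost carrier detect, which matches what is described in this bug.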
Comment 42 Timo Hoenig 2006-06-09 18:24:09 UTC
*** Bug 183225 has been marked as a duplicate of this bug. ***
Comment 43 Patrick Smart 2006-06-11 17:23:35 UTC
In my case, I don't have a network connection after a suspend-to-disk. Should this be a new bug or is it in the scope of this one? My network card is on an nForce (1st generation).
Comment 44 Timo Hoenig 2006-06-11 18:06:08 UTC
Patrick, please open a new bug for that issue.  Please add me to CC of the new bug as I would have some questions for investigating your problem.
Comment 45 Patrick Smart 2006-06-13 21:03:41 UTC
Timo, I opened bug 184660.
Comment 46 Robert Love 2006-08-07 13:48:07 UTC
*** Bug 195102 has been marked as a duplicate of this bug. ***
Comment 47 Holger Macht 2006-11-24 14:46:36 UTC
Seems to be fixed with 10.2 RC1 on my system here. Maybe Danny can verify on his DELL...
Comment 49 Holger Macht 2006-12-06 13:58:58 UTC
*** Bug 222820 has been marked as a duplicate of this bug. ***
Comment 50 Karsten Keil 2007-02-22 18:32:04 UTC
Can you please retest with a current SP1 beta kernel? It contains the latest driver from Broadcom, which has some changes in this area.
Comment 51 Forgotten User CxVz4LpaB5 2007-03-06 19:49:19 UTC
I don't have access to the laptop where the Broadcom NIC was installed anymore, so I can't really help you with the test on this bug, sorry.
Comment 52 Danny Al-Gaaf 2007-03-12 12:28:41 UTC
Tested with the current SP1 kernel from next-sle10-sp-i386, and it works now for SP1.
Comment 53 Karsten Keil 2007-03-12 12:48:18 UTC
So it's fixed for future releases, according to the last comment.