|
Bugzilla – Full Text Bug Listing |
| Summary: | suspend to ram failures with docking | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.1 | Reporter: | Andreas Jaeger <aj> |
| Component: | Mobile Devices | Assignee: | Rafael Wysocki <rjw> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P3 - Medium | CC: | aj, grmela, vlewin |
| Version: | Factory | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Development | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | |||
| Bug Blocks: | 357354 | ||
| Attachments: | dmesg output after undocking, dmesg from 2.6.36-rc7 on openSUSE 11.3 | ||
||
|
Description
Andreas Jaeger
2008-10-13 12:14:21 UTC
Will try to reproduce on a X60.

Ok, reproducible here on a ThinkPad X60. This does not seem to fit into the docking area, it rather sounds like a general suspend problem.

Pavel, I remember you also have such a machine...so it would be a lot easier to debug this for you.

I have two variants of this scenario here on my X60s:

a)
- boot in docking station.
- undock (press button, wait, pull lever),
- suspend,
- resume hangs with black screen.

b)
- boot in docking station.
- press undock button, wait
- suspend
- pull undock lever
- resume hangs with black screen.

In both cases the resume works well, if I do 'insert into docking station' just before the resume. While resume hangs, reinsertion in docking station triggers immediate reboot in both cases. See also bug#450175

This bug is also reproducible on my X61, exactly the same behaviour as in comment #3. It may be related to the docking station since when I suspend the machine inside the dock and resume it without undocking, it resumes normally. My ThinkPad also suffers from the "immediate restart problem". Kernel: Linux 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64, the whole system is up-to-date.

Pavel, any update on this bug?

Same issues seen with SLED-11-RC1, too. In general: One can either use a docking station or one can use suspend and resume. Mixing both crashes.

I do have x60 and docking station near me, but currently it does not suspend/resume at all (2.6.29-rc1 broke it). I was hoping to look at it "tomorrow" for 14 days now. Juergen: can you try to do suspend/resume in minimal system? (init=/bin/bash).

I consider this as a crash - therefore Critical. Pavel, please use the openSUSE 11.1 or the SLED 11 RC1 kernel for reproduction and testing.

I already tested 2.6.28, and the fix does not seem to be there. I'd like to test 2.6.29 to see if it perhaps got fixed. #8: I don't think that it is more critical than usual "suspend does not work with hardware XY" bugs, and those are traditionally normal.
I guess it has higher priority because thinkpads are quite common... fortunately their docking stations are not _that_ common.

Did the suspend with docking stations work correctly on this hardware, ever?

#3: Juergen: do you have any problems if you boot _outside_ the dock? You can then insert it/remove it/etc...

Hmm, interesting:

- boot in docking station, suspend/resume, undock, suspend/resume: works.
- boot in docking station, undock, suspend/resume: breaks.

Ad #10: The severity is Critical by definition of what critical is, you argue about priority!

It worked with my X40, not sure whether it worked with the x60

Well, a bit about both. If I accept that non-waking computer is critical, 90% of suspend problems will be of critical or higher severity... which is not useful. So non-waking machine == normal/major ; machine that damages data during suspend == critical/blocker. From my testing, it seems to work ok as long as I boot outside docking station. Can you confirm that?

I see two different problems:

1) machine fails to sleep.. if you do echo mem > /sys/power/state manually (and use high console log level) you can see ide driver unhappy with hda not being there.... which is logical.
2) machine sleeps but nothing can wake it up. Do you see this one, too?

Ad #14: I cannot confirm your findings in #14, see the initial report. Ad #15: I don't see either of it ;(

Can anyone please post the output of lspci with and without the dock from an affected machine?
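For anyone collecting the requested listings, a minimal sketch of capturing and comparing them follows. The function names and file names are assumptions made for illustration; only `lspci` and `diff` themselves are standard tools.

```shell
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.

# Save the current PCI device list to the given file.
# Run once while docked and once while undocked.
capture_lspci() {
    # $1: output file, e.g. lspci-docked.txt
    lspci > "$1"
}

# Show devices that appear in only one of the two listings.
# Lines starting with "<" exist only in the first file,
# lines starting with ">" only in the second.
compare_listings() {
    # $1: listing taken while docked, $2: listing taken while undocked
    if diff "$1" "$2"; then
        echo "no difference between $1 and $2"
    fi
}
```

Possible usage: `capture_lspci lspci-docked.txt` in the dock, `capture_lspci lspci-undocked.txt` after undocking, then `compare_listings lspci-docked.txt lspci-undocked.txt`.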
Docked:

    00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
    00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
    00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
    00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)
    00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
    00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
    00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
    00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
    00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
    00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
    00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
    00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
    00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
    00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
    00:1f.0 ISA bridge: Intel Corporation 82801HBM (ICH8M-E) LPC Interface Controller (rev 03)
    00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
    00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
    00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
    03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
    05:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ba)
    05:00.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 04)
    05:00.2 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 21)

lspci of this machine after undocking is the same - will reboot now undocked to see whether that makes a difference

same lspci output after booting without the dock.

What devices are on the dock?

Just for the record: This worked before. From time to time it works, then it breaks again.

Holger: When did it work? Rafael: dock for x60 contains USB hub, cdrom (PATA-AFAICT) and some connectors. The cdrom is actually removable. Can you try pulling it, then retrying the tests? That will tell us if the ide part is responsible in 2.6.27 case, too.

I did some quick tests with vanilla 2.6.27.7 I had easily available (I have slow link here, git update will take a while), and I could do #1 without problems. So it either depends on some suse-only patch, or it is config dependent. More tomorrow.

(In reply to comment #25)
> Holger: When did it work?

Once in a while :-) Really, I don't have an exact date, I'm testing this for about 2 years now, and sometimes it works, sometimes not.

Andreas, please post your dmesg directly after undocking. Does /dev/sr0 still exist?

...actually, other possibility is some userspace difference. Can you try that from init 1?

Created attachment 266119 [details]
dmesg output after undocking
Ad #27: /dev/sr* does not exist anymore after undocking.
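The next comment asks for a manual undock of the ata_bay via sysfs. A minimal sketch of that procedure is shown here; it relies only on the `/sys/devices/platform/dock.?/type` and `undock` attributes named in that comment. The function names and the optional base-directory argument are assumptions added for illustration, and the write needs root privileges.

```shell
#!/bin/sh
# Sketch only: function names and the base-directory argument are
# hypothetical; the sysfs attributes are the ones named in the thread.

# Print the first dock device whose "type" attribute mentions ata_bay.
find_ata_bay_dock() {
    base="${1:-/sys/devices/platform}"
    for dev in "$base"/dock.*; do
        # Skip an unexpanded glob and devices without a readable type.
        [ -r "$dev/type" ] || continue
        if grep -q 'ata_bay' "$dev/type"; then
            printf '%s\n' "$dev"
            return 0
        fi
    done
    return 1
}

# Ask the kernel to undock that device.
undock_ata_bay() {
    dev=$(find_ata_bay_dock "$1") || {
        echo "no dock device of type ata_bay found" >&2
        return 1
    }
    # Equivalent to: echo 1 > /sys/devices/platform/dock.x/undock
    echo 1 > "$dev/undock"
}
```

Running `undock_ata_bay` with no argument uses the real sysfs tree; checking the kernel log afterwards for a line like 'ataX.00: disabled' would confirm that the bay was released.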
Hm, I fail to see where /dev/sr0 gets deregistered. Could you try to manually undock the ata_bay before undocking the docking station with:

    cat /sys/devices/platform/dock.?/type

finding the dock device containing 'ata_bay' and then do

    echo 1 > /sys/devices/platform/dock.x/undock

You should see something like 'ataX.00: disabled' in the logs. Then undock the dock station and check if the problem persists. Thanks.

Ad #25: A first test shows that removing the CDrom helped. Jürgen, can you confirm this? I'll test some more...

Ad #30: The problem persists for me as far as I can see. If you like to borrow my laptop for testing, I can give it out in Nuernberg for a couple of hours, just ask.

One last thing it would be good to confirm. If you do the same sequence as in the initial report, but with the cdrom removed from the dock station before you even boot up, still happening?

#27: (hmm, cdrom was detected as hda here...)

Pavel, did you do a clean installation? Otherwise you need to make sure in initrd that ata_piix gets loaded before piix. Otherwise this cannot work, you need ata_piix/libata for hotplug support.

Ad #33: That's what I tried and succeeded. I'll continue to use the docking station now without the CDrom and once I get any problems will report here again...

#35: yes, playing with the config gets me sr0 (and ata_piix, AFAICT). Tejun, cdrom in docking station seems to be causing problems on suspend/resume. Can you help?

Ok, I think all this is of the same root cause: https://bugzilla.novell.com/show_bug.cgi?id=441872 --> closing as dup. http://bugzilla.kernel.org/show_bug.cgi?id=11703 For the latter bug in the kernel.org bugzilla, the reporter already provided a lot of debugging information. So I think we should continue there. Pavel, Tejun I would definitely need some help there cause I don't know how to proceed/what to request next. Would be good if you hook into it.

*** Bug 441872 has been marked as a duplicate of this bug. ***

Sorry about the delay.
Will follow up on kernel bugzilla 11703.

Link for easy clicking: http://bugzilla.kernel.org/show_bug.cgi?id=11703 . Tejun, this seems to be triaged to block-level problem. Can you take care?

Pavel, where is it triaged to block level problem? Kernel bz#11703 still doesn't point to anything. Am I missing something?

See comment #14. (sorry it's so buried.) If I boot inside the dock, then undock, hda (cdrom) is still "present" to linux, but it is physically disconnected. When I now try to suspend, I get infinite loop in the ide driver; which is easy to see on dmesg. (no_console_suspend + high console loglevel are needed to see it). Comment #31: seems to confirm that CDrom is problematic for aj, too... so it seems as this and vanilla problems I'm seeing have same underlying cause. And yes, Holger noted in #35, I'm probably using the wrong modules; OTOH that should still not loop during suspend.

Comment #14 is...

> Well, a bit about both.
>
> If I accept that non-waking computer is critical, 90% of suspend problems
> will be of critical or higher severity... which is not useful.
>
> So non-waking machine == normal/major ; machine that damages data during
> suspend == critical/blocker.
>
> From my testing, it seems to work ok as long as I boot outside docking station.
> Can you confirm that?

Anyways, support for Hotplug/suspend/resume in the original IDE driver is quite broken, so it failing in such scenario is expected. The problem is that it doesn't work with libata and we seemingly still have no idea what is going on. The kernel super-strangely just checks out after a schedule() from libata. :-( Thanks.

Well, nothing's happened here for a long time. Can you tell me please what the status of this bug is?

According to the reporter of http://bugzilla.kernel.org/show_bug.cgi?id=11703, the problem had been fixed upstream before 2.6.31, so all of our recent kernels should be fine in this respect. Unless there is a regression.

Still fails for me with 2.6.36-rc7.
After undocking I get in dmesg:

    Oct 25 16:45:02 x61s-aj kernel: [ 177.850194] Suspending console(s) (use no_console_suspend to debug)
    Oct 25 16:45:02 x61s-aj kernel: [ 177.850732] sd 0:0:0:0: [sda] Synchronizing SCSI cache
    Oct 25 16:45:02 x61s-aj kernel: [ 177.878080] sd 0:0:0:0: [sda] Stopping disk
    Oct 25 16:45:02 x61s-aj kernel: [ 177.896264] serial 00:0a: disable failed
    Oct 25 16:45:02 x61s-aj kernel: [ 177.896273] legacy_suspend(): pnp_bus_suspend+0x0/0xa0 returns -5
    Oct 25 16:45:02 x61s-aj kernel: [ 177.896276] PM: Device 00:0a failed to suspend: error -5
    Oct 25 16:45:02 x61s-aj kernel: [ 180.328426] PM: Some devices failed to suspend
    Oct 25 16:45:02 x61s-aj kernel: [ 180.329218] sd 0:0:0:0: [sda] Starting disk

Created attachment 396836 [details]
dmesg from 2.6.36-rc7 on openSUSE 11.3
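To pick out the failing device from a longer log like the one above, a small filter can help. This is a sketch that assumes the message format seen in this excerpt ("PM: Device ... failed to suspend: error N"); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical and the pattern is
# based on the message format seen in this thread's dmesg excerpt.

# Read kernel log text on stdin and print "device error" pairs for every
# "PM: Device <dev> failed to suspend: error <n>" line.
list_suspend_failures() {
    sed -n 's/.*PM: Device \(.*\) failed to suspend: error \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p'
}
```

Possible usage: `dmesg | list_suspend_failures`. The summary line "PM: Some devices failed to suspend" is deliberately not matched, since it names no device.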
Hmm. The problem seems to be different now. Apparently, a serial device (ttyS0 if I'm not mistaken) refuses to suspend after undocking. I guess the serial port is only present in the docking station, is that correct?

I assume it's only in the docking station. At least the docking station has a connector but the laptop does not.

I think this issue is related to https://bugzilla.kernel.org/show_bug.cgi?id=15100 for which I have a workaround patch https://patchwork.kernel.org/patch/469581/ Unfortunately, a major redesign seems to be necessary to fix the root cause of this problem (which is that the PNP ACPI driver is unable to handle dock/undock situations at all currently).

The patch in comment #52 has been merged into the mainline kernel, so in theory the failure may be fixed in the master kernel from: http://ftp.suse.com/pub/projects/kernel/kotd/master/ (or it will be fixed in there shortly). Can you check that, please?

I'll check once I'm back in my office, after the 15th of February.

Thanks!

FYI, it's not fixed in the openSUSE 2.6.37 kernel - I'll test the master (2.6.38) next. If I suspend with 2.6.37 to RAM, it hangs in suspending - if I detach the laptop and thus remove the cdrom.

Master kernel works fine with undocking, suspend to ram, resume, docking. Could you submit the patch to the 11.4 branch as well, please?

Submitted. |