Bug 434742

Summary: suspend to ram failures with docking
Product: [openSUSE] openSUSE 11.1 Reporter: Andreas Jaeger <aj>
Component: Mobile Devices Assignee: Rafael Wysocki <rjw>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P3 - Medium CC: aj, grmela, vlewin
Version: Factory   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Development Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 357354    
Attachments: dmesg output after undocking
dmesg from 2.6.36-rc7 on openSUSE 11.3

Description Andreas Jaeger 2008-10-13 12:14:21 UTC
With both 11.1 Beta2 kernel and 11.0 kernels (i586-pae) on my Lenovo Thinkpad X61s I often (not sure whether always) notice the following:

* start machine without docking station
* suspend/resume (always s2ram) several times
* dock the machine
* undock
* suspend again
* resume does not work, I have a black screen
Comment 1 Holger Macht 2008-10-15 10:52:07 UTC
Will try to reproduce on a X60.
Comment 2 Holger Macht 2008-10-27 11:02:26 UTC
Ok, reproducible here on a ThinkPad X60. This does not seem to fit into the docking area; it rather sounds like a general suspend problem.

Pavel, I remember you also have such a machine... so it would be a lot easier for you to debug this.
Comment 3 Juergen Weigert 2008-12-01 20:09:10 UTC
I have two variants of this scenario here on my X60s

a)
- boot in docking station. 
- undock (press button, wait, pull lever),
- suspend,
- resume hangs with black screen.

b) 
- boot in docking station.
- press undock button, wait
- suspend
- pull undock lever
- resume hangs with black screen.


In both cases the resume works well, if I do 'insert into docking station' just before the resume.
 
While resume hangs, reinsertion in docking station triggers immediate 
reboot in both cases.

See also bug#450175
Comment 4 Jan Grmela 2009-01-04 13:27:27 UTC
This bug is also reproducible on my X61, exactly the same behaviour as in comment #3. It may be related to the docking station since when I suspend the machine inside the dock and resume it without undocking, it resumes normally. My ThinkPad also suffers from the "immediate restart problem".

Kernel: Linux 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64, the whole system is up-to-date.
Comment 5 Andreas Jaeger 2009-01-14 10:57:48 UTC
Pavel, any update on this bug?
Comment 6 Juergen Weigert 2009-01-14 13:22:36 UTC
Same issues seen with SLED-11-RC1, too.
In general:
One can either use a docking station or one can use suspend and resume.
Mixing both crashes.
Comment 7 Pavel Machek 2009-01-16 15:09:42 UTC
I do have x60 and docking station near me, but currently it does not suspend/resume at all (2.6.29-rc1 broke it). I was hoping to look at it "tomorrow" for 14 days now.

Juergen: can you try to do suspend/resume in minimal system? (init=/bin/bash).
Comment 8 Andreas Jaeger 2009-01-16 16:07:09 UTC
I consider this as a crash - therefore Critical.

Pavel, please use the openSUSE 11.1 or the SLED 11 RC1 kernel for reproduction and testing.
Comment 10 Pavel Machek 2009-01-19 10:50:58 UTC
I already tested 2.6.28, and the fix does not seem to be there. I'd like to test 2.6.29 to see if it perhaps got fixed.

#8: I don't think that it is more critical than usual "suspend does not work with hardware XY" bugs, and those are traditionally normal. I guess it has higher priority because thinkpads are quite common... fortunately their docking stations are not _that_ common.

Did the suspend with docking stations work correctly on this hardware, ever?
Comment 11 Pavel Machek 2009-01-19 11:06:03 UTC
#3: Juergen: do you have any problems if you boot _outside_ the dock? You can then insert it/remove it/etc...
Comment 12 Pavel Machek 2009-01-19 11:15:33 UTC
Hmm, interesting:

boot in docking station
suspend/resume
undock
suspend/resume

works.

boot in docking station
undock
suspend/resume

breaks.
Comment 13 Andreas Jaeger 2009-01-19 11:38:32 UTC
Ad #10: The severity is Critical by definition of what critical is, you argue about priority!

It worked with my X40, not sure whether it worked with the x60
Comment 14 Pavel Machek 2009-01-19 11:53:34 UTC
Well, a bit about both.

If I accept that non-waking computer is critical, 90% of suspend problems will be of critical or higher severity... which is not useful.

So non-waking machine == normal/major ; machine that damages data during suspend == critical/blocker.

From my testing, it seems to work ok as long as I boot outside docking station. Can you confirm that?
Comment 15 Pavel Machek 2009-01-19 12:16:27 UTC
I see two different problems:

1) machine fails to sleep.. if you do echo mem > /sys/power/state manually (and use high console log level) you can see ide driver unhappy with hda not being there.... which is logical.

2) machine sleeps but nothing can wake it up. Do you see this one, too?
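The manual test in 1) could be scripted roughly as follows. This is only a sketch under assumptions: the function names and the DRY_RUN switch are made up for illustration, and with DRY_RUN left at its default of 1 the commands are only printed, not executed. The two paths are the standard kernel interfaces; writing to them needs root.

```shell
#!/bin/sh
# Sketch only: run_or_print, suspend_verbose and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'would run: %s\n' "$1"
    else
        sh -c "$1"
    fi
}

suspend_verbose() {
    # A first printk field of 8 lets all kernel messages reach the console.
    run_or_print 'echo 8 > /proc/sys/kernel/printk' &&
    # Writing "mem" to /sys/power/state requests suspend to RAM.
    run_or_print 'echo mem > /sys/power/state'
}

# Example: show what would be done without touching the system.
#   DRY_RUN=1; suspend_verbose
```

Calling suspend_verbose as root with DRY_RUN=0 performs the actual suspend.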
Comment 16 Andreas Jaeger 2009-01-19 14:08:46 UTC
Ad #14: I cannot confirm your findings in #14, see the initial report.
Ad #15: I don't see either of them ;(
Comment 19 Rafael Wysocki 2009-01-19 15:11:57 UTC
Can anyone please post the output of lspci with and without the dock from an affected machine?
Comment 20 Andreas Jaeger 2009-01-19 15:44:00 UTC
Docked:

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)                                                                                          
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)                                                                         
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)                                                                                
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)  
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)                                                                                          
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)                                                                                          
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HBM (ICH8M-E) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
05:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ba)
05:00.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 04)
05:00.2 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 21)
Comment 21 Andreas Jaeger 2009-01-19 15:45:33 UTC
lspci of this machine after undocking is the same - will reboot now undocked to see whether that makes a difference
Comment 22 Andreas Jaeger 2009-01-19 15:48:10 UTC
Same lspci output after booting without the dock.
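For anyone repeating the comparison requested in comment #19, saving each listing to a file makes the check mechanical. A minimal sketch, assuming the listings were saved by hand; the file names and the function name same_listing are hypothetical.

```shell
#!/bin/sh
# Capture one listing per dock state first, for example:
#   lspci > lspci-docked.txt      (while docked)
#   lspci > lspci-undocked.txt    (after undocking)

# same_listing is a hypothetical helper: it reports whether two saved
# listings are byte-for-byte identical.
same_listing() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Example:
#   same_listing lspci-docked.txt lspci-undocked.txt
```

If the result is "different", running diff on the two files shows which devices appeared or vanished.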
Comment 23 Rafael Wysocki 2009-01-19 18:56:43 UTC
What devices are on the dock?
Comment 24 Holger Macht 2009-01-19 19:00:04 UTC
Just for the record: This worked before. From time to time it works, then it breaks again.
Comment 25 Pavel Machek 2009-01-19 20:45:07 UTC
Holger: When did it work? 

Rafael: dock for x60 contains USB hub, cdrom (PATA-AFAICT) and some connectors.

The cdrom is actually removable. Can you try pulling it, then retrying the tests? That will tell us if the ide part is responsible in 2.6.27 case, too.
Comment 26 Pavel Machek 2009-01-19 22:10:40 UTC
I did some quick tests with vanilla 2.6.27.7 I had easily available (I have slow link here, git update will take a while), and I could do #1 without problems. So it either depends on some suse-only patch, or it is config dependent.

More tomorrow.
Comment 27 Holger Macht 2009-01-19 22:23:54 UTC
(In reply to comment #25)
> Holger: When did it work? 
Once in a while :-) Really, I don't have an exact date. I've been testing this for about 2 years now, and sometimes it works, sometimes not.

Andreas, please post your dmesg directly after undocking. Does /dev/sr0 still exist?
Comment 28 Pavel Machek 2009-01-19 22:46:44 UTC
...actually, other possibility is some userspace difference. Can you try that from init 1?
Comment 29 Andreas Jaeger 2009-01-20 08:48:52 UTC
Created attachment 266119 [details]
dmesg output after undocking

Ad #27: /dev/sr* does not exist anymore after undocking.
Comment 30 Holger Macht 2009-01-20 09:00:00 UTC
Hm, I fail to see where /dev/sr0 gets deregistered. Could you try to manually undock the ata_bay before undocking the docking station with:

cat /sys/devices/platform/dock.?/type

finding the dock device containing 'ata_bay' and then do

echo 1 > /sys/devices/platform/dock.x/undock

You should see something like 'ataX.00: disabled' in the logs.

Then undock the dock station and check if the problem persists. Thanks.
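The lookup described above can be sketched as a small helper. This is an illustration under assumptions, not a tested procedure: the function name find_ata_bay and its optional root-directory argument are hypothetical (the argument exists only so the search can be tried against a scratch directory); the sysfs paths are the ones quoted above.

```shell
#!/bin/sh
# find_ata_bay is a hypothetical helper: it prints the directory of every
# dock device whose "type" attribute mentions ata_bay.
find_ata_bay() {
    root=${1:-/sys/devices/platform}
    for type_file in "$root"/dock.*/type; do
        # Skip the unexpanded pattern when no dock device exists.
        [ -r "$type_file" ] || continue
        if grep -q 'ata_bay' "$type_file"; then
            dirname "$type_file"
        fi
    done
}

# Usage (as root), before releasing the docking station:
#   bay=$(find_ata_bay)
#   [ -n "$bay" ] && echo 1 > "$bay/undock"
```

The usage lines assume exactly one match; if several dock devices report ata_bay, each path is printed on its own line and has to be handled separately.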
Comment 31 Andreas Jaeger 2009-01-20 09:02:17 UTC
Ad #25: A first test shows that removing the CDrom helped.

Jürgen, can you confirm this?

I'll test some more...
Comment 32 Andreas Jaeger 2009-01-20 09:07:49 UTC
Ad #30: The problem persists for me as far as I can see.

If you like to borrow my laptop for testing, I can give it out in Nuernberg for a couple of hours, just ask.
Comment 33 Holger Macht 2009-01-20 09:27:43 UTC
One last thing that would be good to confirm: if you do the same sequence as in the initial report, but with the cdrom removed from the dock station before you even boot up, does it still happen?
Comment 34 Pavel Machek 2009-01-20 10:56:02 UTC
#27: (hmm, cdrom was detected as hda here...)
Comment 35 Holger Macht 2009-01-20 10:59:35 UTC
Pavel, did you do a clean installation? Otherwise you need to make sure in initrd that ata_piix gets loaded before piix. Otherwise this cannot work, you need ata_piix/libata for hotplug support.
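A quick way to see which of the two drivers are loaded is to look at /proc/modules. A minimal sketch: the function name loaded_piix_drivers and its optional file argument are hypothetical, and the check only covers drivers built as modules. It does not show load order or which driver actually claimed the controller.

```shell
#!/bin/sh
# loaded_piix_drivers is a hypothetical helper: it lists which of ata_piix
# and piix appear in the module list (default: /proc/modules).
loaded_piix_drivers() {
    modules_file=${1:-/proc/modules}
    # The first field of each line in /proc/modules is the module name.
    awk '$1 == "ata_piix" || $1 == "piix" { print $1 }' "$modules_file" | sort
}

# Example:
#   loaded_piix_drivers
```

If only piix shows up, the cdrom is presumably handled by the old IDE driver (as hda) rather than by libata.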
Comment 36 Andreas Jaeger 2009-01-20 12:55:50 UTC
Ad #33: That's what I tried and succeeded.

I'll continue to use the docking station now without the CDrom and once I get any problems will report here again...
Comment 37 Pavel Machek 2009-01-20 13:24:30 UTC
#35: yes, playing with the config gets me sr0 (and ata_piix, AFAICT).

Tejun, cdrom in docking station seems to be causing problems on suspend/resume. Can you help?
Comment 38 Holger Macht 2009-01-20 18:11:19 UTC
Ok, I think all this is of the same root cause:

https://bugzilla.novell.com/show_bug.cgi?id=441872 --> closing as dup.
http://bugzilla.kernel.org/show_bug.cgi?id=11703

For the latter bug in the kernel.org bugzilla, the reporter already provided a lot of debugging information. So I think we should continue there.

Pavel, Tejun, I would definitely need some help there because I don't know how to proceed or what to request next. It would be good if you could look into it.
Comment 39 Holger Macht 2009-01-20 18:12:04 UTC
*** Bug 441872 has been marked as a duplicate of this bug. ***
Comment 40 Tejun Heo 2009-01-21 12:59:10 UTC
Sorry about the delay.  Will follow up on kernel bugzilla 11703.
Comment 41 Pavel Machek 2009-01-26 15:36:51 UTC
Link for easy clicking: http://bugzilla.kernel.org/show_bug.cgi?id=11703 .
Comment 42 Pavel Machek 2009-02-09 09:33:58 UTC
Tejun, this seems to be triaged to a block-level problem. Can you take care of it?
Comment 43 Tejun Heo 2009-02-10 05:09:24 UTC
Pavel, where is it triaged to block level problem?  Kernel bz#11703 still doesn't point to anything.  Am I missing something?
Comment 44 Pavel Machek 2009-02-10 11:50:49 UTC
See comment #14. (sorry it's so buried.) If I boot inside the dock, then undock, hda (cdrom) is still "present" to linux -- but it is physically disconnected. When I now try to suspend, I get an infinite loop in the ide driver, which is easy to see in dmesg. (no_console_suspend + high console loglevel are needed to see it).

Comment #31 seems to confirm that the CDrom is problematic for aj, too... so it seems that this and the vanilla problems I'm seeing have the same underlying cause.

And yes, as Holger noted in #35, I'm probably using the wrong modules; OTOH that should still not loop during suspend.
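Whether the running kernel was actually booted with no_console_suspend can be checked from /proc/cmdline. A minimal sketch; the function name has_no_console_suspend and its optional file argument are hypothetical.

```shell
#!/bin/sh
# has_no_console_suspend is a hypothetical helper: it succeeds when the
# kernel command line (default: /proc/cmdline) contains the parameter.
has_no_console_suspend() {
    cmdline_file=${1:-/proc/cmdline}
    # Put one parameter on each line, then match the whole word exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'no_console_suspend'
}

# Example:
#   has_no_console_suspend || echo "reboot with no_console_suspend first"
```

Without the parameter the console is suspended early, so the messages from the looping driver never reach the screen.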
Comment 45 Tejun Heo 2009-02-10 14:44:59 UTC
Comment #14 is...

> Well, a bit about both.
>
> If I accept that non-waking computer is critical, 90% of suspend problems
> will be of critical or higher severity... which is not useful.
> 
> So non-waking machine == normal/major ; machine that damages data during
> suspend == critical/blocker.
> 
> From my testing, it seems to work ok as long as I boot outside docking station.
> Can you confirm that?

Anyways, support for hotplug/suspend/resume in the original IDE driver is quite broken, so its failing in such a scenario is expected.  The problem is that it doesn't work with libata and we seemingly still have no idea what is going on.  The kernel super-strangely just checks out after a schedule() from libata.  :-(

Thanks.
Comment 47 Rafael Wysocki 2010-10-13 22:09:46 UTC
Well, nothing's happened here for a long time.

Can you tell me please what the status of this bug is?

According to the reporter of http://bugzilla.kernel.org/show_bug.cgi?id=11703,
the problem had been fixed upstream before 2.6.31, so all of our recent kernels
should be fine in this respect.  Unless there is a regression.
Comment 48 Andreas Jaeger 2010-10-25 14:47:27 UTC
Still fails for me with 2.6.36-rc7. After undocking I get in dmesg:

Oct 25 16:45:02 x61s-aj kernel: [  177.850194] Suspending console(s) (use no_console_suspend to debug)
Oct 25 16:45:02 x61s-aj kernel: [  177.850732] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Oct 25 16:45:02 x61s-aj kernel: [  177.878080] sd 0:0:0:0: [sda] Stopping disk
Oct 25 16:45:02 x61s-aj kernel: [  177.896264] serial 00:0a: disable failed
Oct 25 16:45:02 x61s-aj kernel: [  177.896273] legacy_suspend(): pnp_bus_suspend+0x0/0xa0 returns -5
Oct 25 16:45:02 x61s-aj kernel: [  177.896276] PM: Device 00:0a failed to suspend: error -5
Oct 25 16:45:02 x61s-aj kernel: [  180.328426] PM: Some devices failed to suspend
Oct 25 16:45:02 x61s-aj kernel: [  180.329218] sd 0:0:0:0: [sda] Starting disk
Comment 49 Andreas Jaeger 2010-10-25 14:48:08 UTC
Created attachment 396836 [details]
dmesg from 2.6.36-rc7 on openSUSE 11.3
Comment 50 Rafael Wysocki 2010-10-25 22:41:14 UTC
Hmm.  The problem seems to be different now.  Apparently, a serial device
(ttyS0 if I'm not mistaken) refuses to suspend after undocking.

I guess the serial port is only present in the docking station, is that
correct?
Comment 51 Andreas Jaeger 2010-10-26 00:51:16 UTC
I assume it's only in the docking station. At least the docking station has a connector but the laptop does not.
Comment 52 Rafael Wysocki 2011-01-10 20:52:33 UTC
I think this issue is related to
https://bugzilla.kernel.org/show_bug.cgi?id=15100
for which I have a workaround patch
https://patchwork.kernel.org/patch/469581/

Unfortunately, a major redesign seems to be necessary to fix the root cause
of this problem (which is that the PNP ACPI driver is unable to handle
dock/undock situations at all currently).
Comment 53 Rafael Wysocki 2011-01-27 18:06:18 UTC
The patch in comment #52 has been merged into the mainline kernel, so in
theory the failure may be fixed in the master kernel from:

http://ftp.suse.com/pub/projects/kernel/kotd/master/

(or it will be fixed in there shortly).

Can you check that, please?
Comment 54 Andreas Jaeger 2011-02-02 20:09:35 UTC
I'll check once I'm returning to my office - after the 15th of February. Thanks!
Comment 55 Andreas Jaeger 2011-02-14 09:24:41 UTC
FYI, it's not fixed in the openSUSE 2.6.37 kernel - I'll test the master (2.6.38) next.

If I suspend to RAM with 2.6.37, it hangs while suspending if I have detached the laptop and thus removed the cdrom.
Comment 56 Andreas Jaeger 2011-02-14 13:57:03 UTC
Master kernel works fine with undocking, suspend to ram, resume, docking.

Could you submit the patch to the 11.4 branch as well, please?
Comment 57 Rafael Wysocki 2011-02-17 11:02:14 UTC
Submitted.