Bugzilla – Bug 934397
Resume from suspend to ram fails when HDD is connected
Last modified: 2018-07-03 20:55:06 UTC
During resume from suspend, the display content reappears correctly, but after a short pause (2-3 seconds) the kernel panics and the keyboard becomes unresponsive, with its lights flashing. A reset or power cycle is the only way forward.

After a long series of tests, I discovered that if I disconnect the extra drive, a WDC WD30EZRX-00M (3TB), the desktop resumes correctly every time! The additional drive is for storage only. It contains only a Samba share, and it was unmounted (but powered and connected) during the tests.

I persisted with the tests only because I recently noticed that both Ubuntu and Debian, which I installed (in their most recent versions) on a 3rd drive (also a WD Caviar Green), can resume from suspend without problems.

Only the default graphics driver is installed:

bigboy:~ # lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics Controller (rev 02)

Even though I can't seem to get the correct debuginfo package installed to use crash, I was able to read the dmesg saved by kdump (excerpt):

[ 165.133021] ata4: SATA link down (SStatus 0 SControl 300)
[ 165.133038] ata3: SATA link down (SStatus 0 SControl 300)
[ 165.135012] ata6: SATA link down (SStatus 0 SControl 300)
[ 166.052486] ata7.00: configured for UDMA/33
[ 169.469022] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 169.470705] ata1.00: configured for UDMA/133
[ 170.182017] ata5: link is slow to respond, please be patient (ready=0)
[ 170.183010] ata2: link is slow to respond, please be patient (ready=0)
[ 174.824010] ata2: COMRESET failed (errno=-16)
[ 174.874021] ata5: COMRESET failed (errno=-16)
[ 175.129019] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 175.281767] ata2.00: configured for UDMA/133

Since then I have tried changing the cable and the SATA port, with no change: if that drive is connected, resume fails.

I would be happy to provide more info if necessary. Thanks for your help.
Sounds like a dup of bug 913105. All of them involve WD hard disks. If it's the same bug, it was introduced somewhere in 3.13. To be sure, could you check whether a recent kernel still has the same problem? For example, try the 4.0.x kernel in the OBS Kernel:stable repo.
Also, try the SLE12 kernel, found in the OBS Kernel:SLE12 repo. It's based on 3.12.x, so if the bug is the same as bug 913105, this kernel may survive.
I tried with kernel 4.0.5-1.gf4cd21b-desktop and the problem is still there. I do have a crash dump if it helps (I can only provide the files, not any skills to look at them).

The dmesg from the crash says:

[ 73.957016] ata3: SATA link down (SStatus 0 SControl 300)
[ 73.961017] ata4: SATA link down (SStatus 0 SControl 300)
[ 73.961030] ata6: SATA link down (SStatus 0 SControl 300)
[ 74.114023] usb 5-1: reset low-speed USB device number 2 using uhci_hcd
[ 74.880484] ata7.00: configured for UDMA/33
[ 75.748492] e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 78.244018] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 78.245701] ata1.00: configured for UDMA/133
[ 79.002184] ata5: link is slow to respond, please be patient (ready=0)
[ 79.012012] ata2: link is slow to respond, please be patient (ready=0)
[ 83.031024] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 83.184936] ata5.00: configured for UDMA/133
[ 83.704007] ata2: COMRESET failed (errno=-16)
[ 85.539018] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 85.663005] sr 6:0:0:0: **** DPM device timeout ****
[ 85.663014] ffff880124913c68 ffff880126430450 ffff88012490a5d0 ffffffff8168289e
[ 85.663015] sd 0:0:0:0: **** DPM device timeout ****
[ 85.663018] ffff880124913fd8
[ 85.663019] ffff8801260fbc68
[ 85.663020] 0000000000000010
[ 85.663021] ffff88012603e250
[ 85.663022] ffffffff814a6030
[ 85.663023] ffff8800cabea190
[ 85.663024] 0000000000000000
[ 85.663024] 000000000000f630
[ 85.663025]
[ 85.663025]
[ 85.663027] ffffffff81a8e160
[ 85.663028] ffff8801260fbfd8
[ 85.663029] ffff880124913c88
[ 85.663030] 0000000000000010
[ 85.663031] ffffffff8167f1c7
[ 85.663031] ffffffff814a6030
[ 85.663032] 0000000000000286
[ 85.663033] 0000000000000000
[ 85.663033]
[ 85.663034]
[ 85.663035] Call Trace:
[ 85.663038] ffffffff81a8e160 ffff8801260fbc88 ffffffff8167f1c7 0000000000000286
[ 85.663039] Call Trace:
[ 85.663051] [<ffffffff8167f1c7>] schedule+0x37/0x90
[ 85.663057] [<ffffffff8167f1c7>] schedule+0x37/0x90
[ 85.663061] [<ffffffff81083f15>] async_synchronize_cookie_domain+0x55/0x130
[ 85.663066] [<ffffffff81083f15>] async_synchronize_cookie_domain+0x55/0x130
[ 85.663070] [<ffffffff814a5fc4>] scsi_bus_resume_common+0xa4/0xd0
[ 85.663074] [<ffffffff814a5fc4>] scsi_bus_resume_common+0xa4/0xd0
[ 85.663078] [<ffffffff8147a89a>] dpm_run_callback+0x4a/0x150
[ 85.663081] [<ffffffff8147a89a>] dpm_run_callback+0x4a/0x150
[ 85.663084] [<ffffffff8147ae7b>] device_resume+0x10b/0x240
[ 85.663086] [<ffffffff8147ae7b>] device_resume+0x10b/0x240
[ 85.663088] [<ffffffff8147afc9>] async_resume+0x19/0x40
[ 85.663090] [<ffffffff8147afc9>] async_resume+0x19/0x40
[ 85.663092] [<ffffffff81083d13>] async_run_entry_fn+0x43/0x150
[ 85.663094] [<ffffffff81083d13>] async_run_entry_fn+0x43/0x150
[ 85.663098] [<ffffffff8107bcf2>] process_one_work+0x142/0x420
[ 85.663102] [<ffffffff8107bcf2>] process_one_work+0x142/0x420
[ 85.663104] [<ffffffff8107c0e4>] worker_thread+0x114/0x460
[ 85.663106] [<ffffffff8107c0e4>] worker_thread+0x114/0x460
[ 85.663108] [<ffffffff81081261>] kthread+0xc1/0xe0
[ 85.663111] [<ffffffff81081261>] kthread+0xc1/0xe0
[ 85.663114] [<ffffffff816830d8>] ret_from_fork+0x58/0x90
[ 85.663117] [<ffffffff816830d8>] ret_from_fork+0x58/0x90
[ 85.663118] Kernel panic - not syncing: sr 6:0:0:0: unrecoverable failure
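For anyone triaging a similar crash: the lines that name the stuck devices are the ones marked `DPM device timeout`. The following is only a minimal sketch for pulling them out of a dmesg that has been saved to a plain-text file. The function name `dpm_timeouts` is made up for illustration and is not part of any tool mentioned in this report.

```shell
# List the devices that logged a "DPM device timeout" in a saved dmesg file.
# Usage: dpm_timeouts FILE
# (hypothetical helper; assumes one kernel message per line)
dpm_timeouts() {
    # Matches lines such as:
    #   [ 85.663005] sr 6:0:0:0: **** DPM device timeout ****
    # and prints only the device part ("sr 6:0:0:0").
    sed -n 's/^.*\] *\(.*\): \*\*\*\* DPM device timeout \*\*\*\*.*$/\1/p' "$1"
}
```

Run against the dmesg above, this would report both `sr 6:0:0:0` and `sd 0:0:0:0`, the two devices whose interleaved stack dumps precede the panic.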
I can confirm that kernel 3.12.43-15.g537dcf2-default from SLE12 does resume correctly when the drive is connected.
(In reply to Dario Savella from comment #3)
> I tried with kernel 4.0.5-1.gf4cd21b-desktop and the problem is still there.
> I do have a crash dump if it helps (I can only provide the files, not any
> skills to look at them).
>
> The dmesg from the crash says:
>
> [ 73.957016] ata3: SATA link down (SStatus 0 SControl 300)
> [ 73.961017] ata4: SATA link down (SStatus 0 SControl 300)
> [ 73.961030] ata6: SATA link down (SStatus 0 SControl 300)
> [ 74.114023] usb 5-1: reset low-speed USB device number 2 using uhci_hcd
> [ 74.880484] ata7.00: configured for UDMA/33
> [ 75.748492] e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: Rx/Tx
> [ 78.244018] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 78.245701] ata1.00: configured for UDMA/133
> [ 79.002184] ata5: link is slow to respond, please be patient (ready=0)
> [ 79.012012] ata2: link is slow to respond, please be patient (ready=0)
> [ 83.031024] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 83.184936] ata5.00: configured for UDMA/133
> [ 83.704007] ata2: COMRESET failed (errno=-16)
> [ 85.539018] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 85.663005] sr 6:0:0:0: **** DPM device timeout ****

So this looks like the cause. Recent kernels have a watchdog for the async resume workers, and if it expires, the kernel panics. This explains why 3.12 worked: the watchdog was introduced in 3.13.

Unfortunately, the timeout length is fixed in Kconfig, with a default of 12 seconds, and this seems too short. We should extend it to at least a minute, I suppose. It would also be better if it were dynamically configurable.

openSUSE 13.2 test kernel packages with the timeout extended to 60 seconds are being built in the OBS home:tiwai:bnc934397 repo. Could you give them a try? It will take some time until the build finishes.
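To see which timeout a given kernel was built with, one can look at its config file. The sketch below rests on two assumptions that do not come from this report: that the options are the upstream Kconfig symbols `CONFIG_DPM_WATCHDOG` and `CONFIG_DPM_WATCHDOG_TIMEOUT`, and that the distribution installs the config as `/boot/config-<release>`. The function name is hypothetical.

```shell
# Print the PM watchdog timeout (in seconds) found in a kernel config file.
# Usage: dpm_watchdog_timeout [CONFIG_FILE]
# Assumption: without an argument, the config of the running kernel is
# expected at /boot/config-$(uname -r).
dpm_watchdog_timeout() {
    config="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_DPM_WATCHDOG=y' "$config"; then
        # Print only the value after the "=" sign.
        sed -n 's/^CONFIG_DPM_WATCHDOG_TIMEOUT=//p' "$config"
    else
        echo "DPM watchdog not enabled"
    fi
}
```

Calling `dpm_watchdog_timeout` without arguments checks the running kernel. Passing a path checks another installed kernel, which makes it easy to compare the stock kernel with the test build.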
I will give it a try. While we wait for the build, could you help me to add the repository you mention ? I can add repositories, but now with the notation you are using.
(In reply to Dario Savella from comment #6) > I will give it a try. > While we wait for the build, could you help me to add the repository you > mention ? > I can add repositories, but now with the notation you are using. ...but NOT with...
(In reply to Dario Savella from comment #6)
> I will give it a try.
> While we wait for the build, could you help me to add the repository you
> mention ?
> I can add repositories, but now with the notation you are using.

osc ar obs://home:/tiwai:/bnc934397/standard test-kernel

The build seems already finished, but not published yet. Meanwhile you can get binaries directly via

osc getbinaries home:tiwai:bnc934397/kernel-desktop/standard/x86_64
(In reply to Takashi Iwai from comment #8)
> (In reply to Dario Savella from comment #6)
> > I will give it a try.
> > While we wait for the build, could you help me to add the repository you
> > mention ?
> > I can add repositories, but now with the notation you are using.
>
> osc ar obs://home:/tiwai:/bnc934397/standard test-kernel

Sorry, it's zypper, instead of osc, of course.

> The build seems already finished, but not published yet. Meanwhile you can
> get binaries directly via
> osc getbinaries home:tiwai:bnc934397/kernel-desktop/standard/x86_64

This one is with osc. By using osc directly, you can download the unpublished packages, too.
... and now the project is published; the packages can be downloaded from http://download.opensuse.org/repositories/home:/tiwai:/bnc934397/standard/
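On the notation question: the download URL above is the OBS project name with each ":" written as ":/". The sketch below shows that mapping under the assumption that the URL layout seen in this thread holds in general. The helper name `obs_repo_url` is invented for illustration.

```shell
# Turn an OBS project name into its download.opensuse.org repository URL.
# Usage: obs_repo_url PROJECT REPOSITORY
# (hypothetical helper; the URL layout is inferred from the link in this thread)
obs_repo_url() {
    # Each ":" in the project name becomes ":/" in the URL path.
    project_path=$(printf '%s' "$1" | sed 's|:|:/|g')
    printf 'http://download.opensuse.org/repositories/%s/%s/\n' "$project_path" "$2"
}
```

With this, the repository could be added as `zypper ar "$(obs_repo_url home:tiwai:bnc934397 standard)" test-kernel`, using the same alias as in the earlier comment.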
Good news, I suppose. The new kernel resumes from suspend with the drive connected and/or mounted.
Just to be sure I installed the right thing:

bigboy:~ # cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.16.7-1.gf382e20-desktop root=UUID=b914b0a0-964b-4fa7-91c9-a0f8f0b57fcf quiet resume=/dev/sda1 splash=silent quiet showopts crashkernel=256M-:128M clocksource=tsc vga=792

Is there anything else you need from my end?
What's the release cycle for these fixes? I'm in no hurry, but I'd like to know when to stop worrying about resume.
(In reply to Dario Savella from comment #11)
> Good news I suppose. The new kernel resumes from suspend with the drive
> connected and/or mounted.
> Just be sure I installed the right thing:
>
> bigboy:~ # cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-3.16.7-1.gf382e20-desktop
> root=UUID=b914b0a0-964b-4fa7-91c9-a0f8f0b57fcf quiet resume=/dev/sda1
> splash=silent quiet showopts crashkernel=256M-:128M clocksource=tsc vga=792
>
> Is there anything else you need from my end ?

Could you provide the kernel messages after resume with the new kernel?

> What's the release cycle for these fixes?
> I'm in no hurry, but I'd like to know when to stop worrying about resume.

The change should be safe, so I can take it soon. However, the official update release for openSUSE 13.2 may take some time, as usual.
Created attachment 637702: journal during successful resume
Upstream informed, please see bug report https://bugzilla.kernel.org/show_bug.cgi?id=91921
Pushed: 0e899eb6113c..b5e86cc44ede stable^ -> stable
The fix has been merged to 13.2, stable and master branches. Let's close.
Sure. Thanks a lot for your help.
openSUSE-SU-2015:1382-1: An update that solves 21 vulnerabilities and has 8 fixes is now available.

Category: security (important)
Bug References: 907092,907714,915517,916225,919007,919596,921769,922583,925567,925961,927786,928693,929624,930488,930599,931580,932348,932844,933934,934202,934397,934755,935530,935542,935705,935913,937226,938976,939394
CVE References: CVE-2014-9728,CVE-2014-9729,CVE-2014-9730,CVE-2014-9731,CVE-2015-1420,CVE-2015-1465,CVE-2015-2041,CVE-2015-2922,CVE-2015-3212,CVE-2015-3290,CVE-2015-3339,CVE-2015-3636,CVE-2015-4001,CVE-2015-4002,CVE-2015-4003,CVE-2015-4036,CVE-2015-4167,CVE-2015-4692,CVE-2015-4700,CVE-2015-5364,CVE-2015-5366
Sources used:
openSUSE 13.2 (src): bbswitch-0.8-3.11.1, cloop-2.639-14.11.1, crash-7.0.8-11.1, hdjmod-1.28-18.12.1, ipset-6.23-11.1, kernel-debug-3.16.7-24.1, kernel-default-3.16.7-24.1, kernel-desktop-3.16.7-24.1, kernel-docs-3.16.7-24.2, kernel-ec2-3.16.7-24.1, kernel-obs-build-3.16.7-24.2, kernel-obs-qa-3.16.7-24.1, kernel-obs-qa-xen-3.16.7-24.1, kernel-pae-3.16.7-24.1, kernel-source-3.16.7-24.1, kernel-syms-3.16.7-24.1, kernel-vanilla-3.16.7-24.1, kernel-xen-3.16.7-24.1, pcfclock-0.44-260.11.1, vhba-kmp-20140629-2.11.1, xen-4.4.2_06-25.1, xtables-addons-2.6-11.1