Bug 231088

Summary: Resume after suspend to disk fails
Product: [openSUSE] openSUSE 10.2
Reporter: Robin Knapp <robin.knapp>
Component: Other
Assignee: Tejun Heo <teheo>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: gp, hare
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: output of hwinfo
pm-suspend.log
Messages during suspend
Messages after suspend
debug patch
dmesg output with debug patch
debug patch #2
PM debug patch #3
dmesg with applied patch pm debug #3
libata-fix-port-EH-action-in-dev-action-mask

Description Robin Knapp 2006-12-29 15:52:14 UTC
After calling suspend to disk from kpowersave, resume does not work.

Powering off after suspend produces a strange loud "clack" sound.
(notebook with sata disks)

----------- from /var/log/boot.msg: ------------------------------------------
<5>Kernel command line: root=/dev/sda7 vga=0x31a resume=/dev/sda6 splash=verbose showopts
...
<4>Attempting manual resume
...
Trying manual resume from /dev/sda6
Creating device nodes with udev
Loading ide-core
Loading ide-disk
Loading scsi_mod
Loading sd_mod
Loading processor
Loading thermal
Loading piix
Loading libata
Loading ahci
Loading ata_piix
Loading fan
Loading edd
Loading jbd
Loading mbcache
Loading ext3
Invoking userspace resume from /dev/sda6
resume: Could not stat configuration file
resume: libgcrypt version: 1.2.3
resume: Could not read the image
Invoking in-kernel resume from /dev/sda6
Waiting for device /dev/sda7 to appear:  ok
...

----------------------------------------------------------------

-------- output of swapon -s: ----------------------------------
Filename                                Type            Size    Used    Priority
/dev/sda6                               partition       2104444 0       -1
----------------------------------------------------------------

-> so this device is the correct one.
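
The check above (does the resume= parameter point at an active swap partition?) can be sketched as a small shell helper. This is only an illustration, not part of the original report: the function names extract_resume_dev and is_swap_device are hypothetical, and the sample strings are the values quoted in this description.

```shell
#!/bin/sh
# Print the value of the resume= parameter found in a kernel command line
# string ($1). Prints nothing and returns 1 if there is no resume= parameter.
extract_resume_dev() {
    for arg in $1; do
        case "$arg" in
            resume=*) printf '%s\n' "${arg#resume=}"; return 0 ;;
        esac
    done
    return 1
}

# Return 0 if device $1 is listed in the first column of `swapon -s`-style
# text passed as $2, and 1 otherwise.
is_swap_device() {
    printf '%s\n' "$2" | awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Example using the values quoted in this report.
cmdline='root=/dev/sda7 vga=0x31a resume=/dev/sda6 splash=verbose showopts'
swaps='Filename Type Size Used Priority
/dev/sda6 partition 2104444 0 -1'

dev=$(extract_resume_dev "$cmdline")
if is_swap_device "$dev" "$swaps"; then
    echo "resume device $dev is an active swap partition"
fi
```

On a running system the two inputs would come from `cat /proc/cmdline` and `swapon -s` instead of the literal strings.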
Comment 1 Robin Knapp 2006-12-29 15:53:40 UTC
Created attachment 111197 [details]
output of hwinfo
Comment 2 Robin Knapp 2006-12-29 15:54:19 UTC
Created attachment 111198 [details]
pm-suspend.log
Comment 3 Robin Knapp 2006-12-30 19:41:31 UTC
Related to Bug 229210?
Maybe the disk cache, and therefore the suspend image, is not being written to the disk correctly.

However, my SATA disks shut down (sound) normally when doing a normal shutdown/poweroff.
Comment 4 Pavel Machek 2007-01-12 23:21:11 UTC
resume: Could not stat configuration file
resume: libgcrypt version: 1.2.3
resume: Could not read the image
Invoking in-kernel resume from /dev/sda6

...uswsusp can't find its config file, no wonder it breaks. Stefan?

In the meantime, you can try suspending with echo disk > /sys/power/state. (Does that work?)
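
A minimal sketch of the in-kernel test suggested above, with a guard that first checks whether the kernel offers the "disk" state at all. The function names and the STATE_FILE override are hypothetical and exist only to allow a dry run against an ordinary file; on a real system the write has to be done as root.

```shell
#!/bin/sh
# Return 0 if state $1 appears in the space-separated list of states $2.
state_supported() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try the in-kernel suspend to disk. STATE_FILE is a hypothetical override
# used for dry runs; by default the real sysfs file is used.
hibernate_in_kernel() {
    state_file=${STATE_FILE:-/sys/power/state}
    states=$(cat "$state_file") || return 1
    if ! state_supported disk "$states"; then
        echo "'disk' is not listed in $state_file" >&2
        return 1
    fi
    # Same command as in the comment above.
    echo disk > "$state_file"
}
```

Calling hibernate_in_kernel as root performs the same write as the one-line command; if the write returns immediately, the kernel log (dmesg) is the place to look for the reason.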
Comment 5 Forgotten User ZhJd0F0L3x 2007-01-13 09:52:34 UTC
(In reply to comment #4)
> resume: Could not stat configuration file
> resume: libgcrypt version: 1.2.3
> resume: Could not read the image
> Invoking in-kernel resume from /dev/sda6
> 
> ...uswsusp can't find its config file, no wonder it breaks. Stefan?

No, this is a non-fatal error: we give all the parameters on the command line, so no config file is needed.

This sounds more like the "disks are not correctly flushed before power-off, so image cannot be found" bug that Timo also hits on the T60.
 
> In the meantime, you can try suspending with echo disk > /sys/power/state.
> (Does that work?)

See "HIBERNATE_METHOD" setting on http://en.opensuse.org/Pm-utils

Timo, can you provide the bug number of the T60 bug? Thanks :-)
Comment 6 Timo Hoenig 2007-01-13 13:17:45 UTC
Seife:  Bug #223742.
Comment 7 Robin Knapp 2007-01-15 10:17:02 UTC
I tried that "echo disk..." method which fails somehow (tested some time ago, don't remember exactly).

After that, while trying to access my disk, I got I/O buffer errors similar to those in the screenshot in Bug #223742, Comment 24.

I can do some additional tests if you need more information.
Comment 8 Robin Knapp 2007-01-15 18:23:28 UTC
Created attachment 113021 [details]
Messages during suspend

Messages that appear after the command
echo disk >/sys/power/state
Comment 9 Robin Knapp 2007-01-15 18:24:48 UTC
Created attachment 113022 [details]
Messages after suspend

Messages after trying to suspend

Access to disk does not work anymore.
Comment 10 Forgotten User ZhJd0F0L3x 2007-01-15 19:32:41 UTC
I'd guess this is a bug for Tejun and / or Hannes.
It is clearly not a suspend core bug but a SATA driver bug.
Comment 11 Tejun Heo 2007-01-16 11:16:40 UTC
Created attachment 113081 [details]
debug patch

Please apply the patch and report what the kernel says during and after suspend/resume.  I know it can be difficult with suspend/resume but please try to report all the messages.  Thanks.
Comment 12 Robin Knapp 2007-01-16 12:39:32 UTC
Created attachment 113103 [details]
dmesg output with debug patch

I have no camera at the moment but managed to write dmesg output to an nfs share (suspend starts at line 492 ["Freezing cpus ..."])

I don't see much more output from libata compared to a non-patched kernel.
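
For anyone reading a saved log like this one, the suspend part can be cut out by searching for the marker line. The sketch below is an illustration only: the function names are hypothetical, and the file path in the usage note is an assumed example, not the path used for the attachment.

```shell
#!/bin/sh
# Print the 1-based number of the first line in file $2 that contains the
# literal marker text $1. Prints nothing if the marker is not found.
first_match_line() {
    awk -v marker="$1" 'index($0, marker) { print NR; exit }' "$2"
}

# Print file $2 starting at the first line containing marker $1.
# Returns 1 if the marker does not occur in the file.
suspend_section() {
    start=$(first_match_line "$1" "$2")
    [ -n "$start" ] || return 1
    tail -n +"$start" "$2"
}
```

Typical use would be `dmesg > /mnt/nfs/dmesg.txt` followed by `suspend_section "Freezing cpus" /mnt/nfs/dmesg.txt`, where /mnt/nfs is a placeholder mount point.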
Comment 13 Robin Knapp 2007-01-16 12:42:36 UTC
Forgot to mention:

Using "echo disk >/sys/power/state" does not turn the power off as pm-hibernate does, but returns immediately to the console.
Comment 14 Tejun Heo 2007-01-16 13:59:46 UTC
Created attachment 113123 [details]
debug patch #2

Please apply the attached patch on top of the last debug patch and report the result.

I've just tested suspend to disk and resume on both the latest libata-dev devel kernel and the openSUSE 10.2 kernel.  Both worked perfectly.

The patch contains a small suspend sequence change and also disables power off after suspend to disk success.  It will ask you to power it down to give you time to take a picture of the screen or whatever.  Thanks.
Comment 15 Robin Knapp 2007-01-16 15:14:46 UTC
Patch #2 does not show any different results.

> and also disables power off after suspend to disk success

I think there is no "suspend to disk success". It fails somewhere when using the kernel method and returns immediately.

When using the userspace method, this patch makes no difference either. It tells me that it writes the image and powers off. Resume fails.

This userspace suspend process can be aborted by pressing Backspace, though there are many sda error messages on console (alt-f10) even if suspend is aborted. (similar to that T60 bug)

So this error is triggered quite early in both kernel AND userspace suspend before issuing any power-off or spin-down commands.

(I have the latest online updates installed which make kernel suspend method possible)
Comment 16 Tejun Heo 2007-01-17 06:27:19 UTC
Created attachment 113274 [details]
PM debug patch #3

Please apply the attached patch on top of a clean 10.2 kernel.  Even if everything else succeeds, the suspend itself will fail after a 30-second delay.  This is intentional.

After the suspend has failed, please report the output of 'dmesg'.  Thanks.
Comment 17 Robin Knapp 2007-01-17 12:28:22 UTC
Created attachment 113341 [details]
dmesg with applied patch pm debug #3

echo disk >/sys/power/state with applied pm debug patch #3

starts at line 472
Comment 18 Tejun Heo 2007-01-17 12:57:28 UTC
Thanks.  There seems to be a race condition around ATA_DFLAG_SUSPENDED that can be triggered when the timing is right.  Dunno why the condition never triggered on my or other machines till now.  I'll investigate further.  Please stand by.
Comment 19 Tejun Heo 2007-01-17 16:50:19 UTC
Created attachment 113426 [details]
libata-fix-port-EH-action-in-dev-action-mask

Please apply the attached patch on a clean kernel and test.  If it fails, please apply it over debug patch #3 and post the result.  Thanks.
Comment 20 Robin Knapp 2007-01-17 20:40:40 UTC
yesss!
That's it!

Suspend to disk now works like a charm, with both the kernel and userspace methods.

Thanks!
Comment 21 Tejun Heo 2007-01-18 01:13:07 UTC
Okay, great.  I'll commit the patch.  Thanks.