Bug 372676

Summary: Regression: Suspend to disk freezes on HP 6715b after kernel update
Product: [openSUSE] openSUSE 10.3 Reporter: Matthias Hopf <mhopf>
Component: Kernel Assignee: Pavel Machek <pavel>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None    
Version: Final   
Target Milestone: ---   
Hardware: i586   
OS: openSUSE 10.3   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Matthias Hopf 2008-03-20 13:00:17 UTC
On this HP Compaq 6715b laptop "powersave -U" freezes the machine, even when booted with vga=normal. This happens after an online update yesterday. I haven't had an update for quite a while.
The last few lines on the display read:

Looking for splash system... Mar 20 12:31:54 linux-ivhr kernel: swsusp: Basic memory bitmaps created
splashy_start_splash: error -2none
s2disk: Snapshotting system
<blinking cursor>

After trying to switch the VT, the cursor stops blinking, but nothing else happens. SysRQ still works.


Kernel is 2.6.22.17-0.1, I also tried 2.6.22.5-31.0.

This used to work on this machine, at least with kernel 2.6.22.12-SL103_BRANCH_20071114160351 (which I cannot test anymore, because the update nuked it from the disk !@#$).


I also tried the kernel method with echo disk >/sys/power/state, same effect, there the last few lines state:

Stopping tasks ... done
Shrinking memory... done (0 pages freed)
Freed 0 kbyte in 0.02 seconds (0.00 MB/s)
Suspending console(s)
<blinking cursor>
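
For reference, the kernel method mentioned above can be guarded with a check that the kernel actually offers the "disk" state. This is only a minimal sketch: `supports_state` is a hypothetical helper name, and the optional file argument exists purely so the check can be tried without touching the real /sys/power/state.

```shell
#!/bin/sh
# supports_state STATE [FILE]: succeed if STATE is listed in FILE.
# FILE defaults to /sys/power/state, which holds a space-separated
# list of supported sleep states (for example "standby mem disk").
# Hypothetical helper, not part of any existing tool.
supports_state() {
    state=$1
    file=${2:-/sys/power/state}
    for s in $(cat "$file" 2>/dev/null); do
        [ "$s" = "$state" ] && return 0
    done
    return 1
}

# Usage, as root. Commented out because it really suspends the machine:
# if supports_state disk; then
#     sync
#     echo disk > /sys/power/state
# fi
```

If the check fails, the freeze cannot be reached via this path at all, which would point to a configuration problem rather than a hang during suspend.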


Loaded modules are:
Module                  Size  Used by
apparmor               40736  0 
sha256                 15232  0 
aes_i586               37236  2 
cbc                     8448  1 
blkcipher              10116  1 cbc
usbhid                 41300  0 
hid                    29184  1 usbhid
ff_memless              9352  1 usbhid
uhci_hcd               27024  0 
dm_crypt               16904  1 
loop                   21636  0 
dm_mod                 56880  3 dm_crypt
pcmcia                 41076  0 
firmware_class         13568  1 pcmcia
yenta_socket           28684  1 
rsrc_nonstatic         15872  1 yenta_socket
ohci1394               36272  0 
pcmcia_core            40852  3 pcmcia,yenta_socket,rsrc_nonstatic
ieee1394               91136  1 ohci1394
battery                14724  0 
parport_pc             40764  0 
parport                37832  1 parport_pc
container               9088  0 
ac                      9604  0 
tg3                   104068  0 
button                 12432  0 
i2c_piix4              12556  0 
ati_agp                12684  0 
i2c_core               27520  1 i2c_piix4
k8temp                  9600  0 
hwmon                   7300  1 k8temp
rtc_cmos               12064  0 
rtc_core               23048  1 rtc_cmos
rtc_lib                 7040  1 rtc_core
agpgart                35764  1 ati_agp
snd_hda_intel         272796  0 
snd_pcm                82564  1 snd_hda_intel
snd_timer              26756  1 snd_pcm
snd                    58164  3 snd_hda_intel,snd_pcm,snd_timer
shpchp                 35092  0 
soundcore              11460  1 snd
serio_raw              10756  0 
pci_hotplug            33216  1 shpchp
joydev                 13632  0 
snd_page_alloc         13960  2 snd_hda_intel,snd_pcm
sr_mod                 19492  0 
cdrom                  37020  1 sr_mod
sg                     37036  0 
ehci_hcd               34956  0 
sd_mod                 31104  7 
ohci_hcd               23684  0 
usbcore               123372  5 usbhid,uhci_hcd,ehci_hcd,ohci_hcd
edd                    12996  0 
ext3                  131848  5 
mbcache                12292  1 ext3
jbd                    68148  1 ext3
fan                     9220  0 
pata_atiixp            12032  0 
ahci                   29188  6 
libata                136776  2 pata_atiixp,ahci
scsi_mod              140376  4 sr_mod,sg,sd_mod,libata
thermal                19848  0 
processor              40744  1 thermal


I'm pretty much lost as to what else to try.
Comment 1 Matthias Hopf 2008-03-20 13:29:56 UTC
The 'container' module wasn't unloadable in 2.6.22.5-31.0, so I renamed it. I unloaded as many modules as possible on both kernels, finally stripping down to:

Module                  Size  Used by
sd_mod                 31104  6 
usbcore               123372  1 
ext3                  131848  4 
mbcache                12292  1 ext3
jbd                    68148  1 ext3
ahci                   29188  5 
libata                136776  1 ahci
scsi_mod              140376  2 sd_mod,libata

Same behavior as before.
I have no freaking clue what else to do here.
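
The module stripping described in this comment could be approximated as follows. It is a sketch, not the procedure actually used here: `unused_modules` is a hypothetical helper that reads lsmod-style output and prints the modules whose use count is 0, which are the candidates for unloading.

```shell
#!/bin/sh
# unused_modules: read lsmod-format text on stdin and print the name of
# every module whose "Used by" count (third column) is 0.
# The first line is the lsmod header and is skipped.
# Hypothetical helper name, chosen for illustration only.
unused_modules() {
    awk 'NR > 1 && $3 == 0 { print $1 }'
}

# Usage: list the candidates, then unload them one at a time as root.
# lsmod | unused_modules
```

Unloading a module can drop the use count of the modules it depended on, so the listing has to be repeated until nothing new shows up.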
Comment 2 Forgotten User ZhJd0F0L3x 2008-03-20 16:28:54 UTC
(In reply to comment #0 from Matthias Hopf)
> I also tried the kernel method with echo disk >/sys/power/state, same effect,
> there the last few lines state:
> 
> Stopping tasks ... done
> Shrinking memory... done (0 pages freed)
> Freed 0 kbyte in 0.02 seconds (0.00 MB/s)
> Suspending console(s)
> <blinking cursor>

...now _if_ we were able to dynamically disable the console suspend code, _then_ you would probably see where it hangs. People had already written code to do that, you know... ;-)
Comment 3 Pavel Machek 2008-03-20 21:58:43 UTC
Try no_console_suspend on the kernel command line.

You should be able to get the old kernel somewhere, no? At the very least, you could pull kernel cvs, and do bisect there.

sysrq-p might be useful, as might a nosmp test.
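
When testing boot options like these, it helps to confirm that the option really reached the running kernel. A minimal sketch, assuming the parameters are visible in /proc/cmdline; `has_param` is a hypothetical helper name, and the second argument exists only so the check can be tried on an arbitrary string.

```shell
#!/bin/sh
# has_param NAME [CMDLINE]: succeed if NAME appears in CMDLINE, either
# as a bare word or as NAME=value. CMDLINE defaults to the contents of
# /proc/cmdline. Hypothetical helper, for illustration only.
has_param() {
    name=$1
    cmdline=${2-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            "$name"|"$name"=*) return 0 ;;
        esac
    done
    return 1
}

# Usage:
# has_param no_console_suspend && echo "console suspend disabled at boot"
```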
Comment 4 Matthias Hopf 2008-03-25 11:08:33 UTC
When I wrote that SysRq works, I forgot to mention that I don't see any output from it :-P

I'll try no_console_suspend.
Comment 5 Matthias Hopf 2008-03-25 16:02:03 UTC
0% change with no_console_suspend.

Building 20071109 (the closest kernel I could find to the last known working). First attempt failed due to not enough free space.
Comment 6 Matthias Hopf 2008-03-25 18:56:05 UTC
I'm out of ideas except one.

I tried the following kernels:

2.6.22.5-31, 2.6.22.12-0.1, 2.6.22.9-0.4, 2.6.22.17-0.1
All fail.
2.6.18.8-0.9 (from 10.2) doesn't boot any more.


I don't know how this could ever have worked, and I'm almost assuming that I've been dreaming. I also cannot bisect ATM, because I don't have a single working version. I'm wondering whether this could be something other than kernel related, but I've tried suspending in runlevel 1 with no graphics involved, and the suspend package didn't have an update.


The last thing I'll try is reinstalling 10.3 on a different partition. Then testing before and after software updates.
Comment 7 Matthias Hopf 2008-03-27 10:59:16 UTC
A freshly installed 10.3 shows the same symptoms.
But an updated 10.2 (kernel 2.6.18.8-0.9) suspends perfectly, both kernel space and user space level.

With 10.3 I also tried noapic nmi_watchdog=0. No effect.


I don't know whether I can bisect this (assuming that the effect on unpatched kernels is the same), as I cannot test the two kernels in the same installation system, and cross-platform testing might prove difficult.
But I'll try.


(In reply to comment #3 from Pavel Machek)
> You should be able to get the old kernel somewhere, no? At the very least, you
> could pull kernel cvs, and do bisect there.

I hope you mean git?
Comment 8 Pavel Machek 2008-03-28 09:29:12 UTC
Kernel should be the only place that matters w.r.t. suspend-to-disk... at least if you use "echo disk > /sys/power/state" method.

Useful options for 10.3 include "nohz=off highres=off"...

If you want to proceed with bisect, verify the problem is present in vanilla 2.6.22, and absent in vanilla 2.6.18...

(And you may want to verify 2.6.25-rc7, too; perhaps it is fixed there? :-)
Comment 9 Matthias Hopf 2008-03-28 11:19:52 UTC
Upstream 2.6.22 works fine. At least kernel-based suspend.

Just verified with 2.6.22.17-0.1-vanilla. Both methods work fine. Stupid me, should have thought about that.
Is there a standard approach for bisecting kernel patches in our kernel packages?

I tried to build packages with only part of the patches applied, but that proved to be nontrivial. After some fiddling with the specfile this works now.
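
Bisecting over the patch series, as described above, amounts to a binary search on how many patches are applied. The following is only a sketch under stated assumptions, not a standard tool or the procedure used for this bug: `first_bad_patch` and the test command passed to it are hypothetical, the series file is assumed to list one patch per line in application order, zero patches (vanilla) is assumed to work, the full series is assumed to fail, and a single patch is assumed to introduce the failure.

```shell
#!/bin/sh
# first_bad_patch SERIES TEST_CMD
#   SERIES:   file with one patch name per line, in application order.
#   TEST_CMD: hypothetical command or function called as "TEST_CMD N".
#             It must build and boot a kernel with the first N patches
#             applied and succeed if suspend works.
# Prints the name of the first patch whose application breaks suspend.
first_bad_patch() {
    series=$1
    test_cmd=$2
    good=0                           # known good: no patches (vanilla)
    bad=$(grep -c '' "$series")      # known bad: all patches applied
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$test_cmd" "$mid"; then
            good=$mid
        else
            bad=$mid
        fi
    done
    sed -n "${bad}p" "$series"
}
```

Each step needs a full build, reboot and suspend attempt, so the number of steps matters: the search needs roughly log2 of the number of patches rather than one build per patch.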
Comment 10 Forgotten User ZhJd0F0L3x 2008-04-09 15:32:23 UTC
Matthias, I see similar things on my hp2510p (see bug 372676) with current STABLE / Factory x86_64: it works with kernel-vanilla, but not with kernel-default.
Comment 11 Matthias Hopf 2008-04-11 11:15:51 UTC
Our kernel from stable (2.6.25-rc8.git7.13) seems to work fine.

Pavel, if you don't think this will be fixed for 10.3, close this as WONTFIX.
Comment 12 Matthias Hopf 2008-04-11 13:01:26 UTC
Remove Needinfo
Comment 13 Pavel Machek 2008-04-17 13:38:27 UTC
It should work in opensuse11, and it would be quite a lot of work to find out why it broke in 10.3...
Comment 14 Matthias Hopf 2008-05-14 14:33:25 UTC
Agreed. As the new kernel apparently works w/o any regressions on 10.3, I'm happy as well :)