Bug 935086 - System hangs on resume from hibernation
Status: RESOLVED FIXED
Product: openSUSE Distribution
Classification: openSUSE
Component: Basesystem
Version: 13.2
Hardware: x86-64 openSUSE 13.2
Priority: P5 - None, Severity: Normal
Assignee: E-mail List
QA Contact: E-mail List

Reported: 2015-06-17 12:59 UTC by Uwe Geuder
Modified: 2015-09-02 05:39 UTC

Attachments
- screen image when resume from hibernate hangs (693.65 KB, image/jpeg), 2015-06-17 12:59 UTC, Uwe Geuder
- dmidecode output (19.36 KB, text/plain), 2015-06-18 05:43 UTC, Uwe Geuder
- hwinfo output (54.06 KB, text/plain), 2015-06-18 05:45 UTC, Uwe Geuder
- screen picture when resume from hibernate hangs + SysRq (523.20 KB, image/jpeg), 2015-06-22 07:52 UTC, Uwe Geuder
- pm-suspend.log after re-installing suspend package (11.01 KB, text/plain), 2015-06-24 06:07 UTC, Uwe Geuder
- screen picture of ps output shortly before resume from hibernate hangs (250.39 KB, image/jpeg), 2015-06-25 07:55 UTC, Uwe Geuder
- modified resume script (additional debug output) (842 bytes, text/plain), 2015-06-26 16:48 UTC, Uwe Geuder
- screen picture when resume hangs with additional debug output (pic1) (237.05 KB, image/jpeg), 2015-06-26 16:56 UTC, Uwe Geuder
- screen picture when resume hangs with additional debug output (pic2) (196.12 KB, image/jpeg), 2015-06-26 17:08 UTC, Uwe Geuder
- screen picture of a successful resume (rd.break=pre-mount) during my last sleep (256.98 KB, image/jpeg), 2015-06-28 18:05 UTC, Uwe Geuder

Description Uwe Geuder 2015-06-17 12:59:34 UTC
Created attachment 638210 [details]
screen image when resume from hibernate hangs

During resuming from hibernation to disk my system hangs.
I just installed the new dracut (patch openSUSE-2015-427) but the problem persists.

I am aware that there are a couple of similar bug reports already, e.g. 
https://bugzilla.opensuse.org/show_bug.cgi?id=917221, but I'm not sure this is the same issue. My hang has been 100% repeatable since 13.2 came out. (The system has actually been reinstalled once since then, and the problem is exactly the same.)

When I started to debug the problem and added splash=0 rd.break=pre-mount
to the kernel command line the problem went away.

- With default kernel args: every resume hangs
- With additional splash=0: every resume hangs
- With additional splash=0 rd.break=pre-mount: every resume succeeds

So my guess is that this could be a timing issue. Entering the rd shell and waiting until I press Ctrl-D makes the system slow enough that it works.
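
When comparing the three cases above, it helps to confirm which options the running kernel was actually booted with. The following is a minimal sketch for a POSIX shell; the helper name has_karg is made up for this illustration and is not part of dracut or any other package.

```shell
# has_karg CMDLINE NAME
# Succeeds if NAME appears in CMDLINE as a bare token or as NAME=value.
# (Hypothetical helper for illustration; the body runs in a subshell so
# that "set -f" does not leak into the caller.)
has_karg() (
    set -f                      # no globbing while word-splitting CMDLINE
    for tok in $1; do
        case $tok in
            "$2"|"$2"=*) exit 0 ;;
        esac
    done
    exit 1
)

# Usage on a running Linux system:
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    for opt in splash rd.break resume noresume quiet; do
        if has_karg "$cmdline" "$opt"; then
            echo "$opt: present"
        else
            echo "$opt: absent"
        fi
    done
fi
```

Matching whole tokens rather than substrings keeps "resume=..." from being mistaken for "noresume".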

The last console message before hanging is (see attached screen image)

   [ OK ] Reached target Remote File Systems

The problem must be somewhere after that. But when I tried to debug where exactly, the problem went away.

splash=0 rd.break=pre-mount is kind of acceptable for me, but still that's not the way things should work.

My setup:

/boot is ext2
rest of the disk is LUKS encrypted, LVM, rootfs is btrfs, /home is xfs

Some other report mentions that encrypted disks don't work, some problem with crypttab. However, this has never been my problem. The disk password is always asked and opening the encryption seems to succeed. The hand occurs somewhat later.

I guess the problem is difficult to understand/fix from the information provided. So all hints to debug it further / provide more info are welcome.
Comment 1 Uwe Geuder 2015-06-17 13:02:02 UTC
s/the hand/the hang/
Comment 2 Bernhard Wiedemann 2015-06-17 18:53:12 UTC
Could depend on actual hardware, so please attach output from
hwinfo --all
and
dmidecode

Do you use remote filesystems (such as NFS) on that machine?
Any other noteworthy things that are different than the default install?

Do you use Ethernet or WiFi?
NetworkManager or wicked?
Comment 3 Uwe Geuder 2015-06-18 05:43:57 UTC
Created attachment 638301 [details]
dmidecode output
Comment 4 Uwe Geuder 2015-06-18 05:45:00 UTC
Created attachment 638302 [details]
hwinfo output
Comment 5 Uwe Geuder 2015-06-18 05:46:01 UTC
Thanks for the quick reply!

- No remote filesystem whatsoever.
- Ethernet
- NetworkManager
- nothing non-default comes to my mind (the boot subvolume on the btrfs rootfs created by the installer has been deleted after installation. But I guess that's more like a bug/misfeature in the installer that it creates such subvolume when a separate /boot filesystem is in use)

Why did you ask about the network? It is my understanding (without deeper analysis) that the network first comes up in the real root filesystem, but the hang occurs while still in initramfs. At least that is what I have observed. Of course, if rootfs were on a network filesystem the network would be required already in initramfs. So is there some logic that brings the network up conditionally in initramfs? At least it should not be needed in my setup, and the problem is the same whether the network cable is attached or detached at the time of resuming.

dmidecode and hwinfo attached
Comment 6 Takashi Iwai 2015-06-18 10:19:23 UTC
Is this a hang at kernel level?  You can enable magic sysrq beforehand by setting kernel.sysrq=1 in /etc/sysctl.conf (and applying it), and see whether you still have control via magic sysrq.
For example, alt-sysrq-s will trigger sync, alt-sysrq-t will show the stack traces of all running tasks.
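
Whether a given SysRq function is permitted depends on the kernel.sysrq value: 0 disables everything, 1 enables everything, and larger values are a bitmask. According to the kernel's sysrq documentation, 2 covers console log level control, 8 covers process dumps such as alt-sysrq-t, 16 covers sync, and 64 covers signalling such as alt-sysrq-i. A minimal sketch follows; the helper name sysrq_allows is an assumption for illustration.

```shell
# sysrq_allows VALUE MASK
# Succeeds if a kernel.sysrq setting of VALUE permits the function group MASK.
# Mask values follow the kernel's sysrq documentation:
#   2 = console log level, 8 = process dumps, 16 = sync, 64 = signalling
# (Hypothetical helper, not part of any package.)
sysrq_allows() {
    [ "$1" -eq 1 ] && return 0          # 1 means "everything enabled"
    [ $(( $1 & $2 )) -ne 0 ]
}

# Usage on a running Linux system (reading the value needs no root):
if [ -r /proc/sys/kernel/sysrq ]; then
    val=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows "$val" 8; then
        echo "task dumps (alt-sysrq-t) allowed"
    fi
    if sysrq_allows "$val" 16; then
        echo "sync (alt-sysrq-s) allowed"
    fi
fi
# To enable everything until the next reboot (as root):
#   sysctl -w kernel.sysrq=1
```

With a bitmask value, some keys can work while others are silently ignored, which is worth ruling out before interpreting missing SysRq output.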

Also, did you see whether you get any better information with rd.debug?
Comment 7 Takashi Iwai 2015-06-19 05:34:17 UTC
Also, how did you hibernate?  Could you try the direct hibernate via
  echo disk > /sys/power/state
?
Comment 8 Uwe Geuder 2015-06-22 07:52:56 UTC
Created attachment 638634 [details]
screen picture when resume from hibernate hangs + SysRq

I use systemctl hibernate to hibernate the system. (Originally I used KDE desktop to call hibernate from the GUI. The hang existed already there with the same symptoms. Recently I have switched to i3 window manager and I call systemctl hibernate from command line).

When the system hangs in resume SysRq just prints the headlines, but no information. (see attached screen shot) What does that mean?

While experimenting with SysRq I noticed that "SysRq i" (SIGKILL to all) makes the resume complete. (tried twice, worked twice) The system seemed functional after that, but I did not dare to really use it, because I'm not sure what might be in an inconsistent state after the killing. What can we learn from that? I guess it means that the hang was still in initramfs, so killing all initramfs processes made it resume the real root. But I don't understand the details of how the real root could come up when everything is killed.

(After the resume had completed SysRq showed the complete information, not just the headlines.)

I also tried "echo disk | sudo tee /sys/power/state". In this case the system does not hang when resuming. What do we learn from that?
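
Before writing to /sys/power/state directly, it can be useful to check that the kernel offers the disk state and which hibernation mode is selected. The kernel marks the active mode in /sys/power/disk with square brackets. A minimal sketch; the helper name active_disk_mode is hypothetical.

```shell
# active_disk_mode LINE
# Prints the bracketed entry from the contents of /sys/power/disk,
# e.g. "[platform] shutdown reboot" -> "platform".
# (Hypothetical helper for illustration.)
active_disk_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

if [ -r /sys/power/state ] && [ -r /sys/power/disk ]; then
    echo "supported sleep states: $(cat /sys/power/state)"
    echo "active hibernation mode: $(active_disk_mode "$(cat /sys/power/disk)")"
fi
# The actual hibernate request (as root) is the command from this comment:
#   echo disk > /sys/power/state
```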
Comment 9 Takashi Iwai 2015-06-22 08:18:03 UTC
(In reply to Uwe Geuder from comment #8)
> Created attachment 638634 [details]
> screen picture when resume from hibernate hangs + SysRq
> 
> I use systemctl hibernate to hibernate the system. (Originally I used KDE
> desktop to call hibernate from the GUI. The hang existed already there with
> the same symptoms. Recently I have switched to i3 window manager and I call
> systemctl hibernate from command line).
> 
> When the system hangs in resume SysRq just prints the headlines, but no
> information. (see attached screen shot) What does that mean?

Is quiet boot option removed?
Also, increase the log level via alt-sysrq-8 or 9 beforehand.
 
> While experimenting with SysRq I noticed that "SysRq i" (SIGKILL to all)
> makes the resume complete. (tried twice, worked twice) The system seemed
> functional after that, but I did not dare to really use it, because I'm not
> sure what might be in an inconsistent state after the killing. What can we
> learn from that? I guess it means that the hang was still in initramfs, so
> killing all initramfs processes made it resuming the real root. But I don't
> understand the details how the real root could come up when everything is
> killed. 

The kernel already started the resume, as its prompt already shows.  But I wonder how the remote file system message appears *after* it.  So it looks like two things are running concurrently and conflicting.

> (After the resume had completed SysRq showed the complete information, not
> just the headlines.
> 
> I also tried "echo disk | sudo tee /sys/power/state". In this case the
> system does not hang when resuming. What do we learn from that?

What if you pass the resumedelay=10 boot option?  This will delay the resume by 10 seconds after it is kicked off.
Comment 10 Uwe Geuder 2015-06-22 14:50:47 UTC
(In reply to Takashi Iwai from comment #9)

> 
> Is quiet boot option removed?

There is no quiet option on my kernel line. It was the first thing I removed
when starting this debugging exercise and I have not put it back since

> Also, increase the log level via alt-sysrq-8 or 9 beforehand.

The log level seems to have no effect on SysRq-L, SysRq-P, and SysRq-T. When the system hangs, no information is shown even if the log level is 8 or 9. When the system works, full information is shown even if the log level is 0.

During the boot more messages are shown if log level is 9. But there is no additional message shortly before the system hangs at "Reached target Remote File system" as shown in the screen pictures before.

> 
> What if you pass resumedelay=10 boot option?  This will delay the resume in
> 10 seconds after kicked off.

I changed to resumedelay=30 in order to be absolutely sure I don't miss any effect of the option. This option does not seem to work as intended. The delay happens every time (also on fresh boots), even before the disk password is asked. Because the snapshot is inside the encrypted volume, I don't think the system can already know at this point in time that it should resume.

After entering the disk password the parameter seems to have no effect any more in any of the 3 cases:

1.) systemctl hibernate: resume hangs forever
2.) echo disk > /sys/power/state: no additional wait of 30 seconds when the system is resuming
3.) Alt-SysRq-i when the system hangs: no additional wait when the system is resuming.
Comment 11 Takashi Iwai 2015-06-22 15:19:14 UTC
(In reply to Uwe Geuder from comment #10)
> (In reply to Takashi Iwai from comment #9)
> 
> > 
> > Is quiet boot option removed?
> 
> There is no quiet option on my kernel line. It was the first thing I removed
> when starting this debugging exercise and I have not put it back since
> 
> > Also, increase the log level via alt-sysrq-8 or 9 beforehand.
> 
> The log level seems to have no effect on SysRq-L SysRq-P, and SysRq-T. when
> the system hangs no information is shown even if the log level is 8 or 9.
> When the 
> system works, full information is show even if the log level is 0.
> 
> During the boot more messages are shown if log level is 9. But there is no
> additional message shortly before the system hangs at "Reached target Remote
> File system" as shown in the screen pictures before.
> 
> > 
> > What if you pass resumedelay=10 boot option?  This will delay the resume in
> > 10 seconds after kicked off.
> 
> I changed to resumedelay=30 in order to be sure absolutely I don't miss any
> effect of the option. This option does not seem to work as intended. The
> delay happens every time (also on fresh boots) even before the disk password
> is asked. Because the snapshot is inside the encrypted volume I don't think
> the system can already know at this point of time that it should resume.

Yeah, it's no help, unfortunately.  The real resume is triggered in the dracut 95resume module, and that path bypasses the resumedelay option.
 
As a blind shot: could you try to uninstall the package "suspend"?
This is a user-space suspend and it often behaves badly with openSUSE 13.2 and later.
Comment 12 Uwe Geuder 2015-06-23 07:16:18 UTC
(In reply to Takashi Iwai from comment #11)

>  
> As a blind shot: could you try to uninstall the package "suspend"?
> This is a user-space suspend and it often does thing badly with openSUSE
> 13.2 and later.

Yes, after removing the "suspend" package the system resumes without hanging.
Thanks for that tip. (The bug report https://bugzilla.opensuse.org/show_bug.cgi?id=917221 suggests that uninstalling pm-utils would help, but that was not the case for me.)

s2disk progress is no longer displayed (well, it's part of the suspend package so that's not a surprise). Instead the same progress messages are shown as when writing directly into /sys/power/state.

After that I removed my debugging support and went back to the kernel command line containing "splash=silent quiet" as it was initially after installation. (instead of just "splash=0" used during the debugging.)

Resume still works.

There is only one cosmetic issue. The Plymouth screen is not displayed when the system hibernates; instead console messages are visible. I know it works in 13.1, but I don't remember whether it has ever worked in 13.2, because I have used "splash=0" for too long.

Personally the console messages don't disturb me. But from a distro point of view, are we happy with the solution of uninstalling the suspend package and having no plymouth screen during hibernate? Could uninstalling the suspend package break some other setups than mine? (As said, I'm not 100% sure whether uninstalling suspend made plymouth during suspend go away or whether it's an unrelated issue. But I need to stop debugging for now and do some "real" work...)
Comment 13 Takashi Iwai 2015-06-23 07:31:40 UTC
OK, then this is some wrong dracut and suspend setup.  I suspect 95resume has an issue with suspend package.  Maybe this is a dup of bug 925873 and bug 905424.

Could you try one more test?  Please reinstall suspend and pm-utils but change SLEEP_MODULE to "kernel" in pm-utils default.  Does this make resume work again?
Comment 14 Uwe Geuder 2015-06-24 06:05:55 UTC
(In reply to Takashi Iwai from comment #13)

> Could you try one more test?

No problem, I'm glad to help get this solved.

> Please reinstall suspend and pm-utils but
> change SLEEP_MODULE to "kernel" in pm-utils default.  Does this make resume
> working again?

Just to clarify: Removing pm-utils was something I tried without success in a previous installation of 13.2. For the whole lifetime of this report pm-utils has always been installed.

I re-installed the suspend package and created a configuration file like this:

$ cat /etc/pm/config.d/sleepmodule.config 
SLEEP_MODULE="kernel"

Resume works without hanging. The suspend hooks are executed as shown in the attached log file. E.g. grubonce suppresses the usual grub menu when waking up
from hibernation. That suppression was not there while suspend package was uninstalled. 

However, I notice the following issues

1. I made 5 hibernate cycles. In one of the 5 the machine crashed and rebooted
during the resume. Obviously at the reboot no valid snapshot was found anymore, so a fresh boot occurred. No traces were left in the logs of what had happened. So it probably happened shortly before, during, or shortly after switching to the real root file system, and no information could be stored to the filesystem.
I don't have time now to make more reliable statistics on how often that really happens. But while using the rd.break=pre-mount work-around it has not happened during ~3 months, some 50-100 resumes.

2. There is no plymouth screen during hibernation. Instead there is flickering and console messages are visible. As said before, not a showstopper for me, but a regression from 13.1
Comment 15 Uwe Geuder 2015-06-24 06:07:30 UTC
Created attachment 638871 [details]
pm-suspend.log after re-installing suspend package
Comment 16 Uwe Geuder 2015-06-24 06:27:47 UTC
(In reply to Takashi Iwai from comment #13)
> OK, then this is some wrong dracut and suspend setup.  I suspect 95resume
> has an issue with suspend package.  Maybe this is a dup of bug 925873 and
> bug 905424.

I think both bugs are completely different from the symptoms observed. Also for me rd.break=pre-mount makes the resume complete, but the reporter of 925873 writes it still hangs (different location than mine).

But what all 3 bug reports probably have in common: There is some nasty competition between pm-utils, suspend, systemd, and kernel hibernate functionality. (I do not use suspend to RAM, but I guess it's the same there).
So as a distro it might be useful to remove some package(s) and make sure the remaining packages co-operate nicely.

The other reports had some comments about what seems particularly old/unmaintained. I have no information to add at this moment.
Comment 17 Takashi Iwai 2015-06-24 07:06:41 UTC
(In reply to Uwe Geuder from comment #14)
> (In reply to Takashi Iwai from comment #13)
> 
> > Could you try one more test?
> 
> No problem, I'm glad to getting this solved.
> 
> > Please reinstall suspend and pm-utils but
> > change SLEEP_MODULE to "kernel" in pm-utils default.  Does this make resume
> > working again?
> 
> Just to clarify: Removing pm-utils was something I tried without success in
> a previous installation of 13.2. For the whole lifetime of this report
> pm-utils has always been installed.
> 
> I re-installed the suspend package and created a configuration file like
> this:
> 
> $ cat /etc/pm/config.d/sleepmodule.config 
> SLEEP_MODULE="kernel"
> 
> Resume works without hanging. The suspend hooks are executed as shown in the
> attached log file. E.g. grubonce suppresses the usual grub menu when waking
> up
> from hibernation. That suppression was not there while suspend package was
> uninstalled. 
> 
> However, I notice the following issues
> 
> 1. I made 5 hibernate cycles. In one of the 5 the machine crashed and
> rebooted
> during the resume. Obviously at the reboot no valid snapshot was found
> anymore, so a fresh boot occurred. No traces where left in the logs what has
> happened. So it probably happened shortly before, during, or short after
> switching to the real root file system and no information could be stored to
> the filesystem.
> I don't have time now to make more reliable statistics how often that really
> happens. But while using the rd.break-premount work-around it has not
> happened during ~3 months, some 50-100 resumes.

Hmm, I can't think of the relation with pm-utils immediately.  (Actually, I have not figured out why user-suspend got broken while kernel-suspend works.)

All things look like a side effect of racy resume procedure to me, so I won't be surprised if some instability remains even without pm-utils.

> 2. There is no plymouth screen during hibernation. Instead there is
> flickering and console messages are visible. As said before, not a
> showstopper for me, but a regression from 13.1

This is a known drawback of kernel-suspend, IIRC.
Comment 18 Uwe Geuder 2015-06-25 07:53:41 UTC
(In reply to Takashi Iwai from comment #17)

> 
> All things look like a side effect of racy resume procedure to me, so I


Yes, same here.

I went back to the default configuration and added

ps -ef >/dev/console
sleep 15

at the beginning of /usr/lib/dracut/modules.d/95resume/resume.sh

(needs to be built with dracut --add debug, because by default the ps binary is not in initramfs)

I would have expected the sleep 15 to prevent the hang, as rd.break=pre-mount does.

But it does not, resume hangs in the "old" location.  I attach the screen picture of ps output, but probably it's more entertaining than helpful to understand what is going on. Need to stop debugging now. Maybe I can add even more debugging in a few days.
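
The rebuild mentioned above can be checked before rebooting. This is a sketch that assumes dracut's lsinitrd prints an "ls -l" style listing with the path in the last column; the helper name initrd_lists is made up for illustration, and the exact path of ps inside the image may differ.

```shell
# initrd_lists PATH
# Reads an "ls -l" style listing on stdin (as printed by lsinitrd) and
# succeeds if some entry's last column equals PATH.
# (Hypothetical helper for illustration.)
initrd_lists() {
    awk -v want="$1" '$NF == want { found = 1 } END { exit !found }'
}

# Intended use, as root, after adding the debug module to the image:
#   dracut --force --add debug
#   lsinitrd | initrd_lists usr/bin/ps && echo "ps is in the initramfs"
```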
Comment 19 Uwe Geuder 2015-06-25 07:55:06 UTC
Created attachment 639035 [details]
screen picture of ps output shortly before resume from hibernate hangs
Comment 20 Uwe Geuder 2015-06-26 16:46:42 UTC
I went back to default setup and modified /usr/lib/dracut/modules.d/95resume/resume.sh to 

1.) redirect all output to the console
2.) give an execution trace
3.) add ample sleeps to give the user time to analyze the screen output.

Even with all the sleeps the resume hangs every time. So if it is a race condition the race happens before resume.sh is entered.

When I add rd.break=pre-mount to the kernel command line and in the emergency shell do nothing but press ctrl-D resumes do succeed every time even with the modified script.
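
The three modifications listed above could look roughly like the following at the top of a hook script. This is a sketch of the technique only, not the attached script; the function name debug_to and the sleep length are assumptions.

```shell
# debug_to TARGET
# From this point on, send stdout and stderr of the current shell to TARGET
# and print every command before it runs (execution trace).
# (Hypothetical helper; in the initramfs TARGET would be /dev/console.)
debug_to() {
    exec >"$1" 2>&1
    set -x
}

# Intended use near the top of the script being debugged:
#   debug_to /dev/console
#   sleep 15        # leave time to read the screen before continuing
```

Because both the redirection and the trace apply to the current shell, every later command in the script shows up on the chosen target without further changes.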

I'll attach my modified script and 2 screen pictures.
Comment 21 Uwe Geuder 2015-06-26 16:48:58 UTC
Created attachment 639391 [details]
modified resume script (additional debug output)
Comment 22 Uwe Geuder 2015-06-26 16:56:24 UTC
Created attachment 639392 [details]
screen picture when resume hangs with additional debug output (pic1)

/usr/sbin/resume starts to execute correctly, and the progress counter shows how the image is unpacked. Then the system hangs. Unfortunately I don't know what the /usr/sbin/resume output looks like in a successful case, because in successful cases the console contents switch very fast after the unpacking. So I am not sure whether the hang happens before it switches to the restored kernel structures or after the switch.

However, as shown in earlier screen dumps SysRq is in this semi-broken state that it shows only headings but no contents. So I might guess that the switch has happened but the restored kernel is in a semi-dead state.
Comment 23 Uwe Geuder 2015-06-26 17:08:37 UTC
Created attachment 639393 [details]
screen picture when resume hangs with additional debug output (pic2)

This is exactly the same scenario as before. 

But the output of /usr/sbin/resume looks a bit weird to me. After the compression information there is another line "wrote 501 MB" before it hangs. What does that mean?

It would be useful to see the output of a successful /usr/sbin/resume operation to be sure whether that is normal or an indication that things are going wrong. I could guess this is not the first time such a problem needs to be debugged, so I would nearly expect that some option exists in either /usr/sbin/resume or the kernel or both to add a delay right there. Like the resumedelay mentioned earlier, but for unknown reasons that produced a delay in a different location.
If nobody knows what such an option could be (or has some other debugging hints), let's see whether I can find time to study the source a bit.
Comment 24 Takashi Iwai 2015-06-26 17:40:57 UTC
Hm, then uswsuspend might be also broken with the recent kernel.  This method isn't used by many distros, so little tested, I'm afraid.

As of now, the current solution seems to go to the direction to change the default sleep method.  Many things have been discussed in bug 925873.  So, further debugging of /usr/sbin/resume might not help much...

Maybe it's better to concentrate on stabilization of kernel hibernation.  That is, check more details about the S4 resume failure (not hang) with DEFAULT_SLEEP=kernel.  For example, whether this happens really only with pm-utils or not, etc.
Comment 25 Takashi Iwai 2015-06-27 08:58:48 UTC
BTW, while looking at the dmesg output on my machine, I noticed that the system triggers hibernate-resume twice: once the kernel itself and once by dracut.

Could you add "noresume" boot option (while keeping "resume=xxx" option) and retest for a few times whether you still get the unexpected reboot with SLEEP_METHOD="kernel"?

Also does this have any influence with SLEEP_METHOD="uswsusp"?
Comment 26 Takashi Iwai 2015-06-27 09:00:08 UTC
(In reply to Takashi Iwai from comment #25)
> Could you add "noresume" boot option (while keeping "resume=xxx" option) and
> retest for a few times whether you still get the unexpected reboot with
> SLEEP_METHOD="kernel"?

I meant SLEEP_MODULE, of course.
Comment 27 Uwe Geuder 2015-06-28 17:22:29 UTC
(In reply to Uwe Geuder from comment #10)

Let me first correct one of my previous observations.

> 
> > Also, increase the log level via alt-sysrq-8 or 9 beforehand.
> 
> The log level seems to have no effect on SysRq-L SysRq-P, and SysRq-T. when
> the system hangs no information is shown even if the log level is 8 or 9.
> When the 
> system works, full information is show even if the log level is 0.
> 
> During the boot more messages are shown if log level is 9. But there is no
> additional message shortly before the system hangs at "Reached target Remote
> File system" as shown in the screen pictures before.
> 

This was incorrect. On this keyboard I need to use the keypad digits to make it work: Alt-SysRq-KP_8 instead of Alt-SysRq-8, which I used before.

So the corrected information is: the kernel seems to be fully alive every time the system hangs. Obviously the log level was too low before.

Unfortunately the task list information does not fit into the console scrollback buffer by far. So I need to find a way to increase the scrollback buffer first.

From the CPU state I can see that 3 of my cores are always idle and the 4th core is handling the SysRq. So this might look like a deadlock in user space.
Comment 28 Uwe Geuder 2015-06-28 17:32:27 UTC
I see that you have added several comments, please give me some time to make the related investigations.

In the meantime I have added some printf debugging and sleep() calls to the resume program in order to see what it looks like in the successful case (always achievable with rd.break=pre-mount)

My modified code can be found from OBS at https://build.opensuse.org/package/view_file/home:geuder:branches:openSUSE:13.2:Update/suspend/resume-printf-debugging.diff?expand=1

I will attach a screen picture taken during the last sleep of a successful resume.
Comment 29 Uwe Geuder 2015-06-28 18:05:25 UTC
Created attachment 639442 [details]
screen picture of a successful resume (rd.break=pre-mount) during my last sleep

From the picture we see that after reporting the compression ratio there will
be 2 lines of size information of the compressed image. 

When the hang occurs only 0 or 1 line of compressed image statistics is written and then the system hangs (as shown in previous attachments 639392 and 639393). If you look at the code starting from line 669 in load.c you see that it does nothing besides printf() http://paste.opensuse.org/45471188
That really would look like printf() is hanging, but I cannot believe that. On one side it could vaguely explain why rd.break=pre-mount makes a difference, the console can be in different state after having been in an interactive shell just before. But in the normal setup stdout of the resume process is not the console at all, it's the socket to journald. And the hang has been reproducible with the socket and with the console. So no, hanging in printf() makes no sense to me. 

My printf debugging occurs even later, so we are not even close to the place where the kernel switches to the structures restored from snapshot. The hang seems to occur while the resume process runs in user space or does trivial printf() at most. Of course getting a call stack of the resume process might be helpful.
Comment 30 Uwe Geuder 2015-06-28 18:13:08 UTC
(In reply to Takashi Iwai from comment #24)
> Hm, then uswsuspend might be also broken with the recent kernel.  This
> method isn't used by many distros, so little tested, I'm afraid.
> 
...
> 
> Maybe it's better to concentrate on stabilization of kernel hibernation. 

According to the kernel documentation, kernel resume is not supported from an LVM2 partition at all. So it should not work for me at all. I find this a bit hard to believe; maybe the code has been improved but the documentation has not been updated.

I have asked the suspend maintainers, let's see whether we get an answer.

http://marc.info/?l=linux-pm&m=143544300618472&w=2
Comment 31 Uwe Geuder 2015-06-28 18:16:31 UTC
(In reply to Takashi Iwai from comment #25)
> BTW, while looking at the dmesg output on my machine, I noticed that the
> system triggers hibernate-resume twice: once the kernel itself and once by
> dracut.
> 

Thanks for yet another idea. I need to investigate that later. The needinfo flag is still active.
Comment 32 Uwe Geuder 2015-06-29 10:54:21 UTC
(In reply to Takashi Iwai from comment #25)
> BTW, while looking at the dmesg output on my machine, I noticed that the
> system triggers hibernate-resume twice: once the kernel itself and once by
> dracut.
> 
> Could you add "noresume" boot option (while keeping "resume=xxx" option) and
> retest for a few times whether you still get the unexpected reboot with
> SLEEP_METHOD="kernel"?
> 
> Also does this have any influence with SLEEP_METHOD="uswsusp"?

Ah I did not even know that the kernel can obviously resume directly from
its command line without any help from initramfs. Well, I have used disk encryption longer than hibernate, so this does not apply to me.

Yes, the first resume failure has always been in the kernel log. I never understood where it comes from. That's also the place where resumedelay=10
takes effect. But of course with LUKS encryption and LVM that resume has never
succeeded for me and never will.

I don't think the failed attempt should confuse the kernel; the device just does not exist. If it got confused there and malfunctioned later, that would be a bad bug.

Anyway I tried the "noresume" parameter (only with uswsusp so far). It looks like dracut also respects this parameter and skips the whole resume script.

So not a good idea, at least not with uswsusp. It will boot with filesystem mounts after hibernate, which can always mean data loss.

Not sure about kernel mode. One resume action is in the same 95resume script. If that is the only one, it will not resume either. But I need to test.
Comment 33 Uwe Geuder 2015-07-06 18:04:55 UTC
(In reply to Takashi Iwai from comment #26)
> (In reply to Takashi Iwai from comment #25)
> > Could you add "noresume" boot option (while keeping "resume=xxx" option) and
> > retest for a few times whether you still get the unexpected reboot with
> > SLEEP_METHOD="kernel"?
> 
> I meant SLEEP_MODULE, of course.

If noresume is on the kernel cmd line and SLEEP_MODULE="kernel" the system does not even hibernate (systemctl hibernate does nothing visible).

At least /usr/lib/pm-utils/sleep.d/99Zgrub seems to check for the noresume option; I have not studied in detail what happens if there is a match.
Comment 34 Takashi Iwai 2015-07-06 18:35:06 UTC
(In reply to Uwe Geuder from comment #33)
> (In reply to Takashi Iwai from comment #26)
> > (In reply to Takashi Iwai from comment #25)
> > > Could you add "noresume" boot option (while keeping "resume=xxx" option) and
> > > retest for a few times whether you still get the unexpected reboot with
> > > SLEEP_METHOD="kernel"?
> > 
> > I meant SLEEP_MODULE, of course.
> 
> If noresume is on the kernel cmd line and SLEEP_MODULE="kernel" the system
> does not even hibernate (systemctl hibernate does nothing visible).
> 
> At least /usr/lib/pm-utils/sleep.d/99Zgrub seems to check for the noresume
> option, I have not studied it details what happens if there is a match.

Thanks, but I don't think it's worth testing further in this direction.
We're going to remove suspend and pm-utils packages as an update fix even for openSUSE 13.2 in the end.  So, please test again without suspend and pm-utils packages, and confirm that it works stably enough.
Comment 35 Uwe Geuder 2015-07-07 10:17:31 UTC
(In reply to Takashi Iwai from comment #34)

> So, please test again without suspend and
> pm-utils packages, and confirm that it works stably enough.

OK, packages removed and kernel command line restored. One test was successful, but I will test for a couple of days in normal usage (and hopefully be able to add a couple of extra hibernations) and then report how it went.
Comment 36 Uwe Geuder 2015-07-14 07:10:20 UTC
I have used it without suspend and pm-utils for a week now. I did not have much time for extra testing, so it basically was one hibernate in the evening and one resume next morning.

It worked fine until this morning, when the machine came up but Ethernet no longer worked. I could not find out what was wrong, so I rebooted and that solved the problem. This is just anecdotal evidence; as long as I cannot reproduce and debug it there is nothing we can do about that.

I'll be on holidays soon, so no more testing from my side for 4 weeks.
Comment 37 Uwe Geuder 2015-09-02 05:37:55 UTC
Without suspend and pm-utils: No crashes during 3 weeks of daily usage.
Comment 38 Takashi Iwai 2015-09-02 05:39:54 UTC
Thanks for updates.  Then let's close this as fixed.
Feel free to reopen if you encounter the same problem again.