Bug 917221

Summary: Unable to resume after long hibernation
Product: [openSUSE] openSUSE Distribution
Reporter: Leys <wguy4biz>
Component: Basesystem
Assignee: Kristyna Streitova <kstreitova>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P3 - Medium
CC: chcao, sbrabec, tiwai, wguy4biz
Version: 13.2
Flags: sbrabec: needinfo? (wguy4biz)
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 13.2   
Whiteboard:
Found By: Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on: 925873    
Bug Blocks:    
Attachments: pm-suspend.log, sections of journal and Xorg.0.log for failed hibernation/thaw
Output of hwinfo
Screenshot at time of resume failure
Various files and output related to resume device

Description Leys 2015-02-11 00:24:02 UTC
Created attachment 622691 [details]
pm-suspend.log, sections of journal and Xorg.0.log for failed hibernation/thaw

When i suspend to disk and resume, resume sometimes works and sometimes does not.  The determining factor appears to be the amount of time that is between the completion of the suspend and when it is resumed.  If i try to resume immediately after suspend, the resume is always successful; if the resume is a couple of hours after the suspend completes (not clear exactly how long is required), the resume fails.

Prior to upgrading to SuSE 13.2, i was able to successfully hibernate/resume on this machine many times after long suspensions, even suspensions of days.  I am less certain of how well this worked on 13.1, because i did not use the machine much when it had SuSE 13.1 on it, but it worked extremely well for many months with SuSE 12.3.  The difference between 13.2 and previous releases is that i put in 2 new hard drives and did a fresh install onto a RAID1 configuration using BTRFS, whereas previously it had been an ext4 boot drive, no RAID.

I am using GRUB2 with MBR for the boot loader.

The pm-suspend log shows the final stages of hibernation being executed, but there do not appear to be any thaw steps executed.  When the thaw is performed, i see some variation of video output; sometimes i see a very faint image of the background shown when booting from shutdown, sometimes that same image flashes briefly at full strength, and other times there is no apparent video, but there is a signal (my monitor reports when it detects no signal, and it did not).  The machine does not show up on my router's network map, so the network does not appear to be active after the thaw.

I have attached a file containing segments, or the entire contents, of the following:
- pm-suspend.log for failed hibernate/thaw (entire file)
- journal entries for failed hibernate/thaw
- journal entries for successful hibernate/thaw
- Xorg.0.log showing both successful and unsuccessful hibernate/thaw (entire file)
Comment 1 Takashi Iwai 2015-02-13 14:25:38 UTC
My wild guess is the nouveau driver breakage.  We've got many bug reports regarding S3/S4 with nouveau, unfortunately.

Could you try the newer kernels in OBS Kernel:stable repo?
Comment 2 Leys 2015-02-13 20:00:52 UTC
(In reply to Takashi Iwai from comment #1)
> My wild guess is the nouveau driver breakage.  We've got many bug reports
> regarding S3/S4 with nouveau, unfortunately.
> 
> Could you try the newer kernels in OBS Kernel:stable repo?

I will try a newer kernel, but i wonder if it is the nouveau driver. I would think that during the resume phase, journal writing would be done via synchronous I/O for diagnostic purposes, but i see no entries in the journal for the stages of the resume before the video driver is activated.   In particular, i see none of the journal entries generated during the final stages of the hibernate, which usually are written immediately after the kernel becomes active.

If i enable a crash dump, would the enablement survive a resume and create a crash dump if a crash occurred during resume?  Is there a way to force the system buffers to be flushed immediately after a write?  There used to be a way to set this, but the kernel has changed much since i knew how to do this.

Additional information.  It appears as if the video is always active on failed resumes, but very faint; to see it, i often have to view the screen at an angle.  The image on the screen is the splash screen shown during a fresh boot, suggesting that the kernel tried to do a fresh boot instead of a resume.

I have also now had failures for short periods of time between hibernate and thaw, as well as a successful thaw for a period of several hours after a hibernation.  But the general rule still is that long periods are exceedingly likely to cause failure and short periods are exceedingly likely to be successful.
Comment 3 Leys 2015-02-14 02:29:09 UTC
(In reply to Takashi Iwai from comment #1)
> My wild guess is the nouveau driver breakage.  We've got many bug reports
> regarding S3/S4 with nouveau, unfortunately.
> 
> Could you try the newer kernels in OBS Kernel:stable repo?

I installed the latest OBS stable kernel, but with no success.
# uname -r
3.19.0-1.g8a7d5f9-desktop

The symptoms are exactly as before.  I will keep this kernel a bit longer to see if the problem happens less frequently with this kernel, but my guess is that it makes no difference.  I only installed the minimal set of packages required to install the new kernel, so did not update the libdrm_nouveau package.
Comment 4 Takashi Iwai 2015-02-14 07:54:40 UTC
Then, to identify whether it's a nouveau issue, just boot with nomodeset and try S4 and S3.  The graphics won't work, so you can boot with vga=none and check via remote login whether suspend/resume works.  If the same problem shows up, the problem is somewhere else.

In any case, please attach the hwinfo output.
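
Before running the S3/S4 test, it may help to confirm that the parameter really reached the kernel. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the helper names has_param and module_loaded are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# has_param CMDLINE PARAM
# Succeeds if PARAM appears in CMDLINE as "PARAM" or "PARAM=value".
# (Hypothetical helper, not part of any distribution tool.)
has_param() {
    case " $1 " in
        *" $2 "*|*" $2="*) return 0 ;;
    esac
    return 1
}

# module_loaded NAME [FILE]
# Succeeds if NAME is listed in FILE, which uses the /proc/modules format.
module_loaded() {
    modules_file="${2:-/proc/modules}"
    [ -r "$modules_file" ] && grep -q "^$1 " "$modules_file"
}

cmdline="$(cat /proc/cmdline 2>/dev/null || true)"
if has_param "$cmdline" nomodeset; then
    echo "nomodeset is on the kernel command line"
else
    echo "nomodeset is NOT on the kernel command line"
fi

if module_loaded nouveau; then
    echo "nouveau module is loaded"
else
    echo "nouveau module is not loaded"
fi
```

Note that a module listed in /proc/modules is only loaded; that alone does not show it is driving the display.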
Comment 5 Leys 2015-02-16 23:50:20 UTC
Created attachment 623465 [details]
Output of hwinfo
Comment 6 Leys 2015-02-16 23:50:53 UTC
(In reply to Takashi Iwai from comment #4)
> Then, to identify whether it's a nouveau issue, just boot with nomodeset
> and try S4 and S3.  The graphics won't work, so you can boot with vga=none
> and check via remote login whether suspend/resume works.  If the same
> problem shows up, the problem is somewhere else.
> 
> In any case, please attach the hwinfo output.

I tried with vga=none and nomodeset in the default grub2 entry.  The result was a video mode that was not correct (my monitor is 1920x1080 but the video mode was 1280x1024).  But there was video, so it may be that i did not achieve what you had wanted, since the nouveau driver was still installed and active.  The resume still failed and now it is looking like more of a random failure, i.e. unrelated to time since hibernation.  Initially, it looked like this resolved the problem, but after a few tries, the problem reoccurred.

With the vga=none, however, there was no splash screen and what was displayed during resume that failed was the same as the first few lines of the boot log:
       Started Show Plymouth Boot Screen.
[ OK ] Reached target Paths.
[ OK ] Reached target Basic System.
[ OK ] Found device ST2000DM001-1ER1.
[ OK ] Found device ST2000DM001-1ER1.
[ OK ] Found device /dev/disk/by-uuid/1ee0c6d1-4bc1-41ff-9827-951d337c181c.
[ OK ] Started dracut initqueue hook.
       Starting dracut pre-mount hook...
[ OK ] Reached target Remote File Systems (Pre).
[ OK ] Reached target Remote File Systems.

It was not possible to log in remotely, nor even ping the machine.

In looking around, it seems that there have been problems with having grub2 on BTRFS because of the file structure of BTRFS, see https://bugzilla.opensuse.org/show_bug.cgi?id=856391, and i wonder if this could be related to my problem.
Comment 7 Takashi Iwai 2015-02-17 06:34:34 UTC
vga=none works only with legacy boot.  Was that the case?

In any case, try dracut with rd.debug and the other options suggested in the man page.  To disable plymouth, pass the plymouth.enable=0 option.
Comment 8 Leys 2015-02-17 18:13:09 UTC
(In reply to Takashi Iwai from comment #7)
> vga=none works only with legacy boot.  Was that the case?
> 
> In any case, try dracut with rd.debug and the other options suggested in the
> man page.  To disable plymouth, pass the plymouth.enable=0 option.

I am not certain what you mean by 'legacy boot'.  As i indicated in the initial comment supplied when opening the bug, i am using GRUB2.  What is required to do a 'legacy boot'?  As i am using SuSE 13.2, i generally make changes to the boot parameters via the YAST2 'Boot Loader' option.  If something else is required, it would be helpful if you would specify what is required in some detail.

I will add rd.debug to a new file /etc/dracut.conf.d/03-debug.conf.  If there are any other options that would be helpful to you in resolving this, it would be good if you could specify; i am unfamiliar with dracut so am not certain what options would be helpful in problem resolution.
Comment 9 Takashi Iwai 2015-02-17 21:10:41 UTC
(In reply to John Leys from comment #8)
> (In reply to Takashi Iwai from comment #7)
> > vga=none works only with legacy boot.  Was that the case?
> > 
> > In any case, try dracut with rd.debug and the other options suggested in
> > the man page.  To disable plymouth, pass the plymouth.enable=0 option.
> 
> I am not certain what you mean by 'legacy boot'.  As i indicated in the
> initial comment supplied when opening the bug, i am using GRUB2.  What is
> required to do a 'legacy boot'?  As i am using SuSE 13.2, i generally make
> changes to the boot parameters via the YAST2 'Boot Loader' option.  If
> something else is required, it would be helpful if you would specify what is
> required in some detail.

I meant "legacy BIOS boot mode" in comparison with "UEFI boot mode".  In the latter case, VGA isn't set up but EFI framebuffer is used.

> I will add rd.debug to a new file /etc/dracut.conf.d/03-debug.conf.  If
> there are any other options that would be helpful to you in resolving this,
> it would be good if you could specify; i am unfamiliar with dracut so am not
> certain what options would be helpful in problem resolution.

rd.debug would be good as a first step.  There are a few more options mentioned in dracut man page, too.
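
As far as I know, rd.debug is a kernel command-line option that dracut reads inside the initramfs (it is listed in dracut.cmdline(7)), so it is normally added to the boot loader entry rather than to a file under /etc/dracut.conf.d. A minimal sketch of one way to do that follows; the helper add_boot_params is hypothetical, and the sample command line in the demonstration is invented.

```shell
#!/bin/sh
# add_boot_params FILE PARAM...
# Appends PARAMs inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT in FILE.
# Hypothetical helper; parameters containing "|" or "&" would confuse sed.
add_boot_params() {
    file="$1"; shift
    extra="$*"
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $extra\"|" "$file"
}

# Demonstration on a scratch file, so nothing on the system is changed.
demo="$(mktemp)"
echo 'GRUB_CMDLINE_LINUX_DEFAULT="resume=/dev/sda5 splash=silent quiet"' > "$demo"
add_boot_params "$demo" rd.debug plymouth.enable=0
cat "$demo"
rm -f "$demo"

# On the real system (as root), the equivalent would be roughly:
#   add_boot_params /etc/default/grub rd.debug plymouth.enable=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

After the next boot, /proc/cmdline should show whether the options were actually passed.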
Comment 10 Leys 2015-02-18 01:46:29 UTC
(In reply to Takashi Iwai from comment #9)
> (In reply to John Leys from comment #8)
> > (In reply to Takashi Iwai from comment #7)
> > > vga=none works only with legacy boot.  Was that the case?
> > > 
> > > In any case, try dracut with rd.debug and the other options suggested in
> > > the man page.  To disable plymouth, pass the plymouth.enable=0 option.
> > 
> > I am not certain what you mean by 'legacy boot'.  As i indicated in the
> > initial comment supplied when opening the bug, i am using GRUB2.  What is
> > required to do a 'legacy boot'?  As i am using SuSE 13.2, i generally make
> > changes to the boot parameters via the YAST2 'Boot Loader' option.  If
> > something else is required, it would be helpful if you would specify what is
> > required in some detail.
> 
> I meant "legacy BIOS boot mode" in comparison with "UEFI boot mode".  In the
> latter case, VGA isn't set up but EFI framebuffer is used.

I am using GRUB2, not GRUB2-EFI.
> 
> > I will add rd.debug to a new file /etc/dracut.conf.d/03-debug.conf.  If
> > there are any other options that would be helpful to you in resolving this,
> > it would be good if you could specify; i am unfamiliar with dracut so am not
> > certain what options would be helpful in problem resolution.
> 
> rd.debug would be good as a first step.  There are a few more options
> mentioned in dracut man page, too.

I was unable to get either rd.debug or plymouth.enable=0 to work; instead i removed the 'splash' and 'quiet' parameters altogether, which avoided the splash screen as well as gave some additional messages, and added kernel parameters 'debug' and 'ignore_loglevel' which provided a lot of debug messages.  I am attaching a couple of files.  One attachment is a screenshot taken at the time of resume failure showing that PM was not able to locate the snapshot and failed as a result; the other is a collection of various pieces of data:
- fstab, showing the swap files
- output of blkid for both of the swap files
- grub.cfg showing that the resume parameter had the correct uuid

It appears that PM was able to find the correct partition, but was not able to find the snapshot.
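
The comparison between the resume parameter and the swap devices can also be made on the running system. This is a rough sketch only, assuming /proc is mounted; resume_arg is a hypothetical helper name.

```shell
#!/bin/sh
# resume_arg CMDLINE
# Prints the value of the first resume= parameter in CMDLINE, if any.
# (Hypothetical helper for illustration.)
resume_arg() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^resume=//p' | head -n 1
}

arg="$(resume_arg "$(cat /proc/cmdline 2>/dev/null || true)")"
echo "resume= argument: ${arg:-<none>}"

# List the active swap areas so they can be compared with the value above.
if [ -r /proc/swaps ]; then
    echo "active swap areas:"
    cat /proc/swaps
fi
```

If the resume= value is a /dev/disk/by-uuid/ path, readlink -f on it shows which partition it resolves to.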

If you feel that additional dracut parameters would be useful in resolving this, it would be helpful for you to provide the parameters that you would find useful as well as a methodology for getting them to dracut, as nothing that i have tried seems to have worked, though i have tried various things as described in man pages.
Comment 11 Leys 2015-02-18 01:49:56 UTC
Created attachment 623627 [details]
Screenshot at time of resume failure
Comment 12 Leys 2015-02-18 01:50:56 UTC
Created attachment 623628 [details]
Various files and output related to resume device
Comment 13 Leys 2015-02-19 01:16:41 UTC
The problem happens pretty much universally now when using pm-hibernate; i have not had a successful resume in some time using pm-hibernate.  The pm-suspend works fine.

In doing some research, i discovered that a way to test is to write directly to files in the /sys/power directory.  When i do this, the hibernate works every time:
1. echo "platform" > /sys/power/disk
2. echo "disk" > /sys/power/state

The hibernate/thaw works correctly and there does not seem to be any problem finding the snapshotted image on the disk used for resume. The number of the disk used for snapshotting and restoring is the one stored in /sys/power/resume file, and is 8:5, reflecting the number for the resume disk, /dev/sda5:
# ll /dev/sda5
brw-rw---- 1 root disk 8, 5 Feb 18 16:19 /dev/sda5
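
The same check can be scripted. A minimal sketch, assuming GNU stat; dev_number is a made-up helper name.

```shell
#!/bin/sh
# dev_number DEVICE
# Prints the device number of DEVICE as decimal MAJOR:MINOR, the format
# used by /sys/power/resume.  (Hypothetical helper for illustration.)
dev_number() {
    # GNU stat prints the major (%t) and minor (%T) numbers in hexadecimal.
    hex="$(stat -c '%t %T' "$1")" || return 1
    major="${hex% *}"
    minor="${hex#* }"
    echo "$((0x$major)):$((0x$minor))"
}

# Show what the kernel currently has configured, when the file is readable.
if [ -r /sys/power/resume ]; then
    echo "kernel resume device: $(cat /sys/power/resume)"
fi

# Usage: compare the line above with the swap partition, for example
#   dev_number /dev/sda5
# which would be expected to print 8:5 for the listing shown here.
```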

It is not clear whether s2disk is the problem, but i tried both 'uswsusp' and 'kernel' methods for hibernate, changing SLEEP_MODULE in /usr/lib/pm-utils/defaults, but neither seemed to work; both end up not being able to find the hibernation image.

While it is difficult to say with certainty because the log messages roll by so quickly during the boot/resume processing, it appeared to me as if the messages in the failed resume case looked more like the boot messages, with some resume messages thrown in, than the successful resume messages.  In the case of the successful resume, i did not see any messages like the following, though these do show up during boot processing:
Feb 18 16:19:38 linux-ky6z kernel: [TTM] Initializing pool allocator
Feb 18 16:19:38 linux-ky6z kernel: [TTM] Initializing DMA pool allocator
Feb 18 16:19:38 linux-ky6z kernel: nouveau  [     DRM] VRAM: 1024 MiB
Feb 18 16:19:38 linux-ky6z kernel: nouveau  [     DRM] GART: 1048576 MiB

It may be that the reason why i did not see these in the successful resume is that these msgs flew by during the resume, but there seems to be a console clear just before these msgs are printed to the console and there is also a console clear just before the successful resume gets to the point where it is restoring the image; the msgs prior to the clear seem to be the same in both cases, though it is exceedingly difficult to tell for certain.  I strongly suspect that these are not present during the successful resume, however, because i think that they would show up in the journal once it comes online.  The following lines are in the journal of the successful resume:
Feb 18 16:03:28 linux-ky6z kernel: ACPI: Waking up from system sleep state S4
Feb 18 16:03:28 linux-ky6z kernel: PM: noirq restore of devices complete after 11.515 msecs
Feb 18 16:03:28 linux-ky6z kernel: PM: early restore of devices complete after 0.767 msecs
Feb 18 16:03:28 linux-ky6z kernel: nouveau  [     DRM] re-enabling device...
Feb 18 16:03:28 linux-ky6z kernel: nouveau  [     DRM] resuming kernel object tree...
...
Feb 18 16:03:28 linux-ky6z kernel: nouveau  [     CLK][0000:01:00.0] --: core 405 MHz shader 810 MHz memory 405 MHz
Feb 18 16:03:28 linux-ky6z kernel: nouveau  [     DRM] resuming client object trees...
Feb 18 16:03:28 linux-ky6z kernel: nouveau  [     DRM] resuming display...
Feb 18 16:03:28 linux-ky6z kernel: nouveau  [     DRM] resuming console...

The nouveau lines in the failed resume look like the device is being initialized whereas the nouveau lines in the successful resume appear to be for a device that is resuming from suspension.
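
That distinction can be checked mechanically once a kernel log is available. The sketch below is an assumption-laden illustration: it only looks for the two message patterns quoted above, and classify_boot is a hypothetical name.

```shell
#!/bin/sh
# classify_boot
# Reads kernel log lines on stdin and prints one of:
#   resumed    the S4 wake-up message was seen
#   cold-init  nouveau reported its VRAM size, as it does on a fresh boot
#   unknown    neither pattern was found
classify_boot() {
    log="$(cat)"
    if printf '%s\n' "$log" | grep -q 'ACPI: Waking up from system sleep state S4'; then
        echo resumed
    elif printf '%s\n' "$log" | grep -q 'nouveau .*VRAM:'; then
        echo cold-init
    else
        echo unknown
    fi
}

# Usage on a system with a persistent journal, for the previous boot:
#   journalctl -k -b -1 | classify_boot
```

A failed resume may leave nothing in the journal at all, so an "unknown" result does not rule anything out.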
Comment 14 Takashi Iwai 2015-02-19 09:11:26 UTC
(In reply to John Leys from comment #13)
> The problem happens pretty much universally now when using pm-hibernate; i
> have not had a successful resume in some time using pm-hibernate.  The
> pm-suspend works fine.
> 
> In doing some research, i discovered that a way to test is to write directly
> to files in the /sys/power directory.  When i do this, the hibernate works
> every time:
> 1. echo "platform" > /sys/power/disk
> 2. echo "disk" > /sys/power/state
> 
> The hibernate/thaw works correctly and there does not seem to be any problem
> finding the snapshotted image on the disk used for resume. The number of the
> disk used for snapshotting and restoring is the one stored in
> /sys/power/resume file, and is 8:5, reflecting the number for the resume
> disk, /dev/sda5:
> # ll /dev/sda5
> brw-rw---- 1 root disk 8, 5 Feb 18 16:19 /dev/sda5
> 
> It is not clear whether s2disk is the problem, but i tried both 'uswsusp'
> and 'kernel' methods for hibernate, changing SLEEP_MODULE in
> /usr/lib/pm-utils/defaults, but neither seemed to work; both end up not
> being able to find the hibernation image.
 
Thanks, it's a good find.  This reminds me of a bug we had in the hooks, where fsck ran before thawing.  But I thought this was already fixed.

In any case, to be sure, test with "systemctl hibernate", too.  This is the more official way to perform suspend/resume.  On openSUSE 13.2, when pm-utils is present this behaves almost the same as pm-utils, so it should also fail.

The remaining task would be to drop the hooks in /usr/lib/pm-utils/sleep.d/* and /etc/pm-utils/sleep.d/* to identify which one triggers the problem.
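
One reversible way to do that is to move hooks aside one at a time instead of deleting them. A minimal sketch; disable_hook and enable_hook are hypothetical helpers, and the ".disabled" directory name is an arbitrary choice.

```shell
#!/bin/sh
# disable_hook DIR NAME
# Moves hook NAME out of DIR into a sibling directory "DIR.disabled",
# so that it is no longer run but can be restored later.
disable_hook() {
    mkdir -p "$1.disabled" && mv "$1/$2" "$1.disabled/$2"
}

# enable_hook DIR NAME
# Puts a previously disabled hook back in place.
enable_hook() {
    mv "$1.disabled/$2" "$1/$2"
}

# Usage (as root), testing hibernation after each step:
#   disable_hook /usr/lib/pm-utils/sleep.d 99Zgrub
#   pm-hibernate
#   enable_hook /usr/lib/pm-utils/sleep.d 99Zgrub
```

A package update may put files under /usr/lib back in place, so the state is worth rechecking before each test.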
Comment 15 Leys 2015-02-19 18:57:51 UTC
(In reply to Takashi Iwai from comment #14)
> (In reply to John Leys from comment #13)
> > The problem happens pretty much universally now when using pm-hibernate; i
> > have not had a successful resume in some time using pm-hibernate.  The
> > pm-suspend works fine.
> > 
> > In doing some research, i discovered that a way to test is to write directly
> > to files in the /sys/power directory.  When i do this, the hibernate works
> > every time:
> > 1. echo "platform" > /sys/power/disk
> > 2. echo "disk" > /sys/power/state
> > 
> > The hibernate/thaw works correctly and there does not seem to be any problem
> > finding the snapshotted image on the disk used for resume. The number of the
> > disk used for snapshotting and restoring is the one stored in
> > /sys/power/resume file, and is 8:5, reflecting the number for the resume
> > disk, /dev/sda5:
> > # ll /dev/sda5
> > brw-rw---- 1 root disk 8, 5 Feb 18 16:19 /dev/sda5
> > 
> > It is not clear whether s2disk is the problem, but i tried both 'uswsusp'
> > and 'kernel' methods for hibernate, changing SLEEP_MODULE in
> > /usr/lib/pm-utils/defaults, but neither seemed to work; both end up not
> > being able to find the hibernation image.
>  
> Thanks, it's a good find.  This reminds me of a bug we had in the hooks,
> where fsck ran before thawing.  But I thought this was already fixed.
> 
> In any case, to be sure, test with "systemctl hibernate", too.  This is the
> more official way to perform suspend/resume.  On openSUSE 13.2, when pm-utils
> is present this behaves almost the same as pm-utils, so it should also fail.
> 
> The remaining task would be to drop the hooks in /usr/lib/pm-utils/sleep.d/*
> and /etc/pm-utils/sleep.d/* to identify which one triggers the problem.

I noticed that a few things in the pm-hibernate execution path and the 'systemctl hibernate' execution path were duplicative, e.g. 'grub-once', which likely needs to be addressed.  It seems that the new hibernation strategy implementation, i.e. using systemd, did not adequately address conflicts with the old hibernation strategy implementation, i.e. using pm-utils.

I tried various things:
1. systemctl hibernate
2. Removing all files in /usr/lib/pm-utils/sleep.d (/etc/pm-utils/sleep.d was empty)
3. Removal of the pm-utils package
4. Revert to kernel 3.16.7-7-desktop (not OBS kernel) with no pm-utils

With 3 and 4 i only used 'systemctl hibernate' to effect hibernation.  Of these both 3 and 4 are very successful, but 3 still has intermittent problems.  With 3 the hibernation occasionally fails to power off the machine at the end of the hibernation and occasionally fails to restore the display on resume, though it successfully restores the image.  With 4 the process has not had any problems so far, though i have not tested this configuration for a long period of time; i will update the bug if it appears to still have problems.

With options 1 and 2 there was occasional success, but often failures of the same kind as before, i.e. failure in the process of restoring the image.

I noticed that the pm-utils attempted to lock out concurrent hibernation by using 'flock'; i wonder if the systemd approach did the same thing.  While executing grub-once is idempotent, other actions performed in the processing of hibernation are not necessarily idempotent, so if there are 2 threads performing the same processing, even successively, it could be problematic.
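
For illustration, the general locking pattern looks roughly like the sketch below. It uses flock(1) from util-linux and is not copied from pm-utils or systemd; with_lock and the lock file path are hypothetical.

```shell
#!/bin/sh
# with_lock LOCKFILE COMMAND...
# Runs COMMAND while holding an exclusive lock on LOCKFILE.  If another
# process already holds the lock, it fails at once instead of waiting.
with_lock() {
    lockfile="$1"; shift
    (
        # File descriptor 9 is opened on the lock file by the redirection
        # at the end of this subshell.
        flock -n 9 || { echo "another sleep request is in progress" >&2; exit 1; }
        "$@"
    ) 9>"$lockfile"
}

# Usage: only one of two overlapping requests would get to run the command.
#   with_lock /run/hibernate-demo.lock systemctl hibernate
```

The lock is released when the subshell exits, so a crashed request does not leave a stale lock behind.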

For now, the problem seems to be resolved by just removing the pm-utils package, though YAST keeps on trying to re-install it whenever i go into the software management dialog.  If i continue to have problems, i will update the bug accordingly.
Comment 16 Takashi Iwai 2015-02-19 21:18:21 UTC
(In reply to John Leys from comment #15)
> I noticed that a few things in the pm-hibernate execution path and the
> 'systemctl hibernate' execution path were duplicative, e.g. 'grub-once',
> which likely needs to be addressed.  It seems that the new hibernation
> strategy implementation, i.e. using systemd, did not adequately address
> conflicts with the old hibernation strategy implementation, i.e. using
> pm-utils.

Well, the openSUSE 13.2 systemd integrates pm-utils through its own patch.  systemd invokes the pm-utils stuff when it's found.  If not, systemd tries to write /sys/power/state.  In addition, systemd has its own hooks in /usr/lib/systemd/system-sleep/*.  This is usually empty by default, though.
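
As a rough illustration of such a hook: systemd-sleep runs each executable in that directory with two arguments, "pre" or "post" and then the sleep type (for example "suspend" or "hibernate"). The script below is only a sketch; the file name and the messages are invented.

```shell
#!/bin/sh
# Hypothetical hook, e.g. /usr/lib/systemd/system-sleep/50-example
# $1 is "pre" (before sleeping) or "post" (after waking up).
# $2 is the sleep type, such as "suspend" or "hibernate".
sleep_hook() {
    case "$1/$2" in
        pre/hibernate)  echo "preparing to hibernate" ;;
        post/hibernate) echo "thawed from hibernation" ;;
        *)              : ;;  # ignore other sleep types
    esac
}

sleep_hook "$@"
```

A pm-utils hook receives a different argument convention, so a script from sleep.d would need adapting rather than copying.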
 
> I tried various things:
> 1. systemctl hibernate
> 2. Removing all files in the /var/lib/pm-utils/sleep.d
> (/etc/pm-utils/sleep.d was empty)
> 3. Removal of the pm-utils package
> 4. Revert to kernel 3.16.7-7-desktop (not OBS kernel) with no pm-utils
> 
> With 3 and 4 i only used 'systemctl hibernate' to effect hibernation.  Of
> these both 3 and 4 are very successful, but 3 still has intermittent
> problems.  With 3 the hibernation occasionally fails to power off the
> machine at the end of the hibernation and occasionally fails to restore the
> display on resume, though it successfully restores the image.  With 4 the
> process has not had any problems so far, though i have not tested this
> configuration for a long period of time; i will update the bug if it appears
> to still have problems.

3 and 4 are effectively doing the kernel hibernation (/sys/power/state).
 
> With options 1 and 2 there was occasional success, but often failures of the
> same kind as before, i.e. failure in the process of restoring the image.
> 
> I noticed that the pm-utils attempted to lock out concurrent hibernation by
> using 'flock'; i wonder if the systemd approach did the same thing.  While
> executing grub-once is idempotent, other actions performed in the processing
> of hibernation are not necessarily idempotent, so if there are 2 threads
> performing the same processing, even successively, it could be problematic.
> 
> For now, the problem seems to be resolved by just removing the pm-utils
> package, though YAST keeps on trying to re-install it whenever i go into the
> software management dialog.  If i continue to have problems, i will update
> the bug accordingly.

Could you try removing just /usr/lib/pm-utils/sleep.d/99Zgrub?  If it's about the pm-utils hooks, this one looks the most suspicious.
Comment 17 Leys 2015-02-20 01:53:57 UTC
(In reply to Takashi Iwai from comment #16)
> (In reply to John Leys from comment #15)
> > I noticed that a few things in the pm-hibernate execution path and the
> > 'systemctl hibernate' execution path were duplicative, e.g. 'grub-once',
> > which likely needs to be addressed.  It seems that the new hibernation
> > strategy implementation, i.e. using systemd, did not adequately address
> > conflicts with the old hibernation strategy implementation, i.e. using
> > pm-utils.
> 
> > Well, the openSUSE 13.2 systemd integrates pm-utils through its own
> > patch.  systemd invokes the pm-utils stuff when it's found.  If not, systemd
> > tries to write /sys/power/state.  In addition, systemd has its own hooks in
> > /usr/lib/systemd/system-sleep/*.  This is usually empty by default, though.
>  
> > I tried various things:
> > 1. systemctl hibernate
> > 2. Removing all files in the /var/lib/pm-utils/sleep.d
> > (/etc/pm-utils/sleep.d was empty)
> > 3. Removal of the pm-utils package
> > 4. Revert to kernel 3.16.7-7-desktop (not OBS kernel) with no pm-utils
> > 
> > With 3 and 4 i only used 'systemctl hibernate' to effect hibernation.  Of
> > these both 3 and 4 are very successful, but 3 still has intermittent
> > problems.  With 3 the hibernation occasionally fails to power off the
> > machine at the end of the hibernation and occasionally fails to restore the
> > display on resume, though it successfully restores the image.  With 4 the
> > process has not had any problems so far, though i have not tested this
> > configuration for a long period of time; i will update the bug if it appears
> > to still have problems.
> 
> 3 and 4 are effectively doing the kernel hibernation (/sys/power/state).
>  
> > With options 1 and 2 there was occasional success, but often failures of the
> > same kind as before, i.e. failure in the process of restoring the image.
> > 
> > I noticed that the pm-utils attempted to lock out concurrent hibernation by
> > using 'flock'; i wonder if the systemd approach did the same thing.  While
> > executing grub-once is idempotent, other actions performed in the processing
> > of hibernation are not necessarily idempotent, so if there are 2 threads
> > performing the same processing, even successively, it could be problematic.
> > 
> > For now, the problem seems to be resolved by just removing the pm-utils
> > package, though YAST keeps on trying to re-install it whenever i go into the
> > software management dialog.  If i continue to have problems, i will update
> > the bug accordingly.
> 
> > Could you try removing just /usr/lib/pm-utils/sleep.d/99Zgrub?  If it's
> > about the pm-utils hooks, this one looks the most suspicious.

Because of the problems with BTRFS and grub-once, removing 99Zgrub was one of the first things that i did, even before filing this bug, and it was not effective; i also effectively nulled the systemd grub-once and the problem still occurred.  Also, as i pointed out, i tried with nothing in /usr/lib/pm-utils/sleep.d and the problem still occurred.  But 99Zgrub only targets a particular grub.cfg entry for use on the next startup, it does not modify the hibernate image; hibernate image creation and modification occurs after all of the sleep.d modules have been executed.
Comment 18 Leys 2015-03-05 23:40:43 UTC
After removing the pm-utils package, i have been using hibernate/thaw with no problems for 2 weeks.  I am not certain what problem is caused by pm-utils, but removing it appears to be a solution.  I feel that this bug should remain open, however, until the cause of the conflict has been identified, and/or an upgrade path from earlier versions includes a migration of scripts that have been modified by the user from /usr/lib/pm-utils/sleep.d to /usr/lib/systemd/system-sleep/, or some other more user-oriented directory that contains scripts to execute on hibernate/thaw.
Comment 19 Stanislav Brabec 2015-06-25 18:50:59 UTC
Do you have the suspend package installed?

If yes, maybe removal of suspend package and keeping pm-utils would work as well.
Comment 20 Kristyna Streitova 2015-11-12 17:01:27 UTC
There was a long discussion about it (see bug 925873), but it seems that having only one way to suspend ('systemctl suspend' or 'systemctl hibernate') is the best approach.

Please note that the pm-utils and suspend packages are no longer available in Tumbleweed.

I'm closing this bug. Feel free to reopen if there is any new information.