Bug 1094780

Summary: Laptop with Intel+Nvidia hybrid graphics won't suspend after hibernation
Product: [openSUSE] openSUSE Distribution
Reporter: Iakov Karpov <srid>
Component: Kernel
Assignee: Takashi Iwai <tiwai>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: bjoernv, srid, tiwai
Version: Leap 15.0
Target Milestone: ---
Hardware: x86-64
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  Log of kernel-vanilla + drm.debug=0x0e
  Log of kernel-vanilla + nouveau.debug=debug
  Log of kernel-desktop from home:tiwai:bsc1094751 repo
  Log of kernel-default from Kernel:stable repo
  Error message of kernel 4.12.14-lp150.12.13-default
  kernel error message

Description Iakov Karpov 2018-05-27 17:49:58 UTC
Created attachment 771476
Log of kernel-vanilla + drm.debug=0x0e

A laptop with hybrid graphics won't suspend after it has been hibernated and restored. It may not happen the first time, but the point is that the laptop stops suspending to RAM at some point once it has been hibernated at least once. It also won't shut down cleanly; there are error messages from nouveau and about stalled CPU cores.

I'm using kernel-vanilla because kernel-default is still affected by bug 1094751, but with the fix proposed in that report applied, kernel-default behaves identically to kernel-vanilla.
Comment 1 Takashi Iwai 2018-06-01 13:53:34 UTC
Could you check the kernel in OBS home:tiwai:bsc1094751 repo?
Does it still cause the issue?
Comment 2 Iakov Karpov 2018-06-01 16:08:39 UTC
Created attachment 772162
Log of kernel-vanilla + nouveau.debug=debug

(In reply to Takashi Iwai from comment #1)
> Could you check the kernel in OBS home:tiwai:bsc1094751 repo?
> Does it still cause the issue?

I did check the kernel from the home:tiwai:bsc1094751 repo and it has the issue. kernel-vanilla has it too, so I think this is a different problem from bug 1094751.

There are no blank screens or anything; the laptop just fails to suspend after it has been woken up from hibernation. I'm not sure how to reproduce it reliably, though. I only know that it happens after hibernation.
Comment 3 Takashi Iwai 2018-06-01 16:18:56 UTC
Could you give the kernel messages with the kernel from home:tiwai:bsc1094751, too?
Comment 4 Takashi Iwai 2018-06-01 16:26:47 UTC
Also, are you using docking station?

The symptom appears similar as the upstream bug
  https://bugs.freedesktop.org/show_bug.cgi?id=90682

It mentioned about DP-MST.
Comment 5 Iakov Karpov 2018-06-01 16:37:41 UTC
Created attachment 772165
Log of kernel-desktop from home:tiwai:bsc1094751 repo

(In reply to Takashi Iwai from comment #3)
> Could you give the kernel messages with the kernel from
> home:tiwai:bsc1094751, too?

Sure thing. I've figured out that it refuses to suspend the second time after hibernation. Just in case, I also tried booting the kernel without the video=VGA-2:d and pcie_aspm=force parameters; it made no difference.


(In reply to Takashi Iwai from comment #4)
> Also, are you using docking station?
> 
> The symptom appears similar as the upstream bug
>   https://bugs.freedesktop.org/show_bug.cgi?id=90682
> 
> It mentioned about DP-MST.

There is no docking station for this model. For some reason, though, it has a phantom VGA port attached to the Nvidia card, which is why I have to use video=VGA-2:d.
Comment 6 Takashi Iwai 2018-06-03 07:11:08 UTC
Thanks. It shows the very same code path, so it's consistently the same crash.

FWIW, the situation is as follows.

An error was seen at the second suspend, before the Oops happened:
  nouveau 0000:01:00.0: DRM: suspending display...
  nouveau 0000:01:00.0: DRM: evicting buffers...
  nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
  nouveau 0000:01:00.0: fifo: PBDMA0: 00008000 [] ch 0 [003fe12000 DRM] subc 0 mthd 0000 data 00000000
  nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]

The driver then backed out and tried to resume:

  nouveau 0000:01:00.0: DRM: resuming display...
  nouveau 0000:01:00.0: invalid power transition (from state 4 to 3)

... and the power state change was inconsistent.

Then it suspended again and failed on "channel 0" once more (but without a fifo error message this time). Resuming again, though, caused an Oops:

  nouveau 0000:01:00.0: DRM: suspending console...
  nouveau 0000:01:00.0: DRM: suspending display...
  nouveau 0000:01:00.0: DRM: evicting buffers...
  nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
  nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
  nouveau 0000:01:00.0: DRM: resuming display...
  BUG: unable to handle kernel paging request at ffff8805393d8ffc
  IP: evo_wait+0x56/0x120 [nouveau]
  ....
Comment 7 Takashi Iwai 2018-06-03 07:14:10 UTC
And I forget whether you've already test the recent upstream kernel, e.g. the kernel in OBS Kernel:stable repo.  Did you try that already?

If the issue happens with 4.16.x, it should be reported to upstream.
e.g. bugzilla.freedesktop.org category DRI/Nouveau.
Feel free to put me (tiwai@suse.de) there.
Comment 8 Iakov Karpov 2018-06-03 14:49:14 UTC
Created attachment 772211
Log of kernel-default from Kernel:stable repo

(In reply to Takashi Iwai from comment #7)
> And I forget whether you've already test the recent upstream kernel, e.g.
> the kernel in OBS Kernel:stable repo.  Did you try that already?
> 
> If the issue happens with 4.16.x, it should be reported to upstream.
> e.g. bugzilla.freedesktop.org category DRI/Nouveau.
> Feel free to put me (tiwai@suse.de) there.

I tried the kernel from the Kernel:stable repo: the laptop still won't suspend, but the kernel log is a bit different this time. However, I'm not sure I can file a legitimate report upstream, because kernel-vanilla 4.16 fails to resume from hibernation for some reason.
Comment 9 Takashi Iwai 2018-06-03 15:14:13 UTC
It's fine to report it upstream as long as kernel-default from Kernel:stable also shows the issue. The TW (Tumbleweed) kernel, i.e. the one in Kernel:stable, carries very few backported patches and is very close to upstream as is.

The 4.16.x kernel seems to show the very same symptom: it gets a fifo PBDMA0 error (this time repeatedly), then an Oops at resume in evo_wait().

So it'd really be better to report it upstream. You can also attach the kernel messages from a boot with drm.debug=0x0e, which enables more debug messages.
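For reference, drm.debug is a bitmask in which each bit enables one DRM debug category. The Python sketch below decodes a mask into category names. The bit table is an assumption based on the drm.debug parameter description in 4.x-era kernels (CORE=0x01 through LEASE=0x80); newer kernels define additional bits, so check it against the parameter description of the running kernel (e.g. via `modinfo -p drm`).

```python
# Assumed drm.debug bit assignments (4.x-era kernels); newer kernels add more bits.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",    # drm core code
    0x02: "DRIVER",  # drm controller (driver) code
    0x04: "KMS",     # modesetting code
    0x08: "PRIME",   # prime (buffer sharing) code
    0x10: "ATOMIC",  # atomic modesetting code
    0x20: "VBL",     # vblank code
    0x40: "STATE",   # verbose atomic state
    0x80: "LEASE",   # DRM lease code
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    # 0x0e = 0x02 | 0x04 | 0x08
    print(decode_drm_debug(0x0e))
```

With the table above, drm.debug=0x0e turns on the DRIVER, KMS and PRIME categories, while core and vblank messages stay off, which keeps the log size manageable.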
Comment 10 Iakov Karpov 2018-08-14 10:42:56 UTC
Created attachment 779681
Error message of kernel 4.12.14-lp150.12.13-default

Still happening with the latest openSUSE Leap 15.0 kernel. I also noticed that after this happens, the Nvidia chip won't power down until all power sources (battery and AC adapter) are unplugged. The upstream bug (https://bugs.freedesktop.org/show_bug.cgi?id=106795) never got a reply.
Comment 11 Takashi Iwai 2019-07-05 13:44:25 UTC
Could you check whether Leap 15.1 works in this regard?  It should have been addressed there.
Comment 12 Iakov Karpov 2019-07-05 14:46:53 UTC
(In reply to Takashi Iwai from comment #11)
> Could you check whether Leap 15.1 works in this regard?  It should have been
> addressed there.

It does not work. In fact, it got worse: dmesg is flooded with nouveau error messages like this:

[  154.861853] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861868] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861883] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861898] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861913] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861928] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861943] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861958] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861973] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000
[  154.861989] nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [] ch 2 [003fd71000 X[5243]] subc 0 mthd 0000 data 00000000

In a couple of minutes I got a gigabyte of those.
Comment 13 Takashi Iwai 2019-07-05 14:54:15 UTC
Then it's very likely an upstream problem, as Leap 15.1 already got most of updates for nouveau.
To be sure, try the 5.1.x kernel from Kernel:stable repo.  If this still doesn't work, continue on the upstream bug tracker, bugs.freedesktop.org.
If 5.1 kernel works, we may still have some chance for a fix backport to Leap 15.1.
Comment 14 Iakov Karpov 2019-07-05 15:17:45 UTC
Created attachment 809571
kernel error message

(In reply to Takashi Iwai from comment #13)
> Then it's very likely an upstream problem, as Leap 15.1 already got most of
> updates for nouveau.
> To be sure, try the 5.1.x kernel from Kernel:stable repo.  If this still
> doesn't work, continue on the upstream bug tracker, bugs.freedesktop.org.
> If 5.1 kernel works, we may still have some chance for a fix backport to
> Leap 15.1.

5.1.16 is also affected, although its output is a bit more informative (see attachment).
I reported this upstream a year ago, but it seems to have been ignored.
Comment 15 Takashi Iwai 2019-07-05 15:28:56 UTC
Just try pinging the upstream bug tracker again. Developers are overloaded, but active reports have a better chance of getting attention.
Comment 16 Takashi Iwai 2020-05-15 12:06:07 UTC
Could you check whether the problem persists with Leap 15.2?  It goes to 5.3 kernel (with a bunch of backports), so we have a better chance now.
Comment 17 Iakov Karpov 2020-05-15 16:12:14 UTC
(In reply to Takashi Iwai from comment #16)
> Could you check whether the problem persists with Leap 15.2?  It goes to 5.3
> kernel (with a bunch of backports), so we have a better chance now.

I've installed the 15.2 kernel on top of my 15.1 system; the problem still persists.
Comment 18 Takashi Iwai 2020-05-15 17:02:25 UTC
It's a pity that nouveau isn't maintained well enough in many aspects.

Could you try the 5.6.y kernel?  Just to make sure.
Comment 19 Iakov Karpov 2020-05-15 17:29:37 UTC
(In reply to Takashi Iwai from comment #18)
> It's a pity that nouveau isn't maintained well enough in many aspects.
> 
> Could you try the 5.6.y kernel?  Just to make sure.

5.6.13 doesn't work either.
Comment 20 Takashi Iwai 2024-03-07 14:29:29 UTC
Since this is an issue with an old distro release, I'm closing it now as WONTFIX.
If you have the same problem with the latest openSUSE versions, please reopen.  Thanks.