Bug 1077885

Summary: GPU hang (Intel Mobile 4 Series Integrated Graphics Controller)
Product: [openSUSE] openSUSE Distribution
Reporter: Carlos Robinson <carlos.e.r>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: carlos.e.r, davejplater, pawel.dziekonski, sndirsch, tiwai
Version: Leap 42.3
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 42.3
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: CER: Messages log
CER: gpu log
CER: hwinfo output

Description Carlos Robinson 2018-01-28 22:05:46 UTC
This is similar to "Bug 1050256 - GPU hang", but on a different GPU. The symptoms are the same, but because the GPU is different I was told to create a new report.


I have this issue after upgrading my laptop to 42.3 from 42.2, using the offline or DVD upgrade method.

CPU:
  Model: 6.23.10 "Pentium(R) Dual-Core CPU       T4300  @ 2.10GHz"
Video:
  Model: "Intel Mobile 4 Series Chipset Integrated Graphics Controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x2a42 "Mobile 4 Series Chipset Integrated Graphics Controller"
  SubVendor: pci 0x103c "Hewlett-Packard Company"
  SubDevice: pci 0x3069 
  Revision: 0x07
  Driver: "i915"
  Driver Modules: "i915"

(hwinfo output will be attached)

Crash log:

<3.6> 2018-01-27 12:47:05 minas-tirith systemd 1 - -  Started Postfix Mail Transport Agent.
<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808879] [drm] GPU HANG: ecode 4:0:0xfdefffff, in X [2154], reason: Hang on render ring, action: reset
<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808883] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808884] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808884] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808885] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808885] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<0.5> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808914] drm/i915: Resetting chip after gpu hang
<0.5> 2018-01-27 12:47:26 minas-tirith kernel - - - [ 1137.820965] drm/i915: Resetting chip after gpu hang
<0.5> 2018-01-27 12:47:36 minas-tirith kernel - - - [ 1147.820140] drm/i915: Resetting chip after gpu hang


I mentioned this on the openSUSE mailing list, and Dave Plater suggested booting with nomodeset. This works, but the video mode drops to something like 800x600, which is pretty bad. He also suggested reopening this Bugzilla.

At that time I had kernel 4.4.104-39 and drm-kmp-default 4.9.33_k4.4.79_4-5.2. I updated to his version, drm-kmp-default-4.9.33_k4.4.104_39-7.24.x86_64.rpm. This is more stable, but in the end the X environment froze: the mouse pointer moved, but nothing responded. I could still switch to a text console with Ctrl-Alt-F1.
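The two drm-kmp-default version strings above appear to encode the kernel the package was built against after the "_k" marker, with "-" replaced by "_". The following is a minimal Python sketch under that assumption (it is read off the strings in this report, not taken from the packaging tools); the function names are hypothetical.

```python
import re

# Assumed layout: <module version>_k<kernel version, '-' written as '_'>-<package release>
KMP_RE = re.compile(r"^(?P<module>[^_]+)_k(?P<kernel>.+)-(?P<release>[^-]+)$")


def split_kmp_version(version):
    """Split a KMP version string into (module version, kernel version, release)."""
    match = KMP_RE.match(version)
    if match is None:
        raise ValueError(f"not a recognised KMP version string: {version!r}")
    # Undo the '-' -> '_' substitution in the embedded kernel version.
    kernel = match.group("kernel").replace("_", "-")
    return match.group("module"), kernel, match.group("release")


def built_for_kernel(kmp_version, kernel_version):
    """True if the KMP version string names exactly this kernel version."""
    _, kernel, _ = split_kmp_version(kmp_version)
    return kernel == kernel_version


if __name__ == "__main__":
    for kmp in ("4.9.33_k4.4.79_4-5.2", "4.9.33_k4.4.104_39-7.24"):
        print(kmp, split_kmp_version(kmp), built_for_kernel(kmp, "4.4.104-39"))
```

With the values from this report, only the second package names kernel 4.4.104-39. A mismatch is not by itself an error, since a KMP can be reused on a compatible kernel; the check only shows which kernel each build targeted.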


I see several entries like this in the log (with different PIDs); I don't know if they are related:

<3.6> 2018-01-27 19:58:34 minas-tirith console-kit-daemon 3128 - -  (process:10750): GLib-CRITICAL **: g_slice_set_config: assertion 'sys_page_size == 0' failed


I hibernated the machine and went back home. After resuming (not rebooting), I see this in the log:


<3.6> 2018-01-27 21:16:36 minas-tirith systemd-sleep 10886 - -  System resumed.
<3.6> 2018-01-27 21:16:36 minas-tirith systemd-sleep 10886 - -  INFO: running /usr/lib/systemd/system-sleep/grub2.sleep for hibernate
<3.6> 2018-01-27 21:16:36 minas-tirith systemd-sleep 10886 - -  INFO: Running grub-once-restore ..
<3.6> 2018-01-27 21:16:36 minas-tirith systemd-sleep 10886 - -  2018-01-27 21:16:36+01:00 - Thawing the system now...
<3.4> 2018-01-27 21:16:36 minas-tirith systemd-sh - - -  Thawing the system now...
<3.6> 2018-01-27 21:16:37 minas-tirith systemd 1 - -  Stopped Deferred execution scheduler.
<3.6> 2018-01-27 21:16:37 minas-tirith systemd 1 - -  Started Deferred execution scheduler.
<3.6> 2018-01-27 21:16:37 minas-tirith laptop-mode - - -  Laptop mode
<3.6> 2018-01-27 21:16:37 minas-tirith laptop-mode - - -  enabled, not active [unchanged]
<3.6> 2018-01-27 21:16:37 minas-tirith systemd-sleep 10886 - -  INFO: Done.
<3.6> 2018-01-27 21:16:37 minas-tirith laptop-mode - - -  Laptop mode
<3.6> 2018-01-27 21:16:37 minas-tirith laptop-mode - - -  enabled, not active [unchanged]
<3.6> 2018-01-27 21:16:37 minas-tirith systemd-sleep 10886 - -  tput: No value for $TERM and no -T specified
<0.6> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816731] [drm] GPU HANG: ecode 4:0:0xfdeffdfb, in X [2171], reason: Hang on render ring, action: reset
<0.6> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816736] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<0.6> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816736] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<0.6> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816737] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<0.6> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816737] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<0.6> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816738] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<0.5> 2018-01-27 21:16:48 minas-tirith kernel - - - [13685.816792] drm/i915: Resetting chip after gpu hang
<0.5> 2018-01-27 21:17:00 minas-tirith kernel - - - [13697.816112] drm/i915: Resetting chip after gpu hang


I will attach gpu.2.log, the messages log since the machine upgrade, and the output of hwinfo --cpu and hwinfo --gfxcard.
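For anyone scanning a long messages log for these events, here is a small Python sketch that picks out the "GPU HANG" lines and counts the chip resets. The regular expression is an assumption based only on the line layout quoted in this report; other kernel versions may format the message differently. The function names are hypothetical.

```python
import re

# Matches the i915 hang line as quoted in this report, e.g.
# "[ 1128.808879] [drm] GPU HANG: ecode 4:0:0xfdefffff, in X [2154], reason: ..., action: reset"
HANG_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[^,\s]+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)

RESET_TEXT = "drm/i915: Resetting chip after gpu hang"


def parse_gpu_hangs(lines):
    """Return one dict per 'GPU HANG' line; all other lines are ignored."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["uptime"] = float(entry["uptime"])  # seconds since boot
            entry["pid"] = int(entry["pid"])
            hangs.append(entry)
    return hangs


def count_resets(lines):
    """Count the 'Resetting chip after gpu hang' messages."""
    return sum(RESET_TEXT in line for line in lines)


if __name__ == "__main__":
    sample = [
        "<0.6> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808879] [drm] GPU HANG: "
        "ecode 4:0:0xfdefffff, in X [2154], reason: Hang on render ring, action: reset",
        "<0.5> 2018-01-27 12:47:17 minas-tirith kernel - - - [ 1128.808914] "
        "drm/i915: Resetting chip after gpu hang",
    ]
    for hang in parse_gpu_hangs(sample):
        print(hang)
    print("resets:", count_resets(sample))
```

To run it against a real log, pass an open file object (for example `open(path)`) instead of the sample list, since both functions only iterate over lines.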


My desktop is XFCE and I have 4 GiB of RAM.
Comment 1 Carlos Robinson 2018-01-28 22:06:24 UTC
Created attachment 757825 [details]
CER: Messages log
Comment 2 Carlos Robinson 2018-01-28 22:08:32 UTC
Created attachment 757826 [details]
CER: gpu log
Comment 3 Carlos Robinson 2018-01-28 22:08:58 UTC
Created attachment 757827 [details]
CER: hwinfo output
Comment 4 Carlos Robinson 2018-01-28 22:09:30 UTC
At Felix Miata's suggestion, I am adding the inxi output:

minas-tirith:/home/cer/Bugzilla/Bug_1050256 - GPU hang # inxi -c0 -G
Graphics:  Card: Intel Mobile 4 Series Integrated Graphics Controller
           Display Server: X.org 1.18.3 drivers: intel (unloaded: modesetting,fbdev,vesa)
           tty size: 150x51 Advanced Data: N/A for root
minas-tirith:/home/cer/Bugzilla/Bug_1050256 - GPU hang #
Comment 5 Carlos Robinson 2018-01-28 22:13:35 UTC
At Stefan Dirsch's suggestion I have uninstalled drm-kmp-default; I will see what happens.
Comment 6 Carlos Robinson 2018-01-29 12:00:39 UTC
I see a needinfo from me, but I don't see the question. :-?
Clearing.
Comment 7 Stefan Dirsch 2018-01-29 14:01:38 UTC
Well, the question is in your own comment #5. ;-)
Comment 8 Carlos Robinson 2018-01-29 14:58:33 UTC
Ah, ok :-)

So far, no crashes (I left the machine running all night while I slept), and the display artefacts have disappeared.

I will now hibernate and resume the machine; this usually causes some stress.
[...]
Restored fine, it seems.

I can try rebooting with reduced memory.
[...]
OK, did so: booted with 1G of RAM, opened Thunderbird and Firefox (the machine was swapping about another gigabyte), alt-tabbed, switched workspaces, and saw no artefacts and no crashes.

So this machine should run without drm-kmp-default always? Or a patch is needed?
Comment 9 Stefan Dirsch 2018-01-30 16:08:32 UTC
(In reply to Carlos Robinson from comment #8)
> So this machine should run without drm-kmp-default always? Or a patch is
> needed?

Yes, that's probably best. In addition, you can try the KOTD (Kernel of the Day) to see if the issue has been fixed upstream in the meantime.
Comment 10 Carlos Robinson 2018-01-30 19:54:06 UTC
Well, I'll see if I can. Means also installing corresponding drm-kmp- too, I guess.

I also have to try installing Leap 15.0 in a test partition and report.

Thanks.
Comment 11 Stefan Dirsch 2018-01-30 20:17:41 UTC
(In reply to Carlos Robinson from comment #10)
> Well, I'll see if I can. Means also installing corresponding drm-kmp- too, I
> guess.

Oh no. *Un*installing, please!

> I also have to try installing Leap 15.0 in a test partition and report.

That's also useful. Thanks!
Comment 12 Carlos Robinson 2018-01-30 20:52:04 UTC
I don't understand. The crash doesn't happen unless I install drm-kmp, so there will be no way to know when the kernel solves the issue.
Comment 13 Stefan Dirsch 2018-01-30 21:05:26 UTC
? drm-kmp provides the DRM drivers from kernel 4.9. I would like to know whether newer kernels (4.14/4.15) fix the issue. We know the DRM of kernel 4.4 still worked.
Comment 14 Pawel Dziekonski 2018-02-09 17:40:53 UTC
I have exactly the same problem after updating to 42.3.

  Device Name: "Onboard IGD"
  Model: "Intel Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x0412 "Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller"
  SubVendor: pci 0x1462 "Micro-Star International Co., Ltd. [MSI]"
  SubDevice: pci 0x7817 
  Revision: 0x06
  Driver: "i915"
  Driver Modules: "drm"

CPU:
Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz

uname -r
4.4.104-39-default

The only way to overcome this is to 
zypper addlock drm-kmp-default
:(
Comment 15 Takashi Iwai 2018-02-14 11:31:40 UTC
Since the old Intel chips don't work any better with the 4.9.x kernel than with 4.4.x, let's apply the limited supplements to drm-kmp on Leap 42.3 as we do for SLE12-SP3. This won't uninstall the package automatically, but it can help a bit: you can remove the zypper lock, at least.
Comment 17 Swamp Workflow Management 2018-02-21 17:18:05 UTC
SUSE-SU-2018:0509-1: An update that solves one vulnerability and has 8 fixes is now available.

Category: security (moderate)
Bug References: 1041744,1046821,1047277,1047729,1048155,1050256,1055493,1066175,1077885
CVE References: CVE-2017-10810
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP3 (src):    drm-4.9.33-4.11.1
SUSE Linux Enterprise Desktop 12-SP3 (src):    drm-4.9.33-4.11.1
Comment 18 Carlos Robinson 2018-03-12 00:33:34 UTC
Tested with openSUSE-Leap-15.0-DVD-x86_64-Build153.1-Media.iso

Minas-Anor:~ # rpm -q drm-kmp-default
package drm-kmp-default is not installed

and the machine seems to work perfectly.
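Since the package came back on Leap 42.3 without being asked for, a scripted check can be handy. Below is a minimal Python sketch, assuming `rpm` is on the PATH; the helper names are hypothetical. The first function only interprets the text that `rpm -q` prints (as shown above), the second relies on `rpm -q` exiting with a non-zero status when the package is absent.

```python
import subprocess


def reports_not_installed(rpm_q_output, package):
    """True if the given `rpm -q` output says the package is not installed."""
    return f"package {package} is not installed" in rpm_q_output


def is_installed(package):
    """Ask the RPM database directly; raises FileNotFoundError if rpm is missing."""
    result = subprocess.run(["rpm", "-q", package], capture_output=True, text=True)
    # rpm -q exits with status 0 only when the queried package is installed.
    return result.returncode == 0


if __name__ == "__main__":
    output = "package drm-kmp-default is not installed\n"
    print(reports_not_installed(output, "drm-kmp-default"))
```

Calling `is_installed("drm-kmp-default")` after an online update would show whether the package has been pulled in again.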


On Leap 42.3, however, the package was automatically reinstalled by YaST online update. I noticed the artifacts and found the package installed. I had to taboo it.
Comment 19 Swamp Workflow Management 2018-03-19 16:40:10 UTC
This is an autogenerated message for OBS integration:
This bug (1077885) was mentioned in
https://build.opensuse.org/request/show/588685 42.3 / drm
Comment 20 Swamp Workflow Management 2018-03-20 12:30:39 UTC
This is an autogenerated message for OBS integration:
This bug (1077885) was mentioned in
https://build.opensuse.org/request/show/589148 42.3 / drm
Comment 21 Swamp Workflow Management 2018-03-23 11:08:47 UTC
openSUSE-RU-2018:0782-1: An update that has 6 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1041744,1047277,1047729,1055493,1066175,1077885
CVE References: 
Sources used:
openSUSE Leap 42.3 (src):    drm-4.9.33-10.2
Comment 22 Stefan Dirsch 2018-03-26 14:13:43 UTC
The updated drm-kmp-default package for Leap 42.3 will no longer be (re-)installed automatically for older Intel GPUs. The hardware Supplements in the package have been adjusted. So let's close this as fixed.