Bug 1055493 - kernel panic after wakeup from STR / on switching terminals / displays after upgrade linux-default-4.4.36-8.1 -> 4.4.49-16, and still happening on 4.4.79-18.26-default
Status: VERIFIED FIXED
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Kernel
Version: Leap 42.2
Hardware: Other Other
Importance: P5 - None, Major
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on: 1029634
Blocks:
Reported: 2017-08-24 11:29 UTC by Oliver Kurz
Modified: 2019-08-15 13:28 UTC
CC List: 5 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Oliver Kurz 2017-08-24 11:29:00 UTC
+++ This bug was initially created as a clone of Bug #1029634 +++

## observation

On a DELL Latitude E7250, after waking the system up from STR in the docking station and calling `xrandr` to enable the display connected over DP, the system sometimes halts with a kernel panic. This was originally reported as bug #1029634 and thought to be fixed, but it still happens occasionally, although less often than before.


## reproducible

So far I do not know a deterministic way to reproduce it, so the best approach is to run the following procedure every day until the problem happens :-/

* system in the docking station with the internal screen and the external DP screen on the docking station enabled, running openSUSE Leap 42.2 with the awesome window manager
* work a day
* put system to STR in docking station
* wait for system to be in STR
* close lid
* undock
* bring computer back home
* sometimes switch the computer on at home, sometimes not; when switched on: call xrandr to configure only the single internal screen, work, switch off
* next day go to work
* put computer in docking station
* open lid, computer switches on
* wait some seconds or not
* unlock screen
* call xrandr via a keyboard shortcut bound to a script that enables the external screen
* observe the problem or, if it does not occur, repeat the whole procedure
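The script behind the keyboard shortcut is not included in this report. A minimal sketch of what such a script might look like, assuming hypothetical output names `eDP1` (internal panel) and `DP1-1` (external DP screen on the dock) and that the external screen sits to the right of the internal one; check `xrandr --query` for the real output names:

```shell
#!/bin/sh
# Hypothetical output names: replace with the names shown by `xrandr --query`.
INTERNAL="${INTERNAL:-eDP1}"
EXTERNAL="${EXTERNAL:-DP1-1}"
# The binary is overridable, e.g. XRANDR=echo for a dry run that only prints.
XRANDR="${XRANDR:-xrandr}"

# Docked layout: internal panel plus the external DP screen to its right.
dock_layout() {
    "$XRANDR" --output "$INTERNAL" --auto \
              --output "$EXTERNAL" --auto --right-of "$INTERNAL"
}

# Undocked layout: only the internal panel, external output switched off.
single_layout() {
    "$XRANDR" --output "$INTERNAL" --auto --output "$EXTERNAL" --off
}
```

Binding `dock_layout` to the shortcut reproduces the "enable external screen" step; the `XRANDR` override exists only so the commands can be inspected without touching the display configuration.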


## expected results

The last good version should be linux 4.4.36; at least I do not recall observing this problem before the last kernel update, or at least not that often.

Expected: Obviously the kernel should not crash here.
Comment 1 Oliver Kurz 2017-09-01 06:26:49 UTC
What I found in the boot log:

```
Sep 01 08:04:42 linux-28d6.suse kernel: WARNING: CPU: 2 PID: 0 at ../drivers/gpu/drm/i915/intel_uncore.c:633 hsw_unclaimed_reg_debug.isra.13+0x6b/0x90 [i915](
Sep 01 08:04:42 linux-28d6.suse kernel: Unclaimed register detected before reading register 0x223a0
Sep 01 08:04:42 linux-28d6.suse kernel: Modules linked in: nf_log_ipv6 xt_pkttype xt_TCPMSS nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet ip6t_REJECT nf
Sep 01 08:04:42 linux-28d6.suse kernel:  intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp videobuf2_v4l2 dell_smm_hwmon videobuf2_core v4l2_common iw
Sep 01 08:04:42 linux-28d6.suse kernel:  jitterentropy_rng drbg ansi_cprng aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci ser
Sep 01 08:04:42 linux-28d6.suse kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.79-18.26-default #1
Sep 01 08:04:42 linux-28d6.suse kernel: Hardware name: Dell Inc. Latitude E7250/0TVD2T, BIOS A07 09/01/2015
Sep 01 08:04:42 linux-28d6.suse kernel:  0000000000000000 ffffffff8132a157 ffff88021e503d58 ffffffffa035f7f0
Sep 01 08:04:42 linux-28d6.suse kernel:  ffffffff8107ef11 ffff880215bb0078 ffff88021e503da8 00000000000223a0
Sep 01 08:04:42 linux-28d6.suse kernel:  ffff880215bb0078 0000000000000046 ffffffff8107ef8c ffffffffa035f850
Sep 01 08:04:42 linux-28d6.suse kernel: Call Trace:
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff81019ea9>] dump_trace+0x59/0x320
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8101b011>] show_stack+0x21/0x40
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8132a157>] dump_stack+0x5c/0x85
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8107ef11>] warn_slowpath_common+0x81/0xb0
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8107ef8c>] warn_slowpath_fmt+0x4c/0x50
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffffa02db6db>] hsw_unclaimed_reg_debug.isra.13+0x6b/0x90 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffffa02ddf99>] gen6_read32+0x59/0x1c0 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffffa02d1855>] intel_lrc_irq_handler+0x35/0x240 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffffa028f9e5>] gen8_gt_irq_handler+0x215/0x240 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffffa028fa86>] gen8_irq_handler+0x76/0x650 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff810da336>] __handle_irq_event_percpu+0x46/0x1c0
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff810da4d0>] handle_irq_event_percpu+0x20/0x50
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff810da53d>] handle_irq_event+0x3d/0x60
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff810dd6b8>] handle_edge_irq+0x88/0x130
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff81019e39>] handle_irq+0x19/0x30
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff81613408>] do_IRQ+0x48/0xd0
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8161124c>] common_interrupt+0x8c/0x8c
Sep 01 08:04:42 linux-28d6.suse kernel: DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Sep 01 08:04:42 linux-28d6.suse kernel: 
Sep 01 08:04:42 linux-28d6.suse kernel: Leftover inexact backtrace:
Sep 01 08:04:42 linux-28d6.suse kernel:  <IRQ>  <EOI>  [<ffffffff814d43e7>] ? cpuidle_enter_state+0xd7/0x270
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff814d43c2>] ? cpuidle_enter_state+0xb2/0x270
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff810c3d9d>] ? cpu_startup_entry+0x2ad/0x3a0
Sep 01 08:04:42 linux-28d6.suse kernel:  [<ffffffff8104d6a0>] ? start_secondary+0x150/0x180
```

but I don't think it's the same error.

Could someone please give me a hint how I could debug, gather logs, save kdump, etc.?
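One way to gather the logs asked about here, sketched under the assumption that the systemd journal is persistent (i.e. `/var/log/journal` exists), so that messages from the boot before a crash survive the reboot. The helper names are hypothetical, and kdump is not covered:

```shell
#!/bin/sh
# Save the kernel messages of the previous boot (-b -1) to a file.
# Only works with a persistent journal; otherwise the old boot is gone.
save_previous_boot_log() {
    journalctl -k -b -1 --no-pager > "${1:-previous-boot-kernel.log}"
}

# Print i915 WARNING headers and "Unclaimed register" lines from the given
# log files, or from stdin when no file is given.
i915_warnings() {
    grep -E 'WARNING: .*\[i915\]|Unclaimed register' "$@"
}
```

For example, `save_previous_boot_log crash.log` followed by `i915_warnings crash.log` would show whether the warning from the trace above appeared before the crash.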
Comment 2 Oliver Kurz 2017-09-01 06:28:58 UTC
I installed the vanilla kernel for now because I don't trust the SUSE patches anymore, so I will run: kernel-vanilla-4.4.79-18.26.2.x86_64
Comment 3 Takashi Iwai 2017-09-01 07:20:47 UTC
Please try the latest upstream kernel first, to see whether the problem is present or not. 4.4.x vanilla has a lot of other problems (e.g. Skylake and newer are completely broken), so it won't help at all. All you could say is that it worked in the good old days. All the DP-MST issues are still there, so you'll hit one problem or another with the 4.4.x vanilla kernel.

Also, it would be worth trying the Leap 42.3 kernel with drm-kmp, which is based on 4.9.x kernel code.

The unclaimed register warning might be a red herring. It has been seen on Broadwell, but it doesn't indicate the crash; it is merely a sanity check.
The warning might already be addressed in recent kernels, but it is unlikely to be related to the DP-MST problem.
Comment 4 Oliver Kurz 2017-09-01 07:57:13 UTC
Thanks for your answer. Well, as reported in the clone source bug https://bugzilla.suse.com/show_bug.cgi?id=1029634 and in the subject line, the behaviour was flawless for my workflow with 4.4.36, so I suspect that the problem was caused either by upstream changes since then or by SUSE patches. I suspect the latter, but to cross-check I am running the vanilla kernel that corresponds to the current Leap kernel-default, to see whether the error appears there. I don't want to ignore the bug; I want to help with debugging, at least by running the closest kernel in which the error might still be present. Doesn't this make sense? If you think it would help *you* more in finding a fix, I am happy to try a more recent version as well.
Comment 5 Takashi Iwai 2017-09-01 08:22:54 UTC
Testing the latest upstream (i.e. 4.12.x and 4.13-rc) would be more helpful than sticking again with the rusty 4.4.x.

The problem is that DP-MST is broken, and in your case you were just lucky that it happened to work with older kernels. Now, with more fixes applied, the hidden problem has surfaced.
Comment 6 Takashi Iwai 2017-09-06 05:31:55 UTC
I could still reproduce the original issue on a Dell laptop with Skylake on the 4.12 vanilla kernel. Since the hackish fix patch for SLE12 was removed from the SLE15 code base, the issue happened on SLE15, too.

Then I noticed that 4.13 seems to work, and bisection pointed to a few DP-MST patches. So these fix patches are being merged into SLE15 (on the way, now in a pull request).

However, these fixes don't look effective on 4.4.x, especially because only two of the four commits are applicable. Anyway, I prepared a Leap 42.2 test kernel in the OBS home:tiwai:bnc1055493 repo.

No matter whether the test kernel works or not, please test the upstream 4.12.x and 4.13 kernels to see whether we're tracking the same problem.  Your chip is different from mine, so it might be triggered by a different cause.
Comment 7 Oliver Kurz 2017-09-12 06:23:10 UTC
(In reply to Takashi Iwai from comment #6)
> No matter whether the test kernel works or not

I was running the vanilla kernel 4.4.79-18.26.2 until just now, when I reproduced a very similar crash while switching displays, although the problem seems far less likely to hit than with kernel-default. The graphics also behave differently: after a suspend, even without changing the monitor layout, the external screen is more prone to stay dark until I force the monitor off and on again, either via xrandr or with the physical power button on the screen.

Right now I am running 4.4.85-2.gba575f2-default from your test repo. It could take some time until I hit this problem again, as it is hard to reproduce, and this is my work environment, where I have other work to do than just kernel graphics stack testing, so please be patient :-)

> please test the upstream
> 4.12.x and 4.13 kernels to see whether we're tracking the same problem.

If/when my system crashes again with the above 4.4 test kernel, I will check with the latest kernel,
kernel-default-4.13.0-2.1.g7e9e30a.x86_64
from Kernel:HEAD (I assume this is what you mean by upstream).
Comment 8 Oliver Kurz 2017-09-29 05:30:41 UTC
For the past 17 days I have been running 4.13.1-1.gc0b7e1f-default from Kernel:HEAD and have not encountered the usual problems. By heavily switching back and forth between different screens (activated, connected, disconnected, etc.) I could induce a single crash that seemed to be related, but the crash occurs far less often than before.
Comment 9 Swamp Workflow Management 2017-10-17 13:10:29 UTC
openSUSE-SU-2017:2739-1: An update that solves four vulnerabilities and has 15 fixes is now available.

Category: security (important)
Bug References: 1012382,1022967,1052593,1055493,1055755,1055896,1058038,1058410,1058507,1059051,1059465,1060197,1061017,1061046,1061064,1061067,1061172,1061831,1061872
CVE References: CVE-2017-1000252,CVE-2017-12153,CVE-2017-12154,CVE-2017-14489
Sources used:
openSUSE Leap 42.2 (src):    kernel-debug-4.4.90-18.32.1, kernel-default-4.4.90-18.32.1, kernel-docs-4.4.90-18.32.2, kernel-obs-build-4.4.90-18.32.1, kernel-obs-qa-4.4.90-18.32.1, kernel-source-4.4.90-18.32.1, kernel-syms-4.4.90-18.32.1, kernel-vanilla-4.4.90-18.32.1
Comment 10 Swamp Workflow Management 2017-10-17 13:18:16 UTC
openSUSE-SU-2017:2741-1: An update that solves four vulnerabilities and has 33 fixes is now available.

Category: security (important)
Bug References: 1005778,1005780,1005781,1012382,1022967,1036215,1036737,1037579,1037890,1043598,1044503,1047238,1051987,1052593,1053043,1055493,1055755,1056686,1057383,1057498,1058038,1058410,1058507,1058512,1058550,1059051,1059465,1059500,1060197,1060229,1061017,1061046,1061064,1061067,1061172,1061831,1061872
CVE References: CVE-2017-1000252,CVE-2017-12153,CVE-2017-12154,CVE-2017-14489
Sources used:
openSUSE Leap 42.3 (src):    kernel-debug-4.4.90-28.1, kernel-default-4.4.90-28.1, kernel-docs-4.4.90-28.2, kernel-obs-build-4.4.90-28.1, kernel-obs-qa-4.4.90-28.1, kernel-source-4.4.90-28.1, kernel-syms-4.4.90-28.1, kernel-vanilla-4.4.90-28.1
Comment 11 Oliver Kurz 2017-10-20 09:24:09 UTC
Hi Ralf (runger@suse.com), this is the bug I mentioned to you. To me it looks like a related problem.

The workaround that works for me was to install the latest "kernel-default" from the OBS project Kernel:HEAD. To do that, run the following commands in a root terminal:

```
# Add the Kernel:HEAD repository; the .repo file defines the alias "Kernel_HEAD"
zypper ar https://download.opensuse.org/repositories/Kernel:/HEAD/standard/Kernel:HEAD.repo
# Install kernel-default from that repository only
zypper in -r Kernel_HEAD kernel-default
```
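After rebooting, it may be worth confirming that the new kernel is actually the one running. A minimal sketch, assuming GNU `sort -V` for version ordering and using 4.13 as the threshold because 4.13 is the first series reported as working in this bug; the function names are hypothetical:

```shell
#!/bin/sh
# Succeeds if version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the running kernel (uname -r) is 4.13 or newer.
check_running_kernel() {
    running=$(uname -r)
    if version_at_least "$running" 4.13; then
        echo "running $running (4.13 or newer)"
    else
        echo "running $running (older than 4.13)" >&2
        return 1
    fi
}
```

With the Kernel:HEAD kernel from comment 8, `uname -r` would report something like 4.13.1-1.gc0b7e1f-default, which passes the check; a Leap 4.4.x kernel would not.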
Comment 12 Swamp Workflow Management 2017-10-25 13:31:32 UTC
SUSE-SU-2017:2847-1: An update that solves 11 vulnerabilities and has 170 fixes is now available.

Category: security (important)
Bug References: 1004527,1005776,1005778,1005780,1005781,1012382,1012829,1015342,1015343,1019675,1019680,1019695,1019699,1020412,1020645,1020657,1020989,1021424,1022595,1022604,1022743,1022912,1022967,1024346,1024373,1024405,1025461,1030850,1031717,1031784,1032150,1034048,1034075,1035479,1036060,1036215,1036737,1037579,1037838,1037890,1038583,1040813,1042847,1043598,1044503,1046529,1047238,1047487,1047989,1048155,1048228,1048325,1048327,1048356,1048501,1048893,1048912,1048934,1049226,1049272,1049291,1049336,1049361,1049580,1050471,1050742,1051790,1051987,1052093,1052094,1052095,1052360,1052384,1052580,1052593,1052888,1053043,1053309,1053472,1053627,1053629,1053633,1053681,1053685,1053802,1053915,1053919,1054082,1054084,1054654,1055013,1055096,1055272,1055290,1055359,1055493,1055567,1055709,1055755,1055896,1055935,1055963,1056061,1056185,1056230,1056261,1056427,1056587,1056588,1056596,1056686,1056827,1056849,1056982,1057015,1057031,1057035,1057038,1057047,1057067,1057383,1057498,1057849,1058038,1058116,1058135,1058410,1058507,1058512,1058550,1059051,1059465,1059500,1059863,1060197,1060229,1060249,1060400,1060985,1061017,1061046,1061064,1061067,1061172,1061451,1061721,1061775,1061831,1061872,1062279,1062520,1062962,1063102,1063349,1063460,1063475,1063479,1063501,1063509,1063520,1063570,1063667,1063671,1063695,1064064,1064206,1064388,1064436,963575,964944,966170,966172,966186,966191,966316,966318,969476,969477,969756,971975,981309
CVE References: CVE-2017-1000252,CVE-2017-11472,CVE-2017-12134,CVE-2017-12153,CVE-2017-12154,CVE-2017-13080,CVE-2017-14051,CVE-2017-14106,CVE-2017-14489,CVE-2017-15265,CVE-2017-15649
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP3 (src):    kernel-default-4.4.92-6.18.1
SUSE Linux Enterprise Software Development Kit 12-SP3 (src):    kernel-docs-4.4.92-6.18.3, kernel-obs-build-4.4.92-6.18.1
SUSE Linux Enterprise Server 12-SP3 (src):    kernel-default-4.4.92-6.18.1, kernel-source-4.4.92-6.18.1, kernel-syms-4.4.92-6.18.1
SUSE Linux Enterprise Live Patching 12-SP3 (src):    kgraft-patch-SLE12-SP3_Update_4-1-4.3
SUSE Linux Enterprise High Availability 12-SP3 (src):    kernel-default-4.4.92-6.18.1
SUSE Linux Enterprise Desktop 12-SP3 (src):    kernel-default-4.4.92-6.18.1, kernel-source-4.4.92-6.18.1, kernel-syms-4.4.92-6.18.1
Comment 13 Swamp Workflow Management 2017-10-27 16:48:24 UTC
SUSE-SU-2017:2869-1: An update that solves 16 vulnerabilities and has 120 fixes is now available.

Category: security (important)
Bug References: 1006180,1011913,1012382,1012829,1013887,1019151,1020645,1020657,1021424,1022476,1022743,1022967,1023175,1024405,1028173,1028286,1029693,1030552,1030850,1031515,1031717,1031784,1033587,1034048,1034075,1034762,1036303,1036632,1037344,1037404,1037994,1038078,1038583,1038616,1038792,1039915,1040307,1040351,1041958,1042286,1042314,1042422,1042778,1043652,1044112,1044636,1045154,1045563,1045922,1046682,1046821,1046985,1047027,1047048,1047096,1047118,1047121,1047152,1047277,1047343,1047354,1047487,1047651,1047653,1047670,1048155,1048221,1048317,1048891,1048893,1048914,1048934,1049226,1049483,1049486,1049580,1049603,1049645,1049882,1050061,1050188,1051022,1051059,1051239,1051399,1051478,1051479,1051556,1051663,1051790,1052049,1052223,1052533,1052580,1052593,1052709,1052773,1052794,1052888,1053117,1053802,1053915,1053919,1054084,1055013,1055096,1055359,1055493,1055755,1055896,1056261,1056588,1056827,1056982,1057015,1058038,1058116,1058410,1058507,1059051,1059465,1060197,1061017,1061046,1061064,1061067,1061172,1061831,1061872,1063667,1064206,1064388,964063,971975,974215,981309
CVE References: CVE-2017-1000252,CVE-2017-10810,CVE-2017-11472,CVE-2017-11473,CVE-2017-12134,CVE-2017-12153,CVE-2017-12154,CVE-2017-13080,CVE-2017-14051,CVE-2017-14106,CVE-2017-14489,CVE-2017-15649,CVE-2017-7518,CVE-2017-7541,CVE-2017-7542,CVE-2017-8831
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP2 (src):    kernel-default-4.4.90-92.45.1
SUSE Linux Enterprise Software Development Kit 12-SP2 (src):    kernel-docs-4.4.90-92.45.3, kernel-obs-build-4.4.90-92.45.1
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src):    kernel-default-4.4.90-92.45.1, kernel-source-4.4.90-92.45.1, kernel-syms-4.4.90-92.45.1
SUSE Linux Enterprise Server 12-SP2 (src):    kernel-default-4.4.90-92.45.1, kernel-source-4.4.90-92.45.1, kernel-syms-4.4.90-92.45.1
SUSE Linux Enterprise Live Patching 12 (src):    kgraft-patch-SLE12-SP2_Update_14-1-2.4
SUSE Linux Enterprise High Availability 12-SP2 (src):    kernel-default-4.4.90-92.45.1
SUSE Linux Enterprise Desktop 12-SP2 (src):    kernel-default-4.4.90-92.45.1, kernel-source-4.4.90-92.45.1, kernel-syms-4.4.90-92.45.1
SUSE Container as a Service Platform ALL (src):    kernel-default-4.4.90-92.45.1
OpenStack Cloud Magnum Orchestration 7 (src):    kernel-default-4.4.90-92.45.1
Comment 14 Swamp Workflow Management 2017-12-12 14:10:42 UTC
SUSE-SU-2017:3267-1: An update that solves 5 vulnerabilities and has 56 fixes is now available.

Category: security (important)
Bug References: 1012382,1017461,1020645,1022595,1022600,1022914,1022967,1025461,1028971,1030061,1034048,1037890,1052593,1053919,1055493,1055567,1055755,1055896,1056427,1058135,1058410,1058624,1059051,1059465,1059863,1060197,1060985,1061017,1061046,1061064,1061067,1061172,1061451,1061831,1061872,1062520,1062962,1063460,1063475,1063501,1063509,1063520,1063667,1063695,1064206,1064388,1064701,964944,966170,966172,966186,966191,966316,966318,969474,969475,969476,969477,971975,974590,996376
CVE References: CVE-2017-12153,CVE-2017-13080,CVE-2017-14489,CVE-2017-15265,CVE-2017-15649
Sources used:
SUSE Linux Enterprise Real Time Extension 12-SP2 (src):    kernel-rt-4.4.95-21.1, kernel-rt_debug-4.4.95-21.1, kernel-source-rt-4.4.95-21.1, kernel-syms-rt-4.4.95-21.1
Comment 15 Jiri Slaby 2018-02-14 07:51:44 UTC
Leap 42.2 is out of maintenance. If you see it with later products, please reopen with appropriate product changes.
Comment 16 Oliver Kurz 2018-02-14 16:09:04 UTC
The problem persisted on openSUSE Leap 42.3 but was only initially reported against openSUSE Leap 42.2. IMHO it's not a good idea to simply close still-open bugs just because they were originally reported against an older distribution version that is no longer supported.

For some weeks now I have been running 4.4.104-39-default or 4.14.0-1.gc6cd519-default and have not encountered this problem anymore, so I assume that something in a kernel upgrade actually fixed it.
Comment 17 Swamp Workflow Management 2018-02-21 17:17:18 UTC
SUSE-SU-2018:0509-1: An update that solves one vulnerability and has 8 fixes is now available.

Category: security (moderate)
Bug References: 1041744,1046821,1047277,1047729,1048155,1050256,1055493,1066175,1077885
CVE References: CVE-2017-10810
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP3 (src):    drm-4.9.33-4.11.1
SUSE Linux Enterprise Desktop 12-SP3 (src):    drm-4.9.33-4.11.1
Comment 18 Swamp Workflow Management 2018-03-20 12:30:11 UTC
This is an autogenerated message for OBS integration:
This bug (1055493) was mentioned in
https://build.opensuse.org/request/show/589148 42.3 / drm
Comment 19 Swamp Workflow Management 2018-03-23 11:08:19 UTC
openSUSE-RU-2018:0782-1: An update that has 6 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1041744,1047277,1047729,1055493,1066175,1077885
CVE References: 
Sources used:
openSUSE Leap 42.3 (src):    drm-4.9.33-10.2