|
Bugzilla – Full Text Bug Listing |
| Summary: | kernel panic after wakeup from STR / on switching terminals / displays after upgrade linux-default-4.4.36-8.1 -> 4.4.49-16, and still happening on 4.4.79-18.26-default | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Oliver Kurz <okurz> |
| Component: | Kernel | Assignee: | E-mail List <kernel-maintainers> |
| Status: | VERIFIED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | jslaby, okurz, runger, sebastian.chlad, tiwai |
| Version: | Leap 42.2 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | 1029634 | ||
| Bug Blocks: | |||
|
Description
Oliver Kurz
2017-08-24 11:29:00 UTC
what I found in the bootup log:

```
Sep 01 08:04:42 linux-28d6.suse kernel: WARNING: CPU: 2 PID: 0 at ../drivers/gpu/drm/i915/intel_uncore.c:633 hsw_unclaimed_reg_debug.isra.13+0x6b/0x90 [i915](
Sep 01 08:04:42 linux-28d6.suse kernel: Unclaimed register detected before reading register 0x223a0
Sep 01 08:04:42 linux-28d6.suse kernel: Modules linked in: nf_log_ipv6 xt_pkttype xt_TCPMSS nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet ip6t_REJECT nf
Sep 01 08:04:42 linux-28d6.suse kernel: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp videobuf2_v4l2 dell_smm_hwmon videobuf2_core v4l2_common iw
Sep 01 08:04:42 linux-28d6.suse kernel: jitterentropy_rng drbg ansi_cprng aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci ser
Sep 01 08:04:42 linux-28d6.suse kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.79-18.26-default #1
Sep 01 08:04:42 linux-28d6.suse kernel: Hardware name: Dell Inc. Latitude E7250/0TVD2T, BIOS A07 09/01/2015
Sep 01 08:04:42 linux-28d6.suse kernel: 0000000000000000 ffffffff8132a157 ffff88021e503d58 ffffffffa035f7f0
Sep 01 08:04:42 linux-28d6.suse kernel: ffffffff8107ef11 ffff880215bb0078 ffff88021e503da8 00000000000223a0
Sep 01 08:04:42 linux-28d6.suse kernel: ffff880215bb0078 0000000000000046 ffffffff8107ef8c ffffffffa035f850
Sep 01 08:04:42 linux-28d6.suse kernel: Call Trace:
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff81019ea9>] dump_trace+0x59/0x320
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8101b011>] show_stack+0x21/0x40
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8132a157>] dump_stack+0x5c/0x85
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8107ef11>] warn_slowpath_common+0x81/0xb0
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8107ef8c>] warn_slowpath_fmt+0x4c/0x50
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffffa02db6db>] hsw_unclaimed_reg_debug.isra.13+0x6b/0x90 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffffa02ddf99>] gen6_read32+0x59/0x1c0 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffffa02d1855>] intel_lrc_irq_handler+0x35/0x240 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffffa028f9e5>] gen8_gt_irq_handler+0x215/0x240 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffffa028fa86>] gen8_irq_handler+0x76/0x650 [i915]
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff810da336>] __handle_irq_event_percpu+0x46/0x1c0
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff810da4d0>] handle_irq_event_percpu+0x20/0x50
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff810da53d>] handle_irq_event+0x3d/0x60
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff810dd6b8>] handle_edge_irq+0x88/0x130
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff81019e39>] handle_irq+0x19/0x30
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff81613408>] do_IRQ+0x48/0xd0
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8161124c>] common_interrupt+0x8c/0x8c
Sep 01 08:04:42 linux-28d6.suse kernel: DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Sep 01 08:04:42 linux-28d6.suse kernel:
Sep 01 08:04:42 linux-28d6.suse kernel: Leftover inexact backtrace:
Sep 01 08:04:42 linux-28d6.suse kernel: <IRQ> <EOI> [<ffffffff814d43e7>] ? cpuidle_enter_state+0xd7/0x270
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff814d43c2>] ? cpuidle_enter_state+0xb2/0x270
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff810c3d9d>] ? cpu_startup_entry+0x2ad/0x3a0
Sep 01 08:04:42 linux-28d6.suse kernel: [<ffffffff8104d6a0>] ? start_secondary+0x150/0x180
```

but I don't think it's the same error. Could someone please give me a hint how I could debug, gather logs, save kdump, etc.?

Installed vanilla kernel for now because I don't trust the SUSE patches anymore. So I will run: kernel-vanilla-4.4.79-18.26.2.x86_64

Please try the latest upstream kernel at first, to see whether the problem is present or not. 4.4.x vanilla has a lot of other problems (e.g.
Skylake and newer sucks completely), so it won't help anything at all. What you could say is that it worked in the good old days. The DP-MST issues are all still there, so you'll more or less hit another problem with the 4.4.x vanilla kernel. Also, it'd be worth trying the Leap 42.3 kernel with drm-kmp, which is based on 4.9.x kernel code.

The unclaimed register warning might be a red herring. It was seen on Broadwell, but it says nothing about the crash; it is merely a sanity check. The warning might already be addressed in a recent kernel, but it's unlikely to be related to the DP-MST problem.

Thanks for your answer. Well, as reported in the clone source bug https://bugzilla.suse.com/show_bug.cgi?id=1029634 and also in the subject line, the behaviour was flawless for my workflow with 4.4.36, so I suspect that the problem was caused either by new upstream changes since then or by SUSE patches. I suspect the latter, but to crosscheck I am running the vanilla kernel that corresponds to the current Leap kernel-default to see if the error appears. I don't want to ignore the bug but to help with debugging, at least by running the closest kernel where the error might still be present. Doesn't this make sense? If you think it would help *you* more to find a fix, I am happy to try a more recent version as well.

Testing the latest upstream (i.e. 4.12.x and 4.13-rc) would be more helpful than sticking again with the rusty 4.4.x. The problem is that DP-MST is broken, and in your case you were just lucky that it somehow happened to work with older kernels. Now, with more fixes on top of it, the hidden problem has surfaced.

I could still reproduce the original issue on a Dell laptop with Skylake on the 4.12 vanilla kernel. Since the hackish fix patch for SLE12 was removed from the SLE15 code base, the issue happened on SLE15, too. Then I noticed that 4.13 seems to work, and the bisection pointed to a few DP-MST patches. So these fix patches are being merged to SLE15 (on the way, now in pull request).
Though, these fixes don't look effective on 4.4.x, especially because only two of the four commits are applicable. Anyway, I prepared a Leap 42.2 test kernel in the OBS home:tiwai:bnc1055493 repo. No matter whether the test kernel works or not, please test the upstream 4.12.x and 4.13 kernels to see whether we're tracking the same problem. Your chip is different from mine, so it might be triggered by a different cause.

(In reply to Takashi Iwai from comment #6)
> No matter whether the test kernel works or not

I was still running the vanilla kernel 4.4.79-18.26.2 until now, when I could reproduce a very similar crash while trying to switch the displays, although it seems far less likely to hit that problem than with kernel-default. The graphics also behave differently: after a suspend, even when not changing the monitor layout, the external screen is more prone to stay dark until I force the monitor off and on again, be it over xrandr or with the physical power button on the screen. Right now I am running 4.4.85-2.gba575f2-default from your test repo. It could be some time until I hit this problem again, as it is hard to reproduce and this is my work environment, where I have other work to do than just kernel graphics stack testing, so please be patient :-)

> please test the upstream
> 4.12.x and 4.13 kernels to see whether we're tracking the same problem.

If/when my system crashes again with the above 4.4 test kernel, I will check with the latest current kernel, kernel-default-4.13.0-2.1.g7e9e30a.x86_64 from Kernel:HEAD (I assume this is the one you mean by upstream).

For the past 17 days I have been running 4.13.1-1.gc0b7e1f-default from Kernel:HEAD and did not encounter the usual problems. By playing around heavily with switching back and forth between different screens (activated, connected, disconnected, etc.) I could induce a single crash which seemed to be related, but it is nowhere near as likely to appear as before.
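The manual recovery described above (forcing the external monitor off and on again over xrandr) could be scripted roughly as follows. This is only a sketch and does not address the crash itself: the function name `cycle_output`, the `XRANDR` and `CYCLE_DELAY` variables, and the output name `DP-1` are assumptions for illustration, not taken from the report. `xrandr --query` lists the real output names on a given machine.

```shell
#!/bin/sh
# Sketch: power-cycle one display output with xrandr.
# XRANDR and CYCLE_DELAY are hypothetical knobs (not from the bug report);
# setting XRANDR=echo gives a dry run that only prints the arguments.
cycle_output() {
    output="$1"
    if [ -z "$output" ]; then
        echo "usage: cycle_output OUTPUT (e.g. DP-1)" >&2
        return 1
    fi
    # Switch the output off, wait briefly, then re-enable it in its preferred mode.
    "${XRANDR:-xrandr}" --output "$output" --off || return 1
    sleep "${CYCLE_DELAY:-2}"
    "${XRANDR:-xrandr}" --output "$output" --auto
}
```

For a dry run, set `XRANDR=echo` before calling `cycle_output DP-1`; the function then only prints the two xrandr argument lists instead of touching the display.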
openSUSE-SU-2017:2739-1: An update that solves four vulnerabilities and has 15 fixes is now available.

Category: security (important)
Bug References: 1012382,1022967,1052593,1055493,1055755,1055896,1058038,1058410,1058507,1059051,1059465,1060197,1061017,1061046,1061064,1061067,1061172,1061831,1061872
CVE References: CVE-2017-1000252,CVE-2017-12153,CVE-2017-12154,CVE-2017-14489
Sources used:
openSUSE Leap 42.2 (src): kernel-debug-4.4.90-18.32.1, kernel-default-4.4.90-18.32.1, kernel-docs-4.4.90-18.32.2, kernel-obs-build-4.4.90-18.32.1, kernel-obs-qa-4.4.90-18.32.1, kernel-source-4.4.90-18.32.1, kernel-syms-4.4.90-18.32.1, kernel-vanilla-4.4.90-18.32.1

openSUSE-SU-2017:2741-1: An update that solves four vulnerabilities and has 33 fixes is now available.

Category: security (important)
Bug References: 1005778,1005780,1005781,1012382,1022967,1036215,1036737,1037579,1037890,1043598,1044503,1047238,1051987,1052593,1053043,1055493,1055755,1056686,1057383,1057498,1058038,1058410,1058507,1058512,1058550,1059051,1059465,1059500,1060197,1060229,1061017,1061046,1061064,1061067,1061172,1061831,1061872
CVE References: CVE-2017-1000252,CVE-2017-12153,CVE-2017-12154,CVE-2017-14489
Sources used:
openSUSE Leap 42.3 (src): kernel-debug-4.4.90-28.1, kernel-default-4.4.90-28.1, kernel-docs-4.4.90-28.2, kernel-obs-build-4.4.90-28.1, kernel-obs-qa-4.4.90-28.1, kernel-source-4.4.90-28.1, kernel-syms-4.4.90-28.1, kernel-vanilla-4.4.90-28.1

Hi Ralf (runger@suse.com), this is the bug I mentioned to you. To me it looks like a related problem. The workaround that works for me was to install the latest "kernel-default" from the OBS project Kernel:HEAD. To do that you can follow the following steps in a root terminal:

```
zypper ar https://download.opensuse.org/repositories/Kernel:/HEAD/standard/Kernel:HEAD.repo
zypper in -r Kernel_HEAD kernel-default
```

SUSE-SU-2017:2847-1: An update that solves 11 vulnerabilities and has 170 fixes is now available.
Category: security (important) Bug References: 1004527,1005776,1005778,1005780,1005781,1012382,1012829,1015342,1015343,1019675,1019680,1019695,1019699,1020412,1020645,1020657,1020989,1021424,1022595,1022604,1022743,1022912,1022967,1024346,1024373,1024405,1025461,1030850,1031717,1031784,1032150,1034048,1034075,1035479,1036060,1036215,1036737,1037579,1037838,1037890,1038583,1040813,1042847,1043598,1044503,1046529,1047238,1047487,1047989,1048155,1048228,1048325,1048327,1048356,1048501,1048893,1048912,1048934,1049226,1049272,1049291,1049336,1049361,1049580,1050471,1050742,1051790,1051987,1052093,1052094,1052095,1052360,1052384,1052580,1052593,1052888,1053043,1053309,1053472,1053627,1053629,1053633,1053681,1053685,1053802,1053915,1053919,1054082,1054084,1054654,1055013,1055096,1055272,1055290,1055359,1055493,1055567,1055709,1055755,1055896,1055935,1055963,1056061,1056185,1056230,1056261,1056427,1056587,1056588,1056596,1056686,1056827,1056849,1056982,1057015,1057031,1057035,1057038,1057047,1057067,1057383,1057498,1057849,1058038,1058116,1058135,1058410,1058507,1058512,1058550,1059051,1059465,1059500,1059863,1060197,1060229,1060249,1060400,1060985,1061017,1061046,1061064,1061067,1061172,1061451,1061721,1061775,1061831,1061872,1062279,1062520,1062962,1063102,1063349,1063460,1063475,1063479,1063501,1063509,1063520,1063570,1063667,1063671,1063695,1064064,1064206,1064388,1064436,963575,964944,966170,966172,966186,966191,966316,966318,969476,969477,969756,971975,981309 CVE References: CVE-2017-1000252,CVE-2017-11472,CVE-2017-12134,CVE-2017-12153,CVE-2017-12154,CVE-2017-13080,CVE-2017-14051,CVE-2017-14106,CVE-2017-14489,CVE-2017-15265,CVE-2017-15649 Sources used: SUSE Linux Enterprise Workstation Extension 12-SP3 (src): kernel-default-4.4.92-6.18.1 SUSE Linux Enterprise Software Development Kit 12-SP3 (src): kernel-docs-4.4.92-6.18.3, kernel-obs-build-4.4.92-6.18.1 SUSE Linux Enterprise Server 12-SP3 (src): kernel-default-4.4.92-6.18.1, kernel-source-4.4.92-6.18.1, 
kernel-syms-4.4.92-6.18.1 SUSE Linux Enterprise Live Patching 12-SP3 (src): kgraft-patch-SLE12-SP3_Update_4-1-4.3 SUSE Linux Enterprise High Availability 12-SP3 (src): kernel-default-4.4.92-6.18.1 SUSE Linux Enterprise Desktop 12-SP3 (src): kernel-default-4.4.92-6.18.1, kernel-source-4.4.92-6.18.1, kernel-syms-4.4.92-6.18.1 SUSE-SU-2017:2869-1: An update that solves 16 vulnerabilities and has 120 fixes is now available. Category: security (important) Bug References: 1006180,1011913,1012382,1012829,1013887,1019151,1020645,1020657,1021424,1022476,1022743,1022967,1023175,1024405,1028173,1028286,1029693,1030552,1030850,1031515,1031717,1031784,1033587,1034048,1034075,1034762,1036303,1036632,1037344,1037404,1037994,1038078,1038583,1038616,1038792,1039915,1040307,1040351,1041958,1042286,1042314,1042422,1042778,1043652,1044112,1044636,1045154,1045563,1045922,1046682,1046821,1046985,1047027,1047048,1047096,1047118,1047121,1047152,1047277,1047343,1047354,1047487,1047651,1047653,1047670,1048155,1048221,1048317,1048891,1048893,1048914,1048934,1049226,1049483,1049486,1049580,1049603,1049645,1049882,1050061,1050188,1051022,1051059,1051239,1051399,1051478,1051479,1051556,1051663,1051790,1052049,1052223,1052533,1052580,1052593,1052709,1052773,1052794,1052888,1053117,1053802,1053915,1053919,1054084,1055013,1055096,1055359,1055493,1055755,1055896,1056261,1056588,1056827,1056982,1057015,1058038,1058116,1058410,1058507,1059051,1059465,1060197,1061017,1061046,1061064,1061067,1061172,1061831,1061872,1063667,1064206,1064388,964063,971975,974215,981309 CVE References: CVE-2017-1000252,CVE-2017-10810,CVE-2017-11472,CVE-2017-11473,CVE-2017-12134,CVE-2017-12153,CVE-2017-12154,CVE-2017-13080,CVE-2017-14051,CVE-2017-14106,CVE-2017-14489,CVE-2017-15649,CVE-2017-7518,CVE-2017-7541,CVE-2017-7542,CVE-2017-8831 Sources used: SUSE Linux Enterprise Workstation Extension 12-SP2 (src): kernel-default-4.4.90-92.45.1 SUSE Linux Enterprise Software Development Kit 12-SP2 (src): kernel-docs-4.4.90-92.45.3, 
kernel-obs-build-4.4.90-92.45.1
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src): kernel-default-4.4.90-92.45.1, kernel-source-4.4.90-92.45.1, kernel-syms-4.4.90-92.45.1
SUSE Linux Enterprise Server 12-SP2 (src): kernel-default-4.4.90-92.45.1, kernel-source-4.4.90-92.45.1, kernel-syms-4.4.90-92.45.1
SUSE Linux Enterprise Live Patching 12 (src): kgraft-patch-SLE12-SP2_Update_14-1-2.4
SUSE Linux Enterprise High Availability 12-SP2 (src): kernel-default-4.4.90-92.45.1
SUSE Linux Enterprise Desktop 12-SP2 (src): kernel-default-4.4.90-92.45.1, kernel-source-4.4.90-92.45.1, kernel-syms-4.4.90-92.45.1
SUSE Container as a Service Platform ALL (src): kernel-default-4.4.90-92.45.1
OpenStack Cloud Magnum Orchestration 7 (src): kernel-default-4.4.90-92.45.1

SUSE-SU-2017:3267-1: An update that solves 5 vulnerabilities and has 56 fixes is now available.

Category: security (important)
Bug References: 1012382,1017461,1020645,1022595,1022600,1022914,1022967,1025461,1028971,1030061,1034048,1037890,1052593,1053919,1055493,1055567,1055755,1055896,1056427,1058135,1058410,1058624,1059051,1059465,1059863,1060197,1060985,1061017,1061046,1061064,1061067,1061172,1061451,1061831,1061872,1062520,1062962,1063460,1063475,1063501,1063509,1063520,1063667,1063695,1064206,1064388,1064701,964944,966170,966172,966186,966191,966316,966318,969474,969475,969476,969477,971975,974590,996376
CVE References: CVE-2017-12153,CVE-2017-13080,CVE-2017-14489,CVE-2017-15265,CVE-2017-15649
Sources used:
SUSE Linux Enterprise Real Time Extension 12-SP2 (src): kernel-rt-4.4.95-21.1, kernel-rt_debug-4.4.95-21.1, kernel-source-rt-4.4.95-21.1, kernel-syms-rt-4.4.95-21.1

Leap 42.2 is out of maintenance. If you see it with later products, please reopen with appropriate product changes.

The problem persisted on openSUSE Leap 42.3 but was only initially reported against openSUSE Leap 42.2.
IMHO it's not a good idea to simply close still-open bugs just because they were originally reported against an older distribution version that is no longer supported. For some weeks I have been running 4.4.104-39-default or 4.14.0-1.gc6cd519-default and have not encountered this problem anymore, so I assume that something in a kernel upgrade actually fixed it.

SUSE-SU-2018:0509-1: An update that solves one vulnerability and has 8 fixes is now available.

Category: security (moderate)
Bug References: 1041744,1046821,1047277,1047729,1048155,1050256,1055493,1066175,1077885
CVE References: CVE-2017-10810
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP3 (src): drm-4.9.33-4.11.1
SUSE Linux Enterprise Desktop 12-SP3 (src): drm-4.9.33-4.11.1

This is an autogenerated message for OBS integration:
This bug (1055493) was mentioned in
https://build.opensuse.org/request/show/589148 42.3 / drm

openSUSE-RU-2018:0782-1: An update that has 6 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1041744,1047277,1047729,1055493,1066175,1077885
CVE References:
Sources used:
openSUSE Leap 42.3 (src): drm-4.9.33-10.2 |