Bug 962866

Summary: BUG: unable to handle kernel NULL pointer dereference at 0000000000000060 in intel_fb_obj_invalidate+0x1c/0xf0 [i915]
Product: [openSUSE] openSUSE Tumbleweed Reporter: Forgotten User l03xIL5qZl <forgotten_l03xIL5qZl>
Component: Kernel Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: forgotten_l03xIL5qZl, forgotten_NXEif20qEv, jslaby, tiwai
Version: Current   
Target Milestone: ---   
Hardware: x86-64   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: First (truncated) output of computer lock
Second (truncated) output of computer lock

Description Forgotten User l03xIL5qZl 2016-01-20 20:24:18 UTC
With an updated Tumbleweed (as of yesterday), a new bug has appeared recently.

The main symptom is that after the screen locks itself when idle, you can't wake it up. You can't switch VT; nothing seems to have an effect.

If you log in to it remotely, you can see that some processes are stalled (D state in htop), and all of them are trying to read /proc/consoles, for example /sbin/showconsole (launched by NetworkManager).
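
For anyone who wants to list the stalled processes without htop, here is a minimal Python sketch that scans /proc for tasks in state D (uninterruptible sleep). The function names are illustrative and not taken from any tool mentioned in this report; it only relies on the documented layout of /proc/<pid>/stat, where the state letter follows the parenthesised command name.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def uninterruptible_tasks(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Run over SSH on the affected machine, this should print entries such as showconsole while the hang is in progress.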

Using strace on /sbin/showconsole shows that it opens /proc/consoles, tries to read it and… nothing. You can't stop it, even with SIGKILL.

Trying to suspend/resume failed: systemd also stalled, and when PID 1 is blocked, you're screwed.

The only thing that works is Magic SysRq to reboot the machine.

As you can understand, this is really annoying.
Comment 1 Takashi Iwai 2016-01-22 15:28:48 UTC
Hmm, there is no change in fs/proc/consoles.c itself, so if any, it must be in the layer below that, e.g. console_lock() deadlocks, etc.

Could you try to dump all tasks with alt-sysrq-t?
Comment 2 Forgotten User l03xIL5qZl 2016-01-22 15:34:04 UTC
Although I had the issue four times in 3 days, I haven't encountered the same symptoms since creating this bug entry.

However, if I face this issue one more time, I doubt I'll be able to dump that information: the screen is blank. And as far as I can remember, Magic SysRq only prints information on the console, not on remote SSH sessions.
Comment 3 Takashi Iwai 2016-01-22 15:39:54 UTC
It'll be logged in the journal, if it's alive.
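
A possible way to request the same dump over SSH when the local keyboard and screen are unusable: writing the character t to /proc/sysrq-trigger as root has the same effect as alt-sysrq-t, and the output goes to the kernel log rather than to the calling terminal. A minimal Python sketch follows; the function name and the path parameter are illustrative only, and whether the log is still readable depends on how badly the machine is stuck.

```python
def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to log the state of all tasks (same as alt-sysrq-t)."""
    # Writing a single command character to this file requires root.
    with open(trigger_path, "w") as trigger:
        trigger.write("t")


if __name__ == "__main__":
    request_task_dump()
    # The dump lands in the kernel log; read it with `dmesg` or `journalctl -k`.
```

The path is a parameter only so the function can be exercised against a scratch file; on a real system the default is the one to use.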
Comment 4 Forgotten User l03xIL5qZl 2016-01-28 15:58:29 UTC
Created attachment 663626 [details]
First (truncated) output of computer lock

This is the first time I've had a lockup since filing this bug report.

Unfortunately, the ring buffer is not large enough, and this was not visible using journalctl -k.

Does journald redirect the kernel ring buffer?
Comment 5 Forgotten User l03xIL5qZl 2016-01-28 15:59:08 UTC
Created attachment 663627 [details]
Second (truncated) output of computer lock
Comment 6 Jiri Slaby 2016-02-01 12:15:50 UTC
Cool. Could you install and try a kernel with lockdep enabled:
https://build.opensuse.org/project/monitor/home:jirislaby:stable-lockdep
?
Comment 7 Jiri Slaby 2016-02-01 15:10:33 UTC
Never mind, the true reason is this:
WARNING: CPU: 0 PID: 1151 at ../include/linux/kref.h:46 drm_framebuffer_reference+0x64/0x70 [drm]()
Modules linked in: ... [last unloaded: vboxdrv]
CPU: 0 PID: 1151 Comm: X Tainted: G        W  O    4.4.0-1-default #1
Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.40 01/31/2013
 ffffffffa01ee7e1 ffff88032ce7baa8 ffffffff8137f639 0000000000000000
 ffff88032ce7bae0 ffffffff8107d132 ffff880036d3af40 ffff8800a19781c0
 ffff8800a19781c0 ffff88030aee4400 ffff88032ef10000 ffff88032ce7baf0
Call Trace:
 [<ffffffff8101a095>] try_stack_unwind+0x175/0x190
 [<ffffffff81018fe9>] dump_trace+0x69/0x3a0
 [<ffffffff8101a0fb>] show_trace_log_lvl+0x4b/0x60
 [<ffffffff8101942c>] show_stack_log_lvl+0x10c/0x180
 [<ffffffff8101a195>] show_stack+0x25/0x50
 [<ffffffff8137f639>] dump_stack+0x4b/0x72
 [<ffffffff8107d132>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107d22a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa01c8294>] drm_framebuffer_reference+0x64/0x70 [drm]
 [<ffffffffa01da1bd>] drm_atomic_set_fb_for_plane+0x2d/0x90 [drm]
 [<ffffffffa026db6c>] __drm_atomic_helper_set_config+0xbc/0x3a0 [drm_kms_helper]
 [<ffffffffa026fa9c>] drm_fb_helper_pan_display+0x18c/0x230 [drm_kms_helper]
 [<ffffffffa037eb0a>] intel_fbdev_pan_display+0x1a/0x60 [i915]
 [<ffffffff813f6a6f>] fb_pan_display+0xcf/0x160
 [<ffffffff813f10b0>] bit_update_start+0x20/0x50
 [<ffffffff813ee233>] fbcon_switch+0x3b3/0x600
 [<ffffffff81481668>] redraw_screen+0x178/0x260
 [<ffffffff81477f3f>] complete_change_console+0x3f/0xe0
 [<ffffffff814786ce>] vt_ioctl+0x6ee/0x12c0
 [<ffffffff8146bd61>] tty_ioctl+0x361/0xc30
 [<ffffffff8120e328>] do_vfs_ioctl+0x288/0x470
 [<ffffffff8120e589>] SyS_ioctl+0x79/0x90

BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
IP: [<ffffffffa0375cfc>] intel_fb_obj_invalidate+0x1c/0xf0 [i915]
PGD 32cb59067 PUD 32131c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 1151 Comm: X Tainted: G        W  O    4.4.0-1-default #1
Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.40 01/31/2013
task: ffff88032c71e240 ti: ffff88032ce78000 task.ti: ffff88032ce78000
RIP: 0010:[<ffffffffa0375cfc>]  [<ffffffffa0375cfc>] intel_fb_obj_invalidate+0x1c/0xf0 [i915]
RSP: 0018:ffff88032ce7bac8  EFLAGS: 00010246
RAX: ffff88032c71e240 RBX: ffff8802fb63be00 RCX: ffff88012ae9dfc0
RDX: ffff880036d3af40 RSI: 0000000000000000 RDI: ffff8802fb63be00
RBP: ffff88032ce7baf0 R08: ffffea000c3eb7df R09: 0000000000000008
R10: ffff8800aae9db80 R11: ffff8800aae9d780 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000200001 R15: 0000000000000080
FS:  00007f1dfefbca00(0000) GS:ffff88033ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 000000032d524000 CR4: 00000000001406f0
Stack:
 ffff88032f62e600 ffff88032fb35800 0000000000000000 0000000000200001
 0000000000000080 ffff88032ce7bb10 ffffffffa037ebf3 0000000000000000
 ffff88032ce7bc68 ffff88032ce7bc48 ffffffff813f6ed8 ffff88032fb35860
Call Trace:
 [<ffffffffa037ebf3>] intel_fbdev_set_par+0x43/0x60 [i915]
 [<ffffffff813f6ed8>] fb_set_var+0x238/0x460
 [<ffffffff813ed749>] fbcon_blank+0x2e9/0x330
 [<ffffffff81482ac3>] do_unblank_screen+0xc3/0x190
 [<ffffffff81477f59>] complete_change_console+0x59/0xe0
 [<ffffffff814786ce>] vt_ioctl+0x6ee/0x12c0
 [<ffffffff8146bd61>] tty_ioctl+0x361/0xc30
 [<ffffffff8120e328>] do_vfs_ioctl+0x288/0x470
 [<ffffffff8120e589>] SyS_ioctl+0x79/0x90
Code: 41 5f 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 89 f5 53 4c 8b 67 08 48 89 fb <41> 8b 44 24 60 4d 8b 74 24 28 83 f8 01 74 58 8b b3 5c 01 00 00
RIP  [<ffffffffa0375cfc>] intel_fb_obj_invalidate+0x1c/0xf0 [i915]
 RSP <ffff88032ce7bac8>
CR2: 0000000000000060
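
A note on reading the oops above: the marked bytes in the Code line, <41> 8b 44 24 60, decode as a 32-bit load from 0x60(%r12), and R12 is 0 in the register dump, which is consistent with the fault address 0x60 in CR2. In other words, a field at offset 0x60 was read through a NULL pointer. The Python sketch below checks that arithmetic against lines copied from the dump; the regular expression and function name are illustrative helpers, not part of any kernel tooling.

```python
import re

# Matches three-character register names followed by a 16-digit hex value,
# e.g. "R12: 0000000000000000" or "CR2: 0000000000000060".
REGISTER = re.compile(r"\b([A-Z][A-Z0-9]{2}):\s+([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return {register name: integer value} for every match in the text."""
    return {name: int(value, 16) for name, value in REGISTER.findall(oops_text)}


# Lines copied from the oops above.
SAMPLE = """\
RAX: ffff88032c71e240 RBX: ffff8802fb63be00 RCX: ffff88012ae9dfc0
R10: ffff8800aae9db80 R11: ffff8800aae9d780 R12: 0000000000000000
CR2: 0000000000000060 CR3: 000000032d524000 CR4: 00000000001406f0
"""

regs = parse_registers(SAMPLE)
# Fault address minus base register gives the structure offset being read.
offset = regs["CR2"] - regs["R12"]
print(hex(offset))  # prints 0x60
```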
Comment 8 Forgotten User l03xIL5qZl 2016-02-01 15:15:53 UTC
I saw that in the log, but I was hoping that VirtualBox was not involved. Hmmm, I think I will then try to avoid loading it at startup, and only run it when required…
Comment 9 Jiri Slaby 2016-02-01 15:19:57 UTC
Which should be this:
https://bugs.freedesktop.org/show_bug.cgi?id=93822
Comment 10 Jiri Slaby 2016-02-01 15:20:24 UTC
(In reply to Adrien Clerc from comment #8)
> I saw that in the log, but I was hoping that VirtualBox was not involved.
> Hmmm, I think I will then try to avoid loading it at startup, and only run
> it when required…

It likely has nothing to do with vbox.
Comment 11 Jiri Slaby 2016-02-01 15:46:37 UTC
I pushed:
0c82312f3f15538f4e6ceda2a82caee8fbac4501
51f1385b90c1ad30896bd62b1ff97aa4edb1a163
ca40ba855c9e3f19f2715fd8a1ced5128359d3d9
to the stable branch.

But it remains to decide which kernel versions need that.
Comment 12 Takashi Iwai 2016-02-01 15:57:47 UTC
(In reply to Jiri Slaby from comment #11)
> I pushed:
> 0c82312f3f15538f4e6ceda2a82caee8fbac4501
> 51f1385b90c1ad30896bd62b1ff97aa4edb1a163
> ca40ba855c9e3f19f2715fd8a1ced5128359d3d9
> to the stable branch.

Thanks!
 
> But it remains to decide which kernel versions need that.

I remember this bug on my machine, and the symptom appears to have been seen since the 4.4 kernel.  Maybe the problem was already there but the new code path triggers it.

Let's see whether we have a similar report on the Leap 4.1.x kernel.
Comment 13 Forgotten User l03xIL5qZl 2016-02-02 08:14:28 UTC
Many thanks for the quick identification.

So now, I'll wait eagerly for the fix to be pushed :)
Comment 15 Forgotten User l03xIL5qZl 2016-03-23 16:47:41 UTC
Linux kernel 4.5 hit Tumbleweed some days ago. It seems my problem is gone now. I'll close this bug in a few days if no new comment appears.
Comment 16 Forgotten User l03xIL5qZl 2016-04-12 07:37:09 UTC
A few days have passed. Closing as resolved.
Comment 17 Swamp Workflow Management 2016-04-12 10:14:18 UTC
openSUSE-SU-2016:1008-1: An update that solves 15 vulnerabilities and has 26 fixes is now available.

Category: security (important)
Bug References: 814440,884701,949936,951440,951542,951626,951638,953527,954018,954404,954405,954876,958439,958463,958504,959709,960561,960563,960710,961263,961500,961509,962257,962866,962977,963746,963765,963767,963931,965125,966137,966179,966259,966437,966684,966693,968018,969356,969582,970845,971125
CVE References: CVE-2015-1339,CVE-2015-7799,CVE-2015-7872,CVE-2015-7884,CVE-2015-8104,CVE-2015-8709,CVE-2015-8767,CVE-2015-8785,CVE-2015-8787,CVE-2015-8812,CVE-2016-0723,CVE-2016-2069,CVE-2016-2184,CVE-2016-2383,CVE-2016-2384
Sources used:
openSUSE Leap 42.1 (src):    kernel-debug-4.1.20-11.1, kernel-default-4.1.20-11.1, kernel-docs-4.1.20-11.3, kernel-ec2-4.1.20-11.1, kernel-obs-build-4.1.20-11.2, kernel-obs-qa-4.1.20-11.1, kernel-obs-qa-xen-4.1.20-11.1, kernel-pae-4.1.20-11.1, kernel-pv-4.1.20-11.1, kernel-source-4.1.20-11.1, kernel-syms-4.1.20-11.1, kernel-vanilla-4.1.20-11.1, kernel-xen-4.1.20-11.1
Comment 18 Forgotten User NXEif20qEv 2016-06-08 22:42:31 UTC
On the latest Leap (4.1.21-14-xen) running the Xen kernel, we get deadlocks when switching consoles.
Locked processes (and the whole GUI):
/sbin/agetty
/sbin/showconsole

Kernel BUG dump:
Jun 08 18:20:10 linux-6956 kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
Jun 08 18:20:10 linux-6956 kernel: IP: [<          (null)>]           (null)
Jun 08 18:20:10 linux-6956 kernel: PGD f4160067 PUD f4195067 PMD 0 
Jun 08 18:20:10 linux-6956 kernel: Oops: 0010 [#1] SMP 
Jun 08 18:20:10 linux-6956 kernel: Modules linked in: bnep bluetooth rfkill fuse ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bridge stp llc iscsi_ibft iscsi_boot_sysfs ipmi_ssif xfs libcrc32c blktap blktap2 pciback 8250_fintek coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt dm_mod iTCO_vendor_support usbbk battery aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper pcspkr lpc_ich mfd_core cryptd i2c_i801 joydev ipmi_si 8250 ipmi_msghandler serial_core xen_scsibk tpm_crb tpm_tis video tpm acpi_pad fan thermal igb ptp pps_core ie31200_edac mei_me button mei edac_core shpchp processor thermal_sys hwmon blkbk blkback_pagemap domctl netbk xenbus_be gntdev evtchn btrfs xor hid_generic usbhid raid6_pq raid1 md_mod ast syscopyarea sysfillrect crc32c_intel sysimgblt i2c_algo_bit
Jun 08 18:20:10 linux-6956 kernel:  drm_kms_helper ttm xhci_pci ehci_pci ehci_hcd xhci_hcd drm i2c_core usbcore usb_common sg
Jun 08 18:20:10 linux-6956 kernel: CPU: 2 PID: 1693 Comm: Xorg Not tainted 4.1.21-14-xen #1
Jun 08 18:20:10 linux-6956 kernel: Hardware name: Silicon Mechanics Rackform iServ R135.v5.1/X10SLM+-LN4F, BIOS 3.0 04/24/2015
Jun 08 18:20:10 linux-6956 kernel: task: ffff8801d6d6e650 ti: ffff8801d39b8000 task.ti: ffff8801d39b8000
Jun 08 18:20:10 linux-6956 kernel: RIP: e030:[<0000000000000000>]  [<          (null)>]           (null)
Jun 08 18:20:10 linux-6956 kernel: RSP: e02b:ffff8801d39bb640  EFLAGS: 00010206
Jun 08 18:20:10 linux-6956 kernel: RAX: 4000000001000000 RBX: ffff8801e9593940 RCX: 0000000000000002
Jun 08 18:20:10 linux-6956 kernel: RDX: 4000000001000000 RSI: 0000000000000000 RDI: ffff8801e9593940
Jun 08 18:20:10 linux-6956 kernel: RBP: ffff8801e58d1000 R08: 00000000000f6fff R09: 000000000000003c
Jun 08 18:20:10 linux-6956 kernel: R10: 0000000000007ff0 R11: ffffffffa008e1b9 R12: ffff8801e9593940
Jun 08 18:20:10 linux-6956 kernel: R13: 0000000000000000 R14: 00000000000007e9 R15: 0000000000000000
Jun 08 18:20:10 linux-6956 kernel: FS:  00007f05657148c0(0000) GS:ffff8801e4680000(0000) knlGS:ffff8801e4680000
Jun 08 18:20:10 linux-6956 kernel: CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 08 18:20:10 linux-6956 kernel: CR2: 0000000000000000 CR3: 00000000fb230000 CR4: 0000000000042660
Jun 08 18:20:10 linux-6956 kernel: Stack:
Jun 08 18:20:10 linux-6956 kernel:  ffffffff8013c1ed 0000000000000001 ffffffff8004eeb9 00000000f6005000
Jun 08 18:20:10 linux-6956 kernel:  ffffc90002800000 00000000f67ee000 ffff8801e9593940 ffff8801e58d1000
Jun 08 18:20:10 linux-6956 kernel:  0000000000000000 4000000001000000 00000000000007e9 0000000000000000
Jun 08 18:20:10 linux-6956 kernel: Call Trace:
Jun 08 18:20:10 linux-6956 kernel: Inexact backtrace:
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff8013c1ed>] ? free_pages_prepare+0x1dd/0x2d0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff8004eeb9>] ? iomem_map_sanity_check+0x89/0xd0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff8013d176>] ? free_hot_cold_page+0x26/0x1a0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa0091bd2>] ? ttm_put_pages+0x152/0x1c0 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa0089384>] ? ttm_mem_global_free_zone+0x24/0x80 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa0091c99>] ? ttm_pool_unpopulate+0x59/0x70 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa008a1ed>] ? ttm_tt_destroy+0x5d/0x70 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa008e8c1>] ? ttm_bo_move_memcpy+0x361/0x630 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa008a265>] ? ttm_tt_init+0x65/0xb0 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff801718ad>] ? free_vmap_area_noflush+0x2d/0x60
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa008bf86>] ? ttm_bo_handle_move_mem+0x256/0x5b0 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa008ca11>] ? ttm_bo_mem_space+0x181/0x350 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa008d0c8>] ? ttm_bo_validate+0x1e8/0x200 [ttm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff800aa3a3>] ? hrtimer_try_to_cancel+0x43/0xf0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa020e4ca>] ? ast_bo_pin+0x7a/0xa0 [ast]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa020c077>] ? ast_crtc_do_set_base.isra.14.constprop.24+0xe7/0x390 [ast]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00eea17>] ? _object_find+0x67/0xb0 [drm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00b1896>] ? drm_crtc_helper_set_config+0x7d6/0xae0 [drm_kms_helper]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff8001377e>] ? __switch_to+0x22e/0x940
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00f06d8>] ? drm_mode_set_config_internal+0x68/0x100 [drm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00bc280>] ? restore_fbdev_mode+0xc0/0xe0 [drm_kms_helper]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00be130>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x20/0x60 [drm_kms_helper]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00be192>] ? drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff80386d2e>] ? fb_set_var+0x15e/0x3b0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff8037e10b>] ? fbcon_blank+0x1cb/0x2b0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff803fac05>] ? do_unblank_screen+0xa5/0x1c0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff803f0e03>] ? complete_change_console+0x53/0xe0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff803f1ddc>] ? vt_ioctl+0xf4c/0x10f0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffffa00e59ba>] ? drm_ioctl+0x17a/0x590 [drm]
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff803e4647>] ? tty_ioctl+0x207/0xd30
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff801644b5>] ? handle_mm_fault+0xdc5/0x1920
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff801a41df>] ? do_vfs_ioctl+0x2ff/0x520
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff80054327>] ? recalc_sigpending+0x17/0x50
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff80054cfd>] ? __set_task_blocked+0x2d/0x70
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff800e10cc>] ? __audit_syscall_entry+0xac/0xf0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff80022bfb>] ? syscall_trace_enter_phase1+0xfb/0x160
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff801a4481>] ? SyS_ioctl+0x81/0xa0
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff805f205d>] ? system_call_fastpath+0x16/0x76
Jun 08 18:20:10 linux-6956 kernel:  [<ffffffff805f2010>] ? __entry_text_start+0x8/0x8
Jun 08 18:20:10 linux-6956 kernel: Code:  Bad RIP value.
Jun 08 18:20:10 linux-6956 kernel: RIP  [<          (null)>]           (null)
Jun 08 18:20:10 linux-6956 kernel:  RSP <ffff8801d39bb640>
Jun 08 18:20:10 linux-6956 kernel: CR2: 0000000000000000
Jun 08 18:20:10 linux-6956 kernel: ---[ end trace 80cd0c2700e7d36c ]---
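
The number after "Oops:" is the x86 page-fault error code in hexadecimal, and it differs between the two traces in this report: 0000 in comment 7 and 0010 here. A minimal Python sketch of the standard bit layout follows; the function name and dictionary keys are illustrative. Bit 4 set (0x10) means the fault happened on an instruction fetch, which fits the NULL RIP in this dump, whereas 0000 is a plain kernel-mode read of a page that is not present.

```python
def decode_page_fault_code(code):
    """Describe the x86 page-fault error code printed after 'Oops:'."""
    return {
        "page_present": bool(code & 0x01),       # 0 means the page was not present
        "write": bool(code & 0x02),              # 0 means the access was a read
        "user_mode": bool(code & 0x04),          # 0 means the fault came from kernel mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),  # 1 means the CPU was fetching code
    }


if __name__ == "__main__":
    for text in ("0000", "0010"):
        print(text, decode_page_fault_code(int(text, 16)))
```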

------------
linux-6956:~ # xl info
host                   : linux-6956
release                : 4.1.21-14-xen
version                : #1 SMP Sun Apr 17 07:27:45 UTC 2016 (fc187c1)
machine                : x86_64
nr_cpus                : 8
max_cpu_id             : 7
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 3400

It does not seem to happen on non-xen kernel.

Does this warrant reopening the issue, or a new issue?
Comment 19 Takashi Iwai 2016-06-09 06:45:05 UTC
(In reply to Vanja Bucic from comment #18)
> It does not seem to happen on non-xen kernel.
> 
> Does this warrant reopening the issue, or a new issue?

Different hardware (ast), a different code path, a different symptom.
An absolutely different problem.  Please open another bug report with more hardware details.