Bug 978785

Summary: KDE desktop crashes with qxl driver warning
Product: [openSUSE] openSUSE Distribution
Reporter: Niblick <nixbugz>
Component: X.Org
Assignee: E-mail List <xorg-maintainer-bugs>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Minor
Priority: P4 - Low
CC: nixbugz
Version: 13.2
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 13.2
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  Xorg.1.log.old: (EE) qxl(0): error doing QXL_ALLOC
  Xorg.0.log.old: (EE) Couldn't init device "ImExPS/2 Generic Explorer Mouse"
  Xorg logs and systemd journal

Description Niblick 2016-05-06 07:39:33 UTC

The KDE desktop crashes (about 3 times a week on average this year) with a qxl driver warning in dmesg:

[5003537.245362] ------------[ cut here ]------------
[5003537.248462] WARNING: CPU: 0 PID: 11815 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
[5003537.272330] sync obj 301 still has outstanding releases 0 0 0 262144 1
[5003537.272332] Modules linked in: fuse joydev uinput xt_pkttype xt_LOG rpcsec_gss_krb5 auth_rpcgss oid_registry xt_limit nfsv4 dns_resolver nfs lockd sunrpc fscache af_packet iscsi_ibft iscsi_boot_sysfs ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer crct10dif_pclmul snd crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel i2c_piix4 soundcore aes_x86_64 virtio_balloon lrw ppdev gf128mul parport_pc glue_helper ablk_helper parport cryptd button pcspkr serio_raw processor dm_mod
[5003537.272366]  sr_mod cdrom ata_generic virtio_net virtio_blk virtio_console ata_piix virtio_pci qxl virtio_ring virtio drm_kms_helper ttm drm floppy sg
[5003537.272376] CPU: 0 PID: 11815 Comm: Xorg Tainted: G        W     3.16.7-35-desktop #1
[5003537.272377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-20160216_104851-anatol 04/01/2014
[5003537.272379]  0000000000000009 ffffffff8161d46f ffff8801e69f7980 ffffffff8105cae7
[5003537.272382]  0000000000000001 ffff8801e69f79d0 0000000000000001 ffff8801006b4358
[5003537.272384]  000000000000012d ffffffff8105cb4c ffffffffa00c8010 ffffffff00000030
[5003537.272386] Call Trace:
[5003537.272397]  [<ffffffff810051ee>] dump_trace+0x8e/0x350
[5003537.272401]  [<ffffffff81005556>] show_stack_log_lvl+0xa6/0x190
[5003537.272404]  [<ffffffff81006aa1>] show_stack+0x21/0x50
[5003537.272408]  [<ffffffff8161d46f>] dump_stack+0x49/0x6a
[5003537.272423]  [<ffffffff8105cae7>] warn_slowpath_common+0x77/0x90
[5003537.272427]  [<ffffffff8105cb4c>] warn_slowpath_fmt+0x4c/0x50
[5003537.272433]  [<ffffffffa00bfee2>] qxl_sync_obj_wait+0x172/0x1f0 [qxl]
[5003537.272443]  [<ffffffffa009c747>] ttm_bo_wait+0x87/0x180 [ttm]
[5003537.272452]  [<ffffffffa009e12a>] ttm_bo_evict+0x4a/0x330 [ttm]
[5003537.272462]  [<ffffffffa009e58a>] ttm_mem_evict_first+0x17a/0x220 [ttm]
[5003537.272474]  [<ffffffffa009e8a1>] ttm_bo_mem_space+0x271/0x320 [ttm]
[5003537.272485]  [<ffffffffa009ed95>] ttm_bo_validate+0x1a5/0x210 [ttm]
[5003537.272496]  [<ffffffffa009f093>] ttm_bo_init+0x293/0x460 [ttm]
[5003537.272506]  [<ffffffffa00c1d7b>] qxl_bo_create+0x13b/0x1b0 [qxl]
[5003537.272518]  [<ffffffffa00c2487>] qxl_gem_object_create+0x57/0x100 [qxl]
[5003537.272531]  [<ffffffffa00c257a>] qxl_gem_object_create_with_handle+0x4a/0x100 [qxl]
[5003537.272544]  [<ffffffffa00c5485>] qxl_alloc_ioctl+0x35/0xa0 [qxl]
[5003537.272565]  [<ffffffffa002e8c7>] drm_ioctl+0x1c7/0x5b0 [drm]
[5003537.272573]  [<ffffffff811caff7>] do_vfs_ioctl+0x2e7/0x4c0
[5003537.272582]  [<ffffffff811cb251>] SyS_ioctl+0x81/0xa0
[5003537.272586]  [<ffffffff8162414d>] system_call_fastpath+0x1a/0x1f
[5003537.272590]  [<00007f83bf366ba7>] 0x7f83bf366ba6
[5003537.272592] ---[ end trace ce8a59f075a5213d ]---

This appears to be the same as the Closed bug https://bugzilla.opensuse.org/show_bug.cgi?id=905305

I don't believe it's restricted to KDE but it's a while since I've tried other desktops.

Reproducible: Sometimes

Steps to Reproduce:
Dragging windows about or playing videos makes it more likely to happen.



Red Hat have released a possible fix for this problem, among others, in their "Xorg server and driver bug fix and enhancement update"
https://rhn.redhat.com/errata/RHEA-2015-2198.html

In particular, Red Hat 1185807 - cannot show login page again of KVM guest (spice+qxl) after log out from guest's desktop
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=1185807

and possibly: 
Due to a memory leak in the QXL driver, an X.Org guest could terminate
unexpectedly. This update fixes the memory leak, and X.Org no longer crashes.
(BZ#1222038)

I am not authorized to access bug #1222038.

Would it be possible to port these fixes to openSUSE?
Comment 1 Egbert Eich 2016-05-06 08:24:23 UTC
Just a warning. Minor
Comment 2 Egbert Eich 2016-05-06 09:03:05 UTC
(In reply to NIk Zbugz from comment #0)
> 
> Would it be possible to port these fixes to openSUSE?

I don't see any patches mentioned in there. Just that some update fixed issues with QXL.

It is not at all clear what the cause of the KDE crash is and whether the warning is related to it.
Have you tried a later version of openSUSE, i.e. Leap 42.1?
Comment 3 Egbert Eich 2016-05-06 18:05:16 UTC
Also, please supply the X.org log file, /var/log/Xorg.0.log. Use the '.old' variant (/var/log/Xorg.0.log.old) if the X server has already restarted after the crash occurred.
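
To pick the error lines out of such a log before attaching it, something like the following could be used. This is only a sketch: the function name xorg_errors is made up for illustration, and it assumes error lines start with the (EE) marker, optionally preceded by a "[ seconds]" timestamp.

```shell
#!/bin/sh
# List the error lines of an Xorg log read from standard input.
# The match is anchored at the start of the line so that the marker
# legend near the top of the log, which mentions "(EE) error" in the
# middle of a line, is not reported as an error.
xorg_errors() {
    grep -E '^(\[[ 0-9.]+\] )?\(EE\)'
}

# Usage (path as given above):
#   xorg_errors < /var/log/Xorg.0.log.old
```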


I've backported a bunch of crash fixes to 13.2 now.
You need to add:
http://download.opensuse.org/repositories/home:/eeich:/branches:/OBS_Maintained:/xf86-video-qxl/openSUSE_13.2_Update/
as a repo and then update the xf86-video-qxl package.
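
One possible way to do this with zypper is sketched below. The repository alias eeich-qxl and the function name are made up for illustration; only the URL and the package name come from this comment. Installing with --from is one way to ask zypper to take the package from that repository; whether a plain update would pick it up depends on the local vendor settings, so treat this as an assumption to check.

```shell
#!/bin/sh
# Sketch only: the alias is a hypothetical name, the URL is the one above.
REPO_URL='http://download.opensuse.org/repositories/home:/eeich:/branches:/OBS_Maintained:/xf86-video-qxl/openSUSE_13.2_Update/'
REPO_ALIAS='eeich-qxl'

# Add the repository, refresh its metadata, then install the driver
# package from it. Each step runs only if the previous one succeeded.
add_qxl_repo() {
    zypper addrepo "$REPO_URL" "$REPO_ALIAS" &&
    zypper refresh "$REPO_ALIAS" &&
    zypper install --from "$REPO_ALIAS" xf86-video-qxl
}

# Run as root:
#   add_qxl_repo
```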
Comment 4 Niblick 2016-05-07 07:01:42 UTC
(In reply to Egbert Eich from comment #1)
> Just a warning. Minor

It may be minor to you but I lose work when it happens.
Comment 5 Niblick 2016-05-07 07:19:21 UTC
(In reply to Egbert Eich from comment #2)
> (In reply to NIk Zbugz from comment #0)
> > 
> > Would it be possible to port these fixes to openSUSE?
> 
> I don't see any patches mentioned in there. Just that some updated fixed
> issues with QXL.
> 

I don't know how code from Red Hat is or might be included in openSUSE or if the Red Hat update is all or nothing.


> It is not at all clear what the cause of the KDE crash is and whether the
> warning is related to it.

How can I narrow down the cause or causes? 


> Have you tried a later version of openSUSE - ie. Leap 42.1?

No, I could set one up but don't have the time for the hours of use necessary to provoke it into falling over.
Comment 6 Niblick 2016-05-07 07:39:19 UTC
Created attachment 676026 [details]
Xorg.1.log.old: (EE) qxl(0): error doing QXL_ALLOC

The crashes happen mostly when I have 2 desktops open via Switch User and mostly on the second desktop, as here.
Comment 7 Niblick 2016-05-07 07:42:43 UTC
Created attachment 676027 [details]
Xorg.0.log.old: (EE) Couldn't init device "ImExPS/2 Generic Explorer Mouse"
Comment 8 Niblick 2016-05-07 07:57:57 UTC
(In reply to Egbert Eich from comment #3)
> Also, please supply the X.org log file - /var/log/Xorg.0.log(.old). '.old'
> when the Xserver has restarted already after a crash occurred.

See the 2 attachments. The QXL_ALLOC error is more common but I've included the other because it was there.  I could look through old log files to see if there are other errors.
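
A rough way to look through several old logs at once and see which error is most common might be the following sketch. The function name tally_xorg_errors is made up, and the "[ seconds]" timestamp prefix is an assumption about the log format.

```shell
#!/bin/sh
# Tally distinct (EE) messages across the Xorg logs given as arguments.
# The leading "[ seconds]" timestamp, if present, is stripped so that
# identical messages from different runs are counted together.
# Output: one line per distinct message, most frequent first.
tally_xorg_errors() {
    cat -- "$@" |
    sed -n -E 's/^(\[[ 0-9.]+\] )?(\(EE\).*)$/\2/p' |
    sort | uniq -c | sort -rn
}

# Usage:
#   tally_xorg_errors /var/log/Xorg.*.log /var/log/Xorg.*.log.old
```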

> 
> I've backported a bunch of crash fixes to 13.2 now.
> You need to add:
> http://download.opensuse.org/repositories/home:/eeich:/branches:/
> OBS_Maintained:/xf86-video-qxl/openSUSE_13.2_Update/
> As repo and update the xf86-video-qxl.

Thank you, I'll try that.
Comment 9 Egbert Eich 2016-05-07 08:06:12 UTC
(In reply to NIk Zbugz from comment #5)
> No, I could set one up but don't have the time for the hours of use
> necessary to provoke it into falling over.

To me this is not really a strong argument. There is no such thing as a free lunch. I do this in my spare time as well.
If you think that, because I've got a SUSE email address, this is my work and obligation, then I'd have a 12 to 14 hour work day ;)

Regardless,

(In reply to NIk Zbugz from comment #6)
> Created attachment 676026 [details]
> Xorg.1.log.old: (EE) qxl(0): error doing QXL_ALLOC
> 

This looks like a memory leak issue. Several of those have been fixed in the patches I've backported now.

(In reply to NIk Zbugz from comment #8)

> > I've backported a bunch of crash fixes to 13.2 now.
> > You need to add:
> > http://download.opensuse.org/repositories/home:/eeich:/branches:/
> > OBS_Maintained:/xf86-video-qxl/openSUSE_13.2_Update/
> > As repo and update the xf86-video-qxl.
> 
> Thank you, I'll try that.

Ok, thanks!
Comment 10 Egbert Eich 2016-05-11 07:22:49 UTC
Nik, any news here?
Comment 11 Niblick 2016-05-17 03:45:05 UTC
Yes, Egbert, there is news, but not good news. I installed xf86-video-qxl 0.1.2-2.5.1 on openSUSE 13.2 but it still falls over:

opensuse:~ # journalctl -b|grep qxl_ttm.c
May 14 11:01:10 opensuse kernel: WARNING: CPU: 3 PID: 3474 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 15 07:13:59 opensuse kernel: WARNING: CPU: 0 PID: 20212 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 17 04:04:18 opensuse kernel: WARNING: CPU: 3 PID: 31395 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 17 04:04:41 opensuse kernel: WARNING: CPU: 1 PID: 27309 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 17 04:04:51 opensuse kernel: WARNING: CPU: 2 PID: 27524 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 17 04:12:29 opensuse kernel: WARNING: CPU: 2 PID: 27847 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 17 04:13:39 opensuse kernel: WARNING: CPU: 1 PID: 28431 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
May 17 04:15:16 opensuse kernel: WARNING: CPU: 2 PID: 28770 at ../drivers/gpu/drm/qxl/qxl_ttm.c:414 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
opensuse:~ #

I will attach the Xorg logs and systemd journal.  The last few entries are where it doesn't quite get to the desktop after logging in before reverting to the login screen.
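
To summarise how often the warning fires, the journal output can be grouped by day. This is a sketch under assumptions: the function name qxl_warnings_per_day is made up, and it relies on the journal's default short output, where the first two fields are the month and the day.

```shell
#!/bin/sh
# Count qxl_ttm.c kernel warnings per day from journal text on stdin.
# Prints one "Month Day Count" line per day, sorted as plain text
# (so not chronologically across different months).
qxl_warnings_per_day() {
    awk '/qxl_ttm\.c/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Usage, mirroring the command above:
#   journalctl -b | qxl_warnings_per_day
```

For the eight lines quoted above this gives one warning each on May 14 and May 15 and six on May 17.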
Comment 12 Niblick 2016-05-17 03:53:57 UTC
Created attachment 677142 [details]
Xorg logs and systemd journal
Comment 13 Egbert Eich 2016-05-17 04:48:33 UTC
Ok, then I'm sorry - I can't help you.
Investigating this will simply cost too much time which I'm not able to invest.
Sorry again.
Comment 14 Bernhard Wiedemann 2016-05-17 07:00:11 UTC
This is an autogenerated message for OBS integration:
This bug (978785) was mentioned in
https://build.opensuse.org/request/show/396131 13.2 / xf86-input-vmmouse
Comment 15 Egbert Eich 2016-05-17 07:35:48 UTC
The situation is complicated because the logs show several distinct failures:
1. Crash in vmmouse driver
2. Vmmouse driver fails to reinitialize the device.
3. Cannot get DRM master upon VT switch.
4. Not able to allocate framebuffer.

1. and 4. seem to be consequences of a memory leak in the kernel. I've fixed 1. now, but this doesn't fix the memory leak, so it will still fail with 4.
2. and 3. could be races due to switches between different user sessions. Fixing them may require a lengthy investigation. 3. could even be caused by Plymouth.

So even without the memory leak, problems would still remain.
Comment 16 Niblick 2016-05-20 08:21:38 UTC
Well that's disappointing but thanks for the work you have done on this.  I'm not sure about no. 1: are you saying it's fixed in vmmouse but it will still fail elsewhere until 4 is fixed (and if it is fixed, where can I find the new code)?

How do I get the other bugs fixed?  Should I add 2, 3 and 4 as new bugs and if so, here or somewhere else?

I've disabled Plymouth so I'll see how that goes. I expect no. 4 will be present in any distribution but I could try another desktop to try to avoid 2 & 3. Is there an alternative to vmmouse?

I'm surprised to be the only one apparently suffering these crashes: does no-one else use more than one login session with spice or is it something else?
Comment 17 Swamp Workflow Management 2016-05-25 15:11:08 UTC
openSUSE-RU-2016:1394-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 978785
CVE References: 
Sources used:
openSUSE 13.2 (src):    xf86-input-vmmouse-13.0.0-11.3.1