Bug 443459

Summary: intel: Xserver is probably stuck in an infinite loop (lockup) (g33)
Product: [openSUSE] openSUSE 11.1
Reporter: Jiri Slaby <jslaby>
Component: X.Org
Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED UPSTREAM
QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Normal
Priority: P3 - Medium
CC: sndirsch
Version: Factory
Target Milestone: ---
Hardware: x86-64
OS: Other
URL: https://bugs.freedesktop.org/show_bug.cgi?id=18663
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: xorg log
xorg log with the backtrace

Description Jiri Slaby 2008-11-10 17:03:38 UTC
After some time, the X server deadlocks. Happens to me with both gnome and KDE4.

Do you want me to provide an strace, or attach gdb and show the call stack when the server gets into this state?

I don't know when this started, but it has been happening for a long time; maybe 11.0 is affected too.
Comment 1 Jiri Slaby 2008-11-10 17:04:00 UTC
Created attachment 251096 [details]
xorg log
Comment 2 Jiri Slaby 2008-11-10 17:06:13 UTC
$ rpm -qa xorg*|sort
xorg-x11-devel-7.4-5.3
xorg-x11-driver-input-7.4-8.4
xorg-x11-driver-video-radeonhd-1.2.3_081106_bb14c00-1.1
xorg-x11-driver-video-unichrome-20080807-12.32
xorg-x11-driver-video-7.4-15.2
xorg-x11-fonts-core-7.4-1.22
xorg-x11-fonts-devel-7.4-1.20
xorg-x11-fonts-7.4-1.22
xorg-x11-libfontenc-devel-7.4-1.20
xorg-x11-libfontenc-32bit-7.4-1.16
xorg-x11-libfontenc-7.4-1.20
xorg-x11-libICE-devel-7.4-1.21
xorg-x11-libICE-32bit-7.4-1.17
xorg-x11-libICE-7.4-1.21
xorg-x11-libSM-devel-7.4-1.21
xorg-x11-libSM-32bit-7.4-1.19
xorg-x11-libSM-7.4-1.21
xorg-x11-libs-32bit-7.4-5.2
xorg-x11-libs-7.4-5.3
xorg-x11-libXau-devel-7.4-1.19
xorg-x11-libXau-32bit-7.4-1.16
xorg-x11-libXau-7.4-1.19
xorg-x11-libxcb-devel-7.4-1.17
xorg-x11-libxcb-32bit-7.4-1.15
xorg-x11-libxcb-7.4-1.17
xorg-x11-libXdmcp-devel-7.4-1.19
xorg-x11-libXdmcp-32bit-7.4-1.16
xorg-x11-libXdmcp-7.4-1.19
xorg-x11-libXext-devel-7.4-1.18
xorg-x11-libXext-32bit-7.4-1.16
xorg-x11-libXext-7.4-1.18
xorg-x11-libXfixes-devel-7.4-1.18
xorg-x11-libXfixes-32bit-7.4-1.16
xorg-x11-libXfixes-7.4-1.18
xorg-x11-libxkbfile-devel-7.4-1.18
xorg-x11-libxkbfile-32bit-7.4-1.16
xorg-x11-libxkbfile-7.4-1.18
xorg-x11-libXmu-devel-7.4-1.20
xorg-x11-libXmu-32bit-7.4-1.18
xorg-x11-libXmu-7.4-1.20
xorg-x11-libXp-devel-7.4-1.18
xorg-x11-libXpm-devel-7.4-1.20
xorg-x11-libXpm-32bit-7.4-1.18
xorg-x11-libXpm-7.4-1.20
xorg-x11-libXprintUtil-devel-7.4-1.20
xorg-x11-libXprintUtil-32bit-7.4-1.18
xorg-x11-libXprintUtil-7.4-1.20
xorg-x11-libXp-32bit-7.4-1.16
xorg-x11-libXp-7.4-1.18
xorg-x11-libXrender-devel-7.4-1.18
xorg-x11-libXrender-32bit-7.4-1.16
xorg-x11-libXrender-7.4-1.18
xorg-x11-libXt-devel-7.4-1.20
xorg-x11-libXt-32bit-7.4-1.18
xorg-x11-libXt-7.4-1.20
xorg-x11-libXv-devel-7.4-1.18
xorg-x11-libXv-32bit-7.4-1.16
xorg-x11-libXv-7.4-1.18
xorg-x11-libX11-devel-7.4-1.18
xorg-x11-libX11-32bit-7.4-1.16
xorg-x11-libX11-7.4-1.18
xorg-x11-proto-devel-7.4-1.24
xorg-x11-server-sdk-7.4-12.1
xorg-x11-server-7.4-12.1
xorg-x11-util-devel-7.4-1.19
xorg-x11-xauth-7.4-8.7
xorg-x11-xtrans-devel-7.4-4.6
xorg-x11-7.4-8.7
Comment 3 Stefan Dirsch 2008-11-10 17:08:11 UTC

*** This bug has been marked as a duplicate of bug 443409 ***
Comment 4 Stefan Dirsch 2008-11-11 02:08:28 UTC
Not a duplicate.
Comment 5 Stefan Dirsch 2008-11-11 02:11:07 UTC
> [mi] EQ overflowing. The server is probably stuck in an infinite loop.
> [mi] mieqEnequeue: out-of-order valuator event; dropping.

This reminds me of the following commit on the xorg-server 1.5 branch (included in
xorg-server 1.5.3, but we're shipping 1.5.2):

commit 483fb847b4363d09ff3347f61ad51bba1dd00602
Author: Adam Jackson <ajax@redhat.com>
Date:   Fri Oct 10 16:33:24 2008 -0400

    mieq: Backtrace when the queue overflows.

    Since we're probably stuck down in a driver somewhere, let's at least
    try to point out where.  This will need to be rethought when the input
    thread work lands though.
    (cherry picked from commit b736f477f5324f79af30fc0f941ba0714a34ccda)

diff --git a/mi/mieq.c b/mi/mieq.c
index aaa247d..8037247 100644
--- a/mi/mieq.c
+++ b/mi/mieq.c
@@ -145,6 +145,7 @@ mieqEnqueue(DeviceIntPtr pDev, xEvent *e)
        oldtail = (oldtail - 1) % QUEUE_SIZE;
     }
     else {
+       static int stuck = 0;
        newtail = (oldtail + 1) % QUEUE_SIZE;
        /* Toss events which come in late.  Usually this means your server's
          * stuck in an infinite loop somewhere, but SIGIO is still getting
@@ -152,8 +153,13 @@ mieqEnqueue(DeviceIntPtr pDev, xEvent *e)
        if (newtail == miEventQueue.head) {
             ErrorF("[mi] EQ overflowing. The server is probably stuck "
                    "in an infinite loop.\n");
+           if (!stuck) {
+               xorg_backtrace();
+               stuck = 1;
+           }
            return;
         }
+       stuck = 0;
        miEventQueue.tail = newtail;
     }
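
As a rough illustration of the pattern in the patch above, here is a minimal standalone sketch of a fixed-size ring buffer that drops events on overflow and fires a diagnostic only once per stuck episode. It is not the X server code: struct event_queue, report_stuck, enqueue, dequeue, and the QUEUE_SIZE of 8 are hypothetical, and report_stuck merely stands in for xorg_backtrace().

```c
#include <assert.h>
#include <stdio.h>

#define QUEUE_SIZE 8   /* hypothetical; small so overflow is easy to trigger */

struct event_queue {
    int events[QUEUE_SIZE];
    unsigned head;     /* next slot the consumer reads */
    unsigned tail;     /* next slot the producer writes */
    int stuck;         /* nonzero while the current overflow was already reported */
    int reports;       /* how many times the diagnostic hook fired */
};

/* Stand-in for xorg_backtrace(): this sketch only counts and logs. */
static void report_stuck(struct event_queue *q)
{
    q->reports++;
    fprintf(stderr, "queue overflow: consumer appears stuck\n");
}

/* Returns 1 if the event was queued, 0 if it was dropped. */
static int enqueue(struct event_queue *q, int ev)
{
    unsigned newtail;

    assert(q != NULL);
    newtail = (q->tail + 1) % QUEUE_SIZE;
    if (newtail == q->head) {
        /* Queue is full: report once, then stay quiet until it drains. */
        if (!q->stuck) {
            report_stuck(q);
            q->stuck = 1;
        }
        return 0;
    }
    q->stuck = 0;      /* a successful enqueue re-arms the diagnostic */
    q->events[q->tail] = ev;
    q->tail = newtail;
    return 1;
}

/* Returns 1 and stores the oldest event in *ev, or 0 if the queue is empty. */
static int dequeue(struct event_queue *q, int *ev)
{
    assert(q != NULL && ev != NULL);
    if (q->head == q->tail)
        return 0;
    *ev = q->events[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    return 1;
}
```

Resetting the flag on the next successful enqueue is what lets a later, separate stall produce a fresh report instead of being silenced forever.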
Comment 6 Jiri Slaby 2008-11-11 12:26:13 UTC
I've built X with this patch included if you are interested:
http://labs.suse.cz/jslaby/bug-443459/
Comment 7 Stefan Dirsch 2008-11-11 14:05:32 UTC
Thanks. So what's the backtrace when you run into this issue?
Comment 8 Jiri Slaby 2008-11-11 19:58:44 UTC
Created attachment 251409 [details]
xorg log with the backtrace

It happened after running a GLUT application.

X + xterm + program is OK
X + xterm + gnome-session + program fails after ~ sec (resizing the window)

At least, this scenario held twice.

This is a machine with the ATI radeonhd driver compiled from git (ATI Mobility Radeon X1450).
Comment 9 Stefan Dirsch 2008-11-11 20:54:48 UTC
> (**) RADEONHD(0): Option "DRI" "on"

So you have 3D enabled. What's the output of 'hwinfo --gfxcard'?
Comment 10 Jiri Slaby 2008-11-11 21:01:27 UTC
Note that initially this bug was created with intel g33, but I think this is the same issue reproduced on another HW...

Here comes the hwinfo output:
23: PCI 100.0: 0300 VGA compatible controller (VGA)             
  [Created at pci.318]
  UDI: /org/freedesktop/Hal/devices/pci_1002_7186
  Unique ID: VCu0._3M0RbmOpn5
  Parent ID: vSkL.rxAOeWuq8i6
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "ATI Mobility Radeon X1450"
  Vendor: pci 0x1002 "ATI Technologies Inc"
  Device: pci 0x7186 "Mobility Radeon X1450"
  SubVendor: pci 0x1043 "ASUSTeK Computer Inc."
  SubDevice: pci 0x1231 
  Memory Range: 0xb0000000-0xb7ffffff (rw,prefetchable)
  I/O Ports: 0xa000-0xa0ff (rw)
  Memory Range: 0xfa8f0000-0xfa8fffff (rw,non-prefetchable)
  Memory Range: 0xfa8c0000-0xfa8dffff (ro,prefetchable,disabled)
  IRQ: 16 (54824 events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v00001002d00007186sv00001043sd00001231bc03sc00i00"
  Driver Info #0:
    XFree86 v4 Server Module: radeonhd
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #9 (PCI bridge)

Primary display adapter: #23
Comment 11 Stefan Dirsch 2008-11-11 21:05:35 UTC
> Vendor: pci 0x1002 "ATI Technologies Inc"
> Device: pci 0x7186 "Mobility Radeon X1450"

We neither enable 3D by default nor support it on this hardware. Can you reproduce this issue without Option "DRI"?
Comment 12 Jiri Slaby 2008-11-11 21:21:03 UTC
I repeat:
Note that initially this bug was created with intel g33, but I think this is
the same issue reproduced on another HW...

I'll also replace xorg on the intel machine with the debug package and get back with a trace from there...
Comment 13 Stefan Dirsch 2008-11-11 21:29:07 UTC
So did you enable desktop effects/compiz, and/or are you using 3D screen savers?
Comment 14 Jiri Slaby 2008-11-11 21:39:06 UTC
compiz is not installed on either system.

In KDE4, the default effects are on, so yes. In gnome, nothing, I hope; how can I find out? "Activate screensaver when computer is idle" is unticked in gnome-screensaver-preferences.
Comment 15 Stefan Dirsch 2008-11-11 21:42:17 UTC
Luc, please take over. Thanks.
Comment 16 Stefan Dirsch 2008-11-11 21:42:49 UTC
Luc, I have a G33 machine available for testing.
Comment 21 Stefan Dirsch 2008-11-14 17:17:11 UTC
Luc, why is this one set to NEEDINFO now? What information needs to be provided by the reporter?
Comment 22 Luc Verhaegen 2008-11-14 17:22:10 UTC
Ah, I see; apparently evdev should not be blamed, you're right.

Let's see whether the G33 reproduces this then.
Comment 23 Jiri Slaby 2008-11-14 18:06:09 UTC
Sorry guys, it's very hard to reproduce on the intel one. I have no results (backtraces) so far.
Comment 24 Luc Verhaegen 2008-11-14 19:43:14 UTC
Jiri, that's ok, let's see whether we can track it on our own driver first :)

Could it be that it is the evdev driver or the synaptics driver doing this? What happens if you use the mouse driver for a usb mouse, and disable the touchpad? Can you still trigger it then?

On the other hand, it seems like a drm issue as well, so I am now trying to see whether a normal desktop system shows the same issue on 11.1b5 x86-64.

Comment 25 Jiri Slaby 2008-11-14 22:17:16 UTC
The intel card is in a desktop machine, so no synaptics...

Evdev is on the list of possible culprits, definitely.

I probably ran into another problem with drm: it locks up in the kernel in some radeon call, so I'm not sure whether the trace from the radeonhd machine attached here is relevant.

As I said, I don't know how to reproduce it reliably, so I won't disable any drivers; I want to get another trace from the intel machine and see whether evdev is involved there too.
Comment 26 Luc Verhaegen 2008-11-15 01:52:45 UTC
When I switch engine contexts often enough, I get one of three results: the machine locks up, the driver complains about the engine being locked up, or X waits endlessly for an available command buffer.

Possible reproductions:
* f-spot and glxgears on exa (has render composite).
* f-spot and torcs on exa (has render composite).
* _even_ glxgears and evtest on xaa (no render composite but same routine to switch the engine state).

None of them results in mieqEnqueue complaining, though.
Comment 27 Jiri Slaby 2008-11-19 19:43:42 UTC
I've caught it with the intel hardware. evdev is there again. Here we go:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: X(mieqEnqueue+0x2c7) [0x4cc337]
1: X(xf86PostMotionEventP+0xc4) [0x4768e4]
2: X(xf86PostMotionEvent+0xa9) [0x476ab9]
3: /usr/lib64/xorg/modules//input/evdev_drv.so [0x7f819f56e415]
4: /usr/lib64/xorg/modules//input/evdev_drv.so [0x7f819f56e513]
5: /usr/lib64/xorg/modules//input/evdev_drv.so [0x7f819f56be6b]
6: X [0x46d4d7]
7: /lib64/libc.so.6 [0x7f81a0d0e7b0]
8: /lib64/libc.so.6(ioctl+0x7) [0x7f81a0daa537]
9: /usr/lib64/libdrm.so.2 [0x7f819fc15ca3]
10: /usr/lib64/libdrm.so.2(drmCommandWrite+0x1b) [0x7f819fc15d2b]
11: /usr/lib64/xorg/modules//drivers/intel_drv.so(I830Sync+0x118) [0x7f819f997238]
12: /usr/lib64/xorg/modules//libexa.so(exaWaitSync+0x5c) [0x7f819ed2465c]
13: /usr/lib64/xorg/modules//libexa.so(ExaDoPrepareAccess+0x91) [0x7f819ed257d1]
14: /usr/lib64/xorg/modules//libexa.so [0x7f819ed28e11]
15: /usr/lib64/xorg/modules//libexa.so [0x7f819ed28fbd]
16: /usr/lib64/xorg/modules//libexa.so [0x7f819ed293a3]
17: /usr/lib64/xorg/modules//libexa.so(exaDoMigration+0x69f) [0x7f819ed29bdf]
18: /usr/lib64/xorg/modules//libexa.so(exaCopyNtoN+0x37f) [0x7f819ed2839f]
19: /usr/lib64/xorg/modules//libexa.so(exaComposite+0x9e0) [0x7f819ed2bd80]
20: X [0x52cb78]
21: X [0x51c07a]
22: X(Dispatch+0x364) [0x44beb4]
23: X(main+0x45d) [0x43231d]
24: /lib64/libc.so.6(__libc_start_main+0xe6) [0x7f81a0cfa586]
25: X [0x4316f9]
Comment 28 Luc Verhaegen 2008-11-20 09:33:03 UTC
Yuck.

Why do we get both evdev _and_ exa symbols in these backtraces... That's just messed up.

I'll get the r5xx engine to lock up again, but this time with evdev running as well. Let's hope the mieqEnqueue message pops up then as well.
Comment 29 Luc Verhaegen 2008-11-20 14:19:24 UTC
Right... Well... You know... if you want the evdev driver to overflow the event queue... then it helps if you actually generate events... like... by moving the mouse a bit... *stares at floor ashamedly*

Anyway, I now have a full reproduction (once you've finally realised that you have to move the mouse a bit). The conclusion is that these are two different issues.

1) the intel driver messing up.
2) the radeonhd driver messing up.

Both are of course busy-waiting on something related to the drm. They only look alike because the symptom is pretty much the same.

You don't strictly need the patch Adam Jackson provided; you can also attach gdb to the process to see where it is stuck. But the patch helps (when it's there), since it requires no further intervention: the log contains everything one needs to know.

I'll open a separate bug for the radeonhd issue, attach your backtrace of that, briefly explain how I reproduce it, and explain the broad reason for it crashing. Sadly, this won't mean that the issue can be fixed easily.

In the intel case, it will also be severely non-trivial to fix, even though the hardware probably isn't as highly optimised and picky as radeonhd hw. But that's for upstream to deal with.
Comment 30 Stefan Dirsch 2008-11-20 14:38:08 UTC
Splitting up the bug report makes perfect sense to me. Jiri, could you report the intel driver issue upstream on bugs.freedesktop.org (Product: xorg, Component: Driver/intel) and add Luc and me to Cc (libv@skynet.be, sndirsch@suse.de)? You'll need to register first. Thanks.
Comment 31 Luc Verhaegen 2008-11-20 15:04:43 UTC
radeonHD bug filed as: #447124
Comment 32 Stefan Dirsch 2008-11-20 15:57:33 UTC
Thanks. Setting to NEEDINFO as long as there is no upstream bug report for the intel driver issue.
Comment 33 Jiri Slaby 2008-11-22 10:45:07 UTC
Created as:
https://bugs.freedesktop.org/show_bug.cgi?id=18663
Comment 34 Stefan Dirsch 2008-11-22 11:19:50 UTC
Thanks.