Bugzilla – Bug 443459
intel: Xserver is probably stuck in an infinite loop (lockup) (g33)
Last modified: 2008-11-22 11:19:50 UTC
After some time, the X server deadlocks. It happens to me with both GNOME and KDE4. Do you want me to provide an strace, or attach gdb and show the call stack when the server gets into this state? I don't know when this started, but it has been going on for a while; maybe 11.0 is affected too.
Created attachment 251096: xorg log
$ rpm -qa xorg*|sort
xorg-x11-devel-7.4-5.3
xorg-x11-driver-input-7.4-8.4
xorg-x11-driver-video-radeonhd-1.2.3_081106_bb14c00-1.1
xorg-x11-driver-video-unichrome-20080807-12.32
xorg-x11-driver-video-7.4-15.2
xorg-x11-fonts-core-7.4-1.22
xorg-x11-fonts-devel-7.4-1.20
xorg-x11-fonts-7.4-1.22
xorg-x11-libfontenc-devel-7.4-1.20
xorg-x11-libfontenc-32bit-7.4-1.16
xorg-x11-libfontenc-7.4-1.20
xorg-x11-libICE-devel-7.4-1.21
xorg-x11-libICE-32bit-7.4-1.17
xorg-x11-libICE-7.4-1.21
xorg-x11-libSM-devel-7.4-1.21
xorg-x11-libSM-32bit-7.4-1.19
xorg-x11-libSM-7.4-1.21
xorg-x11-libs-32bit-7.4-5.2
xorg-x11-libs-7.4-5.3
xorg-x11-libXau-devel-7.4-1.19
xorg-x11-libXau-32bit-7.4-1.16
xorg-x11-libXau-7.4-1.19
xorg-x11-libxcb-devel-7.4-1.17
xorg-x11-libxcb-32bit-7.4-1.15
xorg-x11-libxcb-7.4-1.17
xorg-x11-libXdmcp-devel-7.4-1.19
xorg-x11-libXdmcp-32bit-7.4-1.16
xorg-x11-libXdmcp-7.4-1.19
xorg-x11-libXext-devel-7.4-1.18
xorg-x11-libXext-32bit-7.4-1.16
xorg-x11-libXext-7.4-1.18
xorg-x11-libXfixes-devel-7.4-1.18
xorg-x11-libXfixes-32bit-7.4-1.16
xorg-x11-libXfixes-7.4-1.18
xorg-x11-libxkbfile-devel-7.4-1.18
xorg-x11-libxkbfile-32bit-7.4-1.16
xorg-x11-libxkbfile-7.4-1.18
xorg-x11-libXmu-devel-7.4-1.20
xorg-x11-libXmu-32bit-7.4-1.18
xorg-x11-libXmu-7.4-1.20
xorg-x11-libXp-devel-7.4-1.18
xorg-x11-libXpm-devel-7.4-1.20
xorg-x11-libXpm-32bit-7.4-1.18
xorg-x11-libXpm-7.4-1.20
xorg-x11-libXprintUtil-devel-7.4-1.20
xorg-x11-libXprintUtil-32bit-7.4-1.18
xorg-x11-libXprintUtil-7.4-1.20
xorg-x11-libXp-32bit-7.4-1.16
xorg-x11-libXp-7.4-1.18
xorg-x11-libXrender-devel-7.4-1.18
xorg-x11-libXrender-32bit-7.4-1.16
xorg-x11-libXrender-7.4-1.18
xorg-x11-libXt-devel-7.4-1.20
xorg-x11-libXt-32bit-7.4-1.18
xorg-x11-libXt-7.4-1.20
xorg-x11-libXv-devel-7.4-1.18
xorg-x11-libXv-32bit-7.4-1.16
xorg-x11-libXv-7.4-1.18
xorg-x11-libX11-devel-7.4-1.18
xorg-x11-libX11-32bit-7.4-1.16
xorg-x11-libX11-7.4-1.18
xorg-x11-proto-devel-7.4-1.24
xorg-x11-server-sdk-7.4-12.1
xorg-x11-server-7.4-12.1
xorg-x11-util-devel-7.4-1.19
xorg-x11-xauth-7.4-8.7
xorg-x11-xtrans-devel-7.4-4.6
xorg-x11-7.4-8.7
*** This bug has been marked as a duplicate of bug 443409 ***
Not a duplicate.
> [mi] EQ overflowing. The server is probably stuck in an infinite loop.
> [mi] mieqEnequeue: out-of-order valuator event; dropping.

This reminds me of the following commit on the xorg-server 1.5 branch (included in xorg-server 1.5.3, but we're shipping 1.5.2):

commit 483fb847b4363d09ff3347f61ad51bba1dd00602
Author: Adam Jackson <ajax@redhat.com>
Date:   Fri Oct 10 16:33:24 2008 -0400

    mieq: Backtrace when the queue overflows.

    Since we're probably stuck down in a driver somewhere, let's at least
    try to point out where. This will need to be rethought when the input
    thread work lands though.
    (cherry picked from commit b736f477f5324f79af30fc0f941ba0714a34ccda)

diff --git a/mi/mieq.c b/mi/mieq.c
index aaa247d..8037247 100644
--- a/mi/mieq.c
+++ b/mi/mieq.c
@@ -145,6 +145,7 @@ mieqEnqueue(DeviceIntPtr pDev, xEvent *e)
         oldtail = (oldtail - 1) % QUEUE_SIZE;
     }
     else {
+        static int stuck = 0;
         newtail = (oldtail + 1) % QUEUE_SIZE;
         /* Toss events which come in late. Usually this means your server's
          * stuck in an infinite loop somewhere, but SIGIO is still getting
@@ -152,8 +153,13 @@ mieqEnqueue(DeviceIntPtr pDev, xEvent *e)
         if (newtail == miEventQueue.head) {
             ErrorF("[mi] EQ overflowing. The server is probably stuck "
                    "in an infinite loop.\n");
+            if (!stuck) {
+                xorg_backtrace();
+                stuck = 1;
+            }
             return;
         }
+        stuck = 0;
         miEventQueue.tail = newtail;
     }
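For anyone less familiar with mieq: the patch above only touches the overflow branch of a fixed-size ring buffer that the input handler fills and the main loop drains. The following is a simplified, self-contained sketch of that logic, not the actual mi/mieq.c code. `struct event_queue`, `enqueue`, `dequeue`, `fake_backtrace` and the tiny `QUEUE_SIZE` are hypothetical names chosen for illustration. It shows why the backtrace is printed only once per overflow episode: the static `stuck` flag is set on the first dropped event and cleared as soon as an event fits into the queue again.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, tiny queue size for illustration only. */
#define QUEUE_SIZE 8

/* Simplified stand-in for miEventQueue: the producer (input handler) writes at
 * tail, the consumer (main loop) reads at head. One slot is always left empty,
 * so at most QUEUE_SIZE - 1 events can be pending. */
struct event_queue {
    int head;
    int tail;
    int events[QUEUE_SIZE];
};

/* Counts calls to fake_backtrace(), which stands in for xorg_backtrace(). */
static int backtraces_printed = 0;

static void fake_backtrace(void)
{
    backtraces_printed++;
}

/* Producer side, modelled on the patched overflow branch of mieqEnqueue.
 * Returns 1 if the event was queued, 0 if it was dropped. */
static int enqueue(struct event_queue *q, int ev)
{
    static int stuck = 0;
    int newtail = (q->tail + 1) % QUEUE_SIZE;

    if (newtail == q->head) {
        /* Full: the consumer has stopped draining the queue. */
        fprintf(stderr, "[sketch] EQ overflowing\n");
        if (!stuck) {          /* report only the first overflow of an episode */
            fake_backtrace();
            stuck = 1;
        }
        return 0;
    }
    stuck = 0;                 /* an event fit again, so re-arm the backtrace */
    q->events[q->tail] = ev;
    q->tail = newtail;
    return 1;
}

/* Consumer side: returns 1 and stores the oldest event in *ev, or 0 if empty. */
static int dequeue(struct event_queue *q, int *ev)
{
    assert(ev != NULL);
    if (q->head == q->tail)
        return 0;
    *ev = q->events[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    return 1;
}
```

Pushing 20 events without draining queues seven of them and produces a single backtrace; draining one event, queueing one more and then overflowing again produces a second backtrace.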
I've built X with this patch included if you are interested: http://labs.suse.cz/jslaby/bug-443459/
Thanks. So what's the backtrace when you run into this issue?
Created attachment 251409: xorg log with the backtrace

It happened after running a GLUT application.
X + xterm + program: OK
X + xterm + gnome-session + program: fails after ~ sec (while resizing the window); at least this scenario held twice.

This is a machine with the ATI radeonhd driver compiled from git (ATI Mobility Radeon X1450).
> (**) RADEONHD(0): Option "DRI" "on"

So you have 3D enabled. What's the output of 'hwinfo --gfxcard'?
Note that this bug was originally filed for an Intel G33, but I think this is the same issue reproduced on different hardware... Here is the hwinfo output:

23: PCI 100.0: 0300 VGA compatible controller (VGA)
  [Created at pci.318]
  UDI: /org/freedesktop/Hal/devices/pci_1002_7186
  Unique ID: VCu0._3M0RbmOpn5
  Parent ID: vSkL.rxAOeWuq8i6
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "ATI Mobility Radeon X1450"
  Vendor: pci 0x1002 "ATI Technologies Inc"
  Device: pci 0x7186 "Mobility Radeon X1450"
  SubVendor: pci 0x1043 "ASUSTeK Computer Inc."
  SubDevice: pci 0x1231
  Memory Range: 0xb0000000-0xb7ffffff (rw,prefetchable)
  I/O Ports: 0xa000-0xa0ff (rw)
  Memory Range: 0xfa8f0000-0xfa8fffff (rw,non-prefetchable)
  Memory Range: 0xfa8c0000-0xfa8dffff (ro,prefetchable,disabled)
  IRQ: 16 (54824 events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v00001002d00007186sv00001043sd00001231bc03sc00i00"
  Driver Info #0:
    XFree86 v4 Server Module: radeonhd
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #9 (PCI bridge)

Primary display adapter: #23
> Vendor: pci 0x1002 "ATI Technologies Inc"
> Device: pci 0x7186 "Mobility Radeon X1450"

We do not enable 3D by default on this hardware, nor do we support it. Can you reproduce this issue without Option "DRI"?
I repeat: this bug was originally filed for an Intel G33, but I think this is the same issue reproduced on different hardware... I'll install the debug X package on the Intel machine as well and get back with a trace from there...
So did you enable desktop effects/compiz, and/or are you using 3D screen savers?
compiz is not installed on either system. In KDE4 the default effects are enabled, so yes. In GNOME, nothing, I hope; how can I find out? "Activate screensaver when computer is idle" is unticked in gnome-screensaver-preferences.
Luc, please take over. Thanks.
Luc, I have a G33 machine available for testing.
Luc, why is this one set to NEEDINFO now? What information needs to be provided by the reporter?
Ah, I see, evdev apparently should not be blamed; you're right. Let's see whether the G33 reproduces this, then.
Sorry guys, it's very hard to reproduce on the Intel one. I have no results (backtraces) so far.
Jiri, that's OK; let's see whether we can track it down on our own driver first :) Could it be the evdev driver or the synaptics driver doing this? What happens if you use the mouse driver for a USB mouse and disable the touchpad? Can you still trigger it then? On the other hand, it also looks like a DRM issue, so I am now trying to see whether a normal desktop system shows the same issue on 11.1b5 x86-64.
The Intel card is in a desktop machine, so no synaptics... Evdev is definitely on the list of possible culprits. I probably ran into another problem with DRM: it locks up in the kernel in some radeon call, so I'm not sure whether the trace from the radeonhd machine attached here is relevant. As I said, I don't know how to reproduce it reliably, so I won't disable any drivers; I'll wait for another trace from the Intel machine and see whether evdev is involved there too.
When I switch engine contexts often enough, I get either the machine locking up, the driver complaining about the engine being locked up, or X waiting endlessly for an available command buffer. Possible reproductions:
* f-spot and glxgears on EXA (has render composite).
* f-spot and torcs on EXA (has render composite).
* _even_ glxgears and evtest on XAA (no render composite, but the same routine to switch the engine state).
None of them result in miEnqueue complaining, though.
I've caught it with the intel hardware. evdev is there again. Here we go:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: X(mieqEnqueue+0x2c7) [0x4cc337]
1: X(xf86PostMotionEventP+0xc4) [0x4768e4]
2: X(xf86PostMotionEvent+0xa9) [0x476ab9]
3: /usr/lib64/xorg/modules//input/evdev_drv.so [0x7f819f56e415]
4: /usr/lib64/xorg/modules//input/evdev_drv.so [0x7f819f56e513]
5: /usr/lib64/xorg/modules//input/evdev_drv.so [0x7f819f56be6b]
6: X [0x46d4d7]
7: /lib64/libc.so.6 [0x7f81a0d0e7b0]
8: /lib64/libc.so.6(ioctl+0x7) [0x7f81a0daa537]
9: /usr/lib64/libdrm.so.2 [0x7f819fc15ca3]
10: /usr/lib64/libdrm.so.2(drmCommandWrite+0x1b) [0x7f819fc15d2b]
11: /usr/lib64/xorg/modules//drivers/intel_drv.so(I830Sync+0x118) [0x7f819f997238]
12: /usr/lib64/xorg/modules//libexa.so(exaWaitSync+0x5c) [0x7f819ed2465c]
13: /usr/lib64/xorg/modules//libexa.so(ExaDoPrepareAccess+0x91) [0x7f819ed257d1]
14: /usr/lib64/xorg/modules//libexa.so [0x7f819ed28e11]
15: /usr/lib64/xorg/modules//libexa.so [0x7f819ed28fbd]
16: /usr/lib64/xorg/modules//libexa.so [0x7f819ed293a3]
17: /usr/lib64/xorg/modules//libexa.so(exaDoMigration+0x69f) [0x7f819ed29bdf]
18: /usr/lib64/xorg/modules//libexa.so(exaCopyNtoN+0x37f) [0x7f819ed2839f]
19: /usr/lib64/xorg/modules//libexa.so(exaComposite+0x9e0) [0x7f819ed2bd80]
20: X [0x52cb78]
21: X [0x51c07a]
22: X(Dispatch+0x364) [0x44beb4]
23: X(main+0x45d) [0x43231d]
24: /lib64/libc.so.6(__libc_start_main+0xe6) [0x7f81a0cfa586]
25: X [0x4316f9]
Juck. Why do we get both evdev _and_ exa symbols in these backtraces... That's just messed up. I'll get the r5xx engine to lock up again but this time with evdev running as well. Let's hope the mienqueue thing pops up then as well.
Right... Well... You know... if you want the evdev driver to overflow the event queue... then it helps if you actually generate events... like... by moving the mouse a bit... *stares at floor ashamedly*

Anyway, it now reproduces fully (once you've finally realised that you have to move the mouse a bit). The conclusion is that these are two different issues:
1) the intel driver messing up.
2) the radeonhd driver messing up.
Both are, of course, busy-waiting on something related to the DRM; they only look alike because the symptom is pretty much the same.

You don't strictly need the patch Adam Jackson provided: you can also attach gdb to the process and see where it is stuck. But the patch helps (when it's there), as it requires no further intervention; the log contains everything one needs to know.

I'll file a separate bug for the radeonhd issue, attach your backtrace of it, and briefly explain how I reproduce it and the broad reason for the crash. Sadly, this doesn't mean the issue can be fixed easily. In the intel case it will also be severely non-trivial to fix, even though the hardware probably isn't as highly optimised and picky as radeonhd hardware. But that's for upstream to deal with.
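Why both evdev and EXA/intel frames appear in one backtrace: as the comment in the mieq.c patch notes, SIGIO is still handled while the server is stuck, so the input driver's code most likely runs on top of whatever the main loop was doing, here a driver sync sitting in drmCommandWrite/ioctl. The sketch below imitates that under stated assumptions: a POSIX interval timer (SIGALRM) plays the role of SIGIO and mouse movement, `input_handler` stands in for evdev posting motion events, and `stuck_driver_sync` is a hypothetical busy-wait standing in for I830Sync. None of these names are X server APIs, and `QUEUE_CAP` is arbitrary.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() under strict modes */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

#define QUEUE_CAP 4                                /* hypothetical capacity */

static volatile sig_atomic_t queued = 0;           /* events waiting for the main loop */
static volatile sig_atomic_t dropped = 0;          /* events tossed because the queue was full */
static volatile sig_atomic_t in_driver_sync = 0;   /* set while the "driver" spins */
static volatile sig_atomic_t interrupted_sync = 0; /* handler ran on top of the spin */

/* Stand-in for the input path (evdev -> xf86PostMotionEvent -> mieqEnqueue),
 * invoked asynchronously from a signal, as SIGIO-driven input is. */
static void input_handler(int sig)
{
    (void)sig;
    if (in_driver_sync)
        interrupted_sync = 1;   /* its frames sit on top of the driver's frames */
    if (queued < QUEUE_CAP)
        queued++;
    else
        dropped++;              /* the "EQ overflowing" case */
}

/* Hypothetical stand-in for a driver sync that never completes: it spins
 * until `events` input events have arrived and never drains the queue,
 * because draining only happens back in the main dispatch loop.
 * Returns 0 on success, -1 if the signal/timer setup failed. */
static int stuck_driver_sync(int events)
{
    struct sigaction sa;
    struct itimerval timer;

    assert(events > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = input_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* One fake "mouse movement" per millisecond. */
    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 1000;
    timer.it_interval.tv_usec = 1000;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    in_driver_sync = 1;
    while (queued + dropped < events)
        ;                       /* busy-wait, like the hung sync */
    in_driver_sync = 0;

    memset(&timer, 0, sizeof timer);   /* an all-zero value disarms the timer */
    setitimer(ITIMER_REAL, &timer, NULL);
    return 0;
}
```

After `stuck_driver_sync(10)`, `queued` stays at `QUEUE_CAP`, the remaining events are counted in `dropped`, and `interrupted_sync` is set: the "input" code ran while the "driver" was still spinning, which is the pattern the backtrace above shows.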
Splitting up the bug report makes perfect sense to me. Jiri, could you report the Intel driver issue upstream on bugs.freedesktop.org (Product: xorg, Component: Driver/intel) and add Luc and me to Cc (libv@skynet.be, sndirsch@suse.de)? You'll need to register first. Thanks.
radeonHD bug filed as: #447124
Thanks. Setting to NEEDINFO until there is an upstream bug report for the Intel driver issue.
Created as: https://bugs.freedesktop.org/show_bug.cgi?id=18663
Thanks.