Bugzilla – Bug 245711
OpenOffice freezes X11
Last modified: 2008-01-08 15:14:30 UTC
Sometimes OpenOffice.org freezes X11: the mouse cursor still moves, but the windows are not updated, so you cannot do anything. SSH login from another host is possible. X.org has 100% CPU usage. Only kill -9 works; a reboot is not necessary. The behaviour is not reproducible. I'll attach my X.org configuration file and hwinfo.
Created attachment 119341 [details] Xorg configuration file
Created attachment 119342 [details] Hardware information
Please also attach /var/log/Xorg.0.log. *Afterwards* check whether this also happens with the "nv" driver. Thanks.
Created attachment 119367 [details] Logfile This is /var/log/Xorg.0.log.old; I guess this is the correct logfile (/var/log/Xorg.0.log is the new logfile after restarting, right?). The problem with the "nv" driver is that I don't want to work with it for a week, because in Firefox the text boxes are corrupted (this should be a known problem; my colleague also has it and told me to use the nvidia driver). And as I said, the problem is not reproducible.
But I need to know whether the problem is specific to the nvidia driver before I can reassign it to NVIDIA. Open a new bug report for the nv driver issue.
(In reply to comment #5) > But I need to know, if the problem is specific to the nvidia driver before I > can reassign it to NVIDIA. Open an new bugreport for the nv driver issue. Bug #246421.
No problems with nv
Ok. Lonni, this is probably "not easy" to reproduce.
Please generate and attach an nvidia-bug-report.log. Also, please provide the steps required to reliably reproduce this problem, including any OOo documents. thanks, Lonni
It is simply *not* reproducible at all. If it happens again, I can attach the log file.
As I no longer have the card, I cannot provide the information, even if I used OpenOffice for two weeks. What should we do with the bug now?
I'll try to reproduce it on one of my machines before I send the card to NVIDIA because of Bug #246421.
Hmm ... I cannot reproduce this on my machine with NVIDIA driver release 1.0-9746+ on openSUSE 10.3 Alpha running kernel 2.6.20 and X.Org 1.2.99.901. I therefore think it's better to close this one as WORKSFORME. Bernhard, what do you think?
As I said, I also could not reproduce it. But not being able to reproduce it doesn't mean that there's no bug. However, I understand that the bug is impossible to fix without log files, etc., so I'm fine if you close the bug for now.
Ok.
People have been reporting this same bug with drivers other than the NVIDIA binary driver. It seems related to popen'ing xkbcomp from Xorg, as triggered by OpenOffice's menu display (which calls XkbGetKeyboard()). I'd suggest linking to the upstream bugzilla entry: https://bugs.freedesktop.org/show_bug.cgi?id=10525 Although it is not NVIDIA-specific, in my personal experience NVIDIA seems to make it more likely to happen. So far, for example, I haven't been able to reproduce it with nv. I don't know whether this warrants reopening, but I'm quite sure the bug is not fixed. I can also confirm it with packages from the xorg73 repo.
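If the pattern is unfamiliar, the following minimal C sketch may help; it is not a reproducer. It shows a process that keeps a periodic SIGALRM itimer armed while it runs a helper through popen(), which is roughly the situation described above: the X server runs xkbcomp while its scheduler timer is active. The 20 ms interval is the default mentioned later in this report. `echo ok` stands in for xkbcomp, and the function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Counts SIGALRM deliveries; stands in for the server's scheduler tick. */
static volatile sig_atomic_t alarm_ticks = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_ticks++;
}

/* Hypothetical helper: install the handler, then arm a periodic
 * ITIMER_REAL that raises SIGALRM every interval_ms milliseconds. */
static int arm_periodic_alarm(long interval_ms)
{
    struct sigaction sa;
    struct itimerval it;

    assert(interval_ms > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart slow syscalls such as read() */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    it.it_interval.tv_sec = interval_ms / 1000;
    it.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    it.it_value = it.it_interval; /* first expiry after one interval */
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Hypothetical helper: run cmd via popen() (fork + exec of /bin/sh) while
 * the timer keeps firing, and capture the first line of its output. */
static int run_helper(const char *cmd, char *buf, size_t len)
{
    FILE *p = popen(cmd, "r");

    if (p == NULL)
        return -1;
    if (fgets(buf, (int)len, p) == NULL)
        buf[0] = '\0';
    return pclose(p); /* wait status of the shell */
}
```

For a small process like this, fork() normally finishes well before the next tick. The freezes reported here are cases where that apparently stops being true.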
Today I confirmed that "nv" is also affected (my work computer hung again). More on this later. I hope you don't mind me reopening this bug. You might consider reassigning it, since NVIDIA is definitely not to blame...
I get occasional (not reproducible) freezes when trying to switch the keyboard layout with the "KDE Keyboard Tool" (kxkb?) shown in the "System Tray" of klipper. The mouse still moves and switching to a text console is not possible, but a remote login works, and top shows X consuming 100% CPU. I use the nvidia driver. Freezes happen both when switching the layout with a shortcut and when selecting the layout in kxkb's popup menu. The application in use at the time of the switch does not matter: the freezes occurred while using urxvt, Firefox and probably several others, often enough that I started switching layouts only when absolutely required for my task.
Andreas, IMHO you have probably found another way of triggering the very same problem. kxkb most likely uses the same Xlib call, XkbGetKeyboard(). Please take a look at the upstream bugzilla entry I mentioned earlier. I've just posted there a way to "unfreeze" Xorg using gdb. I guess it should help in understanding the bug.
Thanks, I've added myself to Cc of the upstream bug report. Since it seems that only very few people are affected and reproducing it is therefore very hard, I'll set this to NORMAL severity, but with HIGH priority so it won't get lost among my other bug reports.
Stefan, I think I have a good explanation of this problem (which might well be a kernel bug). Unfortunately I was completely ignored on the linux-kernel mailing list when trying to get some advice on it. Perhaps you might be able to comment, or at least ask your SUSE folks to think about it for 5 minutes ;-) http://www.uwsg.indiana.edu/hypermail/linux/kernel/0704.3/0717.html
This looks like a frozen syscall. I've got no idea, so the wheel of randomness fell on Oliver ;-)
According to your comment in https://bugs.freedesktop.org/show_bug.cgi?id=10525#c25, setting SmartScheduleIdle to 1 in

    SmartScheduleTimer (int sig)
    {
        int olderrno = errno;

        SmartScheduleTime += SmartScheduleInterval;
        if (SmartScheduleIdle)
        {
            SmartScheduleStopTimer ();
        }
        errno = olderrno;
    }

causes SmartScheduleStopTimer() to run. It is therefore clear that the signal is delivered correctly. After that, the syscall is restarted. This looks correct to me. The only question is why it is called again and again. What does "SmartScheduleStopTimer ();" do?
(In reply to comment #0) > Sometimes OpenOffice.org freezes X11 which means that the mouse cursor moves > but the Windows are not updated so you cannot do anything. SSH logon from > another host is possible. X.org has 100 % CPU usage. Only kill -9 works. Reboot > is not necessary. The behaviour is not reproducable. I'll attach my X.org > configuration file and hwinfo. > I had a similar problem when trying to open documents with OpenOffice.org. I used YaST software management to reinstall OpenOffice.org, and that fixed the problem; probably bad media in my case.
SmartScheduleStopTimer() stops the itimer that periodically sends a SIGALRM. It looks like calling this signal handler causes the process to stall in the fork syscall (that's why we see it stalled at the instruction after the syscall, which is the address on the stack). The interval for the itimer is 20 ms by default; this value can be changed with a command-line option: -schedInterval <#ms>. Maybe we should block all signals before we call fork, not only while we are reading from stdio.
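The following C sketch illustrates the timer semantics described above. It arms an ITIMER_REAL with a configurable interval (20 ms, the default quoted for -schedInterval) and stops it by writing a zero it_value, which is how setitimer() disarms a timer. The function names are hypothetical analogues, not the X server's actual code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

#define DEFAULT_SCHED_INTERVAL_MS 20L /* default quoted in the comment above */

/* Hypothetical analogue of starting the scheduler timer. The SIGALRM
 * handler is installed first, because the default action for SIGALRM
 * terminates the process. */
static int sched_timer_start(long interval_ms, void (*handler)(int))
{
    struct sigaction sa;
    struct itimerval it;

    assert(interval_ms > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    it.it_interval.tv_sec = interval_ms / 1000;
    it.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    it.it_value = it.it_interval;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Hypothetical analogue of SmartScheduleStopTimer(): a zero it_value
 * disarms the timer, so no further SIGALRMs are generated. */
static int sched_timer_stop(void)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Returns 1 if the timer is armed, 0 if not, -1 on error. */
static int sched_timer_armed(void)
{
    struct itimerval it;

    if (getitimer(ITIMER_REAL, &it) != 0)
        return -1;
    return it.it_value.tv_sec != 0 || it.it_value.tv_usec != 0;
}
```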
Created attachment 140768 [details] Block all signals before calling fork(). This patch may help to work around the problem. The real problem is still not fully understood: could it be that the SIGALRMs from the itimer are thrashing the system during fork?
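The idea behind this patch is to block all signals around fork(). The following C sketch shows that idea using plain POSIX calls. It is a generic illustration, not the attached patch, which uses the X server's own signal-blocking mechanism. `fork_with_signals_blocked` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h> /* waitpid(), for callers reaping the child */
#include <unistd.h>

/* Hypothetical helper: fork() with every blockable signal masked, so no
 * handler (e.g. a SIGALRM scheduler tick) runs in the parent while the
 * fork is in progress. */
static pid_t fork_with_signals_blocked(void)
{
    sigset_t all, saved;
    pid_t pid;
    int rc;

    /* sigprocmask() silently leaves SIGKILL and SIGSTOP unblocked. */
    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)
        return -1;

    pid = fork();

    /* Parent and child both restore the caller's mask. The child must do
     * this too: the signal mask survives exec, so a helper started from
     * it would otherwise run with all signals blocked. A signal that
     * became pending meanwhile is delivered (once) when the mask is
     * restored. */
    rc = sigprocmask(SIG_SETMASK, &saved, NULL);
    assert(rc == 0);
    (void)rc;
    return pid;
}
```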
Miguel, you've done a lot of testing, would you be able to try this patch? The ticket is still NEEDINFO assigned to you.
Arrgh! What a coincidence: right after I was done with this, I hit this problem myself. Thanks to the detailed analysis in this ticket, I was able to recover my server (no, I didn't have my patch installed :( ). It seems as if the process is hanging in the syscall because it never gets scheduled. Instead, the itimer fires at quite a high rate, producing the CPU load. Disabling the timer immediately recovers the system. This was the first time I have hit this problem, and I really don't want to reproduce it on my production system.
It seems to me then, that this is not a kernel problem. You are simply requesting signals faster than you can process them. I suggest you use a static counter in the signal handler and increase the interval if you get called too often.
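The following C sketch shows one way to apply the suggestion above, under some assumptions. The handler only counts ticks, and the main loop, rather than the handler itself, decides whether to lengthen the interval. This is a variant of the "adjust in the handler" idea that keeps the handler trivial. The threshold of 4 ticks per check and the 1000 ms cap are hypothetical values; the 20 ms floor is the default interval mentioned earlier in this report.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

#define MIN_INTERVAL_MS 20L   /* default scheduler interval */
#define MAX_INTERVAL_MS 1000L /* hypothetical upper bound */
#define MAX_TICKS_PER_CHECK 4 /* hypothetical "called too often" threshold */

static volatile sig_atomic_t pending_ticks = 0;

/* The handler only counts; all decisions happen outside signal context. */
static void on_tick(int sig)
{
    (void)sig;
    pending_ticks++;
}

static int install_tick_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Read and reset the counter with SIGALRM blocked so no tick is lost. */
static int consume_ticks(void)
{
    sigset_t alrm, saved;
    int n;

    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    sigprocmask(SIG_BLOCK, &alrm, &saved);
    n = (int)pending_ticks;
    pending_ticks = 0;
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return n;
}

/* Double the interval (up to the cap) when too many ticks arrived since
 * the last check; drift back toward the default when the load is low. */
static long next_interval_ms(long current_ms, int ticks)
{
    assert(current_ms > 0);
    if (ticks > MAX_TICKS_PER_CHECK) {
        long doubled = current_ms * 2;
        return doubled > MAX_INTERVAL_MS ? MAX_INTERVAL_MS : doubled;
    }
    if (ticks <= 1 && current_ms > MIN_INTERVAL_MS) {
        long halved = current_ms / 2;
        return halved < MIN_INTERVAL_MS ? MIN_INTERVAL_MS : halved;
    }
    return current_ms;
}
```

A main loop would call consume_ticks() once per iteration and re-arm the itimer with setitimer() whenever next_interval_ms() returns a different value.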
Reassigning to Egbert. I think he knows this problem best.
Why not just try the attached patch? I don't know who to assign this to, though. I've seen this problem once, on a machine that was rather busy. I don't think it will be easy to reproduce, at least not on purpose.
Sorry, I overlooked the patch. Will create an RPM for testing.
ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711 - update the xorg-x11-server package.
After I installed those RPMs, I sometimes have redraw problems, for example with xosd.
> Miguel, you've done a lot of testing, would you be able to try this patch? The > ticket is still NEEDINFO assigned to you. Egbert, sorry for the delay, I have just returned from vacation. I will try the patch next week, but it is not any easier for me to reproduce it, so I think we might need more testers. Still, I'm not sure disabling the signals will help. I haven't tried to fully understand how this "SmartScheduler" works in Xorg, but people said there could be a race somewhere (IIRC the flag was supposed to be set, but it is not).
(In reply to comment #35) > After I installed that RPMs, I have sometimes redraw problems, for example with > xosd. After having crashes with OpenGL screensavers, I reinstalled the proprietary nVidia driver (I don't use RPMs but the installer), and this problem disappeared.
(In reply to comment #36) > I will try the patch next week but it is not any easier for me to reproduce it > so i think we might need more testers. I also test. ;-)
For NVIDIA-based machines: updating the video driver to NVIDIA-Linux-x86_64-100.14.09-pkg2.run (or the package matching your hardware architecture) seems to fix the problem. It's only been a few days and I'm still testing, but I have had no issues so far.
*** Bug 286848 has been marked as a duplicate of this bug. ***
OpenOffice freezes X11.... Updating the NV driver didn't work in my tests; OpenOffice still freezes X11.
I tried updating xorg-x11-server from ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711, but it didn't solve the problem; OpenOffice still sometimes freezes X11.
I have succeeded in triggering this bug at will. I have been observing it; it usually triggers when it's least needed. I've added the fix to the server I'm running, along with some debugging output to help shed some light on this issue should it surface again.
Since 2007-05-27 I have been running a self-compiled xorg-x11-server-7.2-30.6 with the patch from comment #27 applied, and I have experienced no freezes since. Besides the original freezes I observed when using kxkb (see comment #18), I later also observed freezes with OpenOffice and NoMachine's nxclient with an unpatched X11 server, and none with the patch applied.
(In reply to comment #44 from Andreas Pfaller) > I have been running a self compiled xorg-x11-server-7.2-30.6 with > the patch from comment #27 applied since 2007-05-27 and I experienced > no freezes since. > > In addition to my original freezes I did observe when using kxkb > (see comment #18) I later observed also freezes with OpenOffice > and NoMachine's nxclient with an unpatched X11 server and none > with the patch applied. > Hello, is there a patch available?
(In reply to comment #34 from Stefan Dirsch) > ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711 > > - update the xorg-x11-server package. > Installing the xorg-x11-server package from ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711 didn't solve the problem. Neither the binary package nor the src.rpm worked; I tried both. OpenOffice still freezes X sometimes.
Date: Wed, 01 Aug 2007 11:52:20 -0500 From: machine owner <nucleo701@yahoo.com> To: sndirsch@novell.com Subject: Open Office freezes X11, bug 245711 Hello Stefan, I updated xorg-x11-server from ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711, but it didn't solve my problem: the OpenOffice File menu still freezes X11. Maybe there are some additional steps that should be done? I am running SUSE 10.2 on an AMD64 dual core with Nvidia, and Xgl with Beryl. Could Xgl be the problem? When the system freezes, I see that Xgl is taking 100% of the CPU. Thanks Regards, Dmitry
Date: Wed, 01 Aug 2007 15:37:18 -0500 From: machine owner <nucleo701@yahoo.com> To: sndirsch@novell.com Subject: Open Office freezes X Hello Stefan, Installing the xorg-x11-server package from ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711 didn't solve the problem. Neither the binary package nor the src.rpm worked; I tried both. OpenOffice still freezes X sometimes. Regards, Dmitry
I have a very similar bug, #292646, but not in OpenOffice.org; it is probably in klipper (a KDE application). I can reproduce it with both the "nvidia" and "nv" drivers.
Novell seems to be kind of dormant these days... I don't understand why you don't just apply the Red Hat patch (which is reported to work) and provide new RPM packages to users. http://cvs.fedora.redhat.com/viewcvs/devel/xorg-x11-server/xserver-1.3.0-xkb-and-loathing.patch?view=markup IMHO this is the kind of bug that should never be allowed to appear in a release again. It is so frustrating.
Since you are aware of this patch, I'm sure you can tell me why it has never been committed to the X.Org git repository, right?
No idea. Maybe because it is such a "/* XXX horrible awful hack */"? If I were the X.Org maintainer, I would not commit it either: it is definitely not a proper fix, just an ugly workaround. If I were the package maintainer, I would probably do the same as the Red Hat and Debian folks and apply it. Of course, I'm not aware of any side effects or problems; I just read about this patch in freedesktop's bugzilla.
The Fedora patch does something similar to my patch in attachment #140768 [details]. My patch uses the standard X signal blocking/unblocking mechanism instead of calling signal(). The difference in semantics is that the patch in attachment #140768 [details] blocks the signal, while the other one ignores it. This means a signal generated between the start and the end of the fork gets delivered once on unblock. I don't see offhand why one patch should work better than the other. As for the 'dormant state of Novell': I've added debug code to my X server to shed some light on this, but I have since been unable to reproduce this issue. It may have gone away with a system update that included a new kernel.
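The semantic difference described above can be checked with a small C sketch. Blocking keeps one pending SIGALRM for delivery on unblock; ignoring discards it. raise() stands in for the itimer firing during fork(). The Fedora patch calls signal(), while sigaction() is used here. The function names are hypothetical, and neither function is the code of either patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t delivered = 0;

static void count_alarm(int sig)
{
    (void)sig;
    delivered++;
}

static int install_counter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_alarm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Blocking (attachment #140768 style): SIGALRMs raised inside the window
 * stay pending, and one is delivered when the old mask is restored. */
static int alarms_seen_when_blocked(int raised)
{
    sigset_t alrm, saved;
    int i, rc;

    delivered = 0;
    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    rc = sigprocmask(SIG_BLOCK, &alrm, &saved);
    assert(rc == 0);
    for (i = 0; i < raised; i++)
        raise(SIGALRM); /* stands in for a timer tick during fork() */
    rc = sigprocmask(SIG_SETMASK, &saved, NULL); /* pending SIGALRM delivered here */
    assert(rc == 0);
    (void)rc;
    return (int)delivered;
}

/* Ignoring (Fedora patch style): SIGALRMs raised inside the window are
 * discarded, and the previous handler is put back afterwards. */
static int alarms_seen_when_ignored(int raised)
{
    struct sigaction ign, saved;
    int i, rc;

    delivered = 0;
    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    rc = sigaction(SIGALRM, &ign, &saved);
    assert(rc == 0);
    for (i = 0; i < raised; i++)
        raise(SIGALRM);
    rc = sigaction(SIGALRM, &saved, NULL);
    assert(rc == 0);
    (void)rc;
    return (int)delivered;
}
```

Standard (non-real-time) signals are not queued. That is why three raises while blocked still produce only one delivery.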
Egbert, I agree both patches are quite similar and that both are just workarounds for a problem we don't fully understand. However, a few details are worth noting:
- The Fedora patch says explicitly "Ignore (not just block) SIGALRM", so I must assume they were aware of the blocking alternative and chose the other one. I can only guess they tried blocking and it didn't work, because saving/restoring a sighandler_t is certainly clumsier.
- Comment #47 reports that your patch (blocking signals) didn't fix the problem.
- Comment #28 of freedesktop's bug 10525 reports that the Fedora patch (ignoring signals) fixes the problem, including the reduced test case: https://bugs.freedesktop.org/show_bug.cgi?id=10525#c28
By the way, the test case might help to reproduce the bug: https://bugs.freedesktop.org/show_bug.cgi?id=10525#c20
*** Bug 301813 has been marked as a duplicate of this bug. ***
*** Bug 305690 has been marked as a duplicate of this bug. ***
[Incidentally, no nvidia, very little KDE running while reproducing this.]
I have just replaced my xorg-x11-server with a custom build that includes Fedora's patch. I need at least 2 weeks to say whether it seems to fix the problem or not; I will keep you updated about any findings. A friend of mine will probably try it as well (his computer freezes about once a week, and I always help him with remote gdb unlocking...). In case anybody wants to give it a try (at your own risk): http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/xorg-x11-server-7.2-308.1mf.x86_64.rpm The source package is here (i586 users need to recompile): http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/xorg-x11-server-7.2-308.1mf.src.rpm PLEASE NOTE that I didn't use the stock package from openSUSE 10.2, but rather the xorg73 repository. You might need to update the other xorg packages as well; I don't know. Don't blame me if it breaks your system ;-) Here is the xorg73 repo for reference: http://download.opensuse.org/repositories/xorg73/openSUSE_10.2/
Just for reference: I ran into this problem again with 10.3 (RC2, xorg-x11-server-7.2-138).
Just for reference: I've been running Fedora's patch since the 21st, with no problems so far (not conclusive, though).
Just for more reference :) I had a way to reproduce the bug 100% of the time on my machine: running setxkbmap to switch keyboard layouts. After installing Miguel's xorg-x11-server-7.2-308.1mf.x86_64.rpm, I have no more lockups, I can switch layouts as I please, and OpenOffice no longer hangs my system. I agree with him that distributions should apply the patch, while Xorg obviously must *solve* the problem rather than commit a workaround. Not everyone is able to do that gdb trick, not to mention that it is extremely error-prone. I am pretty sure there are lots of frustrated SUSE users out there who simply have no idea what is happening. They must be thinking, "hmm, Linux is just as bad as my other operating system."
Submitted the xorg-x11-server package with the patch from comment #50 to:
- 10.2 (update together with security update)
- 10.3 (maybe update later)
- STABLE
*** Bug 292646 has been marked as a duplicate of this bug. ***