Bug 245711 - OpenOffice freezes X11
Summary: OpenOffice freezes X11
Status: RESOLVED FIXED
Duplicates: 286848 301813 305690
Alias: None
Product: openSUSE 10.2
Classification: openSUSE
Component: X.Org
Version: Final
Hardware: Other Other
Priority: P2 - High, Severity: Normal, with 1 vote
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: E-mail List
URL: https://bugs.freedesktop.org/show_bug...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-15 10:09 UTC by Bernhard Walle
Modified: 2008-01-08 15:14 UTC
CC List: 9 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Xorg configuration file (4.61 KB, text/plain)
2007-02-15 10:10 UTC, Bernhard Walle
Hardware information (427.78 KB, text/plain)
2007-02-15 10:11 UTC, Bernhard Walle
Logfile (34.33 KB, text/plain)
2007-02-15 11:46 UTC, Bernhard Walle
Block all signals before calling fork(). (652 bytes, patch)
2007-05-17 12:12 UTC, Egbert Eich

Description Bernhard Walle 2007-02-15 10:09:34 UTC
Sometimes OpenOffice.org freezes X11: the mouse cursor still moves, but the windows are not updated, so you cannot do anything. SSH login from another host is possible. X.org has 100 % CPU usage. Only kill -9 works. A reboot is not necessary. The behaviour is not reproducible. I'll attach my X.org configuration file and hwinfo.
Comment 1 Bernhard Walle 2007-02-15 10:10:19 UTC
Created attachment 119341
Xorg configuration file
Comment 2 Bernhard Walle 2007-02-15 10:11:20 UTC
Created attachment 119342
Hardware information
Comment 3 Stefan Dirsch 2007-02-15 10:28:13 UTC
Please also attach /var/log/Xorg.0.log. *Afterwards* check if this happens also with the "nv" driver. Thanks.
Comment 4 Bernhard Walle 2007-02-15 11:46:11 UTC
Created attachment 119367
Logfile

This is /var/log/Xorg.0.log.old, I guess that this is the correct logfile (/var/log/Xorg.0.log is the new logfile after restarting, right?)

The problem with testing the "nv" driver is that I would rather not work with it for a week, because it corrupts the text boxes in Firefox (this should be a known problem; my colleague also has it and told me to use the nvidia driver).

And as I said, the problem is not reproducible.
Comment 5 Stefan Dirsch 2007-02-15 12:15:27 UTC
But I need to know whether the problem is specific to the nvidia driver before I can reassign it to NVIDIA. Open a new bug report for the nv driver issue.
Comment 6 Stefan Dirsch 2007-02-21 12:36:18 UTC
(In reply to comment #5)
> But I need to know, if the problem is specific to the nvidia driver before I
> can reassign it to NVIDIA. Open an new bugreport for the nv driver issue.
Bug #246421.
Comment 7 Bernhard Walle 2007-02-23 17:23:46 UTC
No problems with nv
Comment 8 Stefan Dirsch 2007-02-23 17:41:02 UTC
Ok. Lonni, probably this is "not easy" to reproduce.
Comment 9 Lonni Friedman 2007-02-23 17:43:04 UTC
Please generate and attach an nvidia-bug-report.log.

Also, please provide the steps required to reliably reproduce this problem, including any OOo documents.

thanks,
Lonni
Comment 10 Bernhard Walle 2007-02-23 18:10:19 UTC
It simply is *not* reproducible at all. If I have the crash next time, I can attach the log file.
Comment 11 Bernhard Walle 2007-03-06 17:57:45 UTC
As I don't have the card any more, I cannot provide the information even if I used OpenOffice for two weeks. What should we do with the bug now?
Comment 12 Stefan Dirsch 2007-03-06 18:32:49 UTC
I'll try to reproduce on one of my machines before I send the card to NVIDIA due to Bug #246421.
Comment 13 Stefan Dirsch 2007-03-08 15:26:53 UTC
Hmm ... I cannot reproduce this on my machine with NVIDIA driver release
1.0-9746+ with openSUSE 10.3 Alpha running Kernel 2.6.20 and X.Org 1.2.99.901.
I think it's better to close this one as WORKSFORME therefore. Bernhard, what do you think?
Comment 14 Bernhard Walle 2007-03-08 15:40:09 UTC
As I said, I also could not reproduce it. But not being able to reproduce it doesn't mean that there's no bug. However, I understand that the bug is impossible to fix without log files, etc., so I'm fine if you close the bug for now.
Comment 15 Stefan Dirsch 2007-03-08 15:43:36 UTC
Ok.
Comment 16 Miguel Freitas 2007-04-23 17:18:44 UTC
People have been reporting this same bug with drivers other than the NVIDIA binary driver. It seems related to Xorg popen'ing xkbcomp, triggered when OpenOffice displays its menus (which calls XkbGetKeyboard()).

I'd suggest linking to the upstream bugzilla entry:

https://bugs.freedesktop.org/show_bug.cgi?id=10525

Although it is not NVIDIA specific, from my personal experience, NVIDIA seems to make it more likely to happen. So far, I haven't been able to reproduce it with nv, for example.

I don't know whether this should be reopened, but I'm quite sure the bug is not fixed. I can also confirm it with packages from the xorg73 repo.
Comment 17 Miguel Freitas 2007-04-25 00:47:39 UTC
Today i have just confirmed that "nv" is also affected (my work computer hung again). More on this later.

I hope you don't mind reopening this bug.

You might consider reassigning since nvidia is definitely not to blame...
Comment 18 Andreas Pfaller 2007-04-25 01:41:11 UTC
I get occasional (not reproducible) freezes when trying to switch
the keyboard layout with the "KDE Keyboard Tool" (kxkb?) shown in
the "System Tray" of klipper. The mouse still moves, switching to
text console is not possible but a remote login works and top shows
X consuming 100% CPU. I use the nvidia driver.

Freezes happen when trying to switch the layout with a 
shortcut and also while selecting the layout in kxkb's popup menu.

The application I use at the time of switch does not matter.
The freezes occurred while using urxvt, firefox and probably several
others with a high enough probability that I started
switching layouts only when absolutely required for my task.
Comment 19 Miguel Freitas 2007-04-25 02:51:00 UTC
Andreas, imho you have probably found another way of triggering the very same problem. kxkb most likely uses the same Xlib request XkbGetKeyboard().

please take a look at the upstream bugzilla i've mentioned earlier.

i've just posted there a way to "unfreeze" the Xorg using gdb. i guess it should help understanding the bug.
Comment 20 Stefan Dirsch 2007-04-27 09:14:32 UTC
Thanks, I've added myself to Cc of upstream bugreport. Since it seems only very few people are affected and therefore reproducing is very hard, I'll set this to NORMAL severity, but with HIGH priority so it won't get lost among my other bugreports.
Comment 21 Miguel Freitas 2007-04-27 10:02:42 UTC
Stefan,

I think i have a good explanation about this problem (which might well be a kernel bug). Unfortunately i was completely ignored in linux-kernel ml trying to obtain some advice on it. perhaps you might be able to comment, or at least ask your suse folks to think 5 minutes about it ;-)

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0704.3/0717.html
Comment 23 Lars Marowsky-Bree 2007-05-08 14:58:59 UTC
This looks like a frozen syscall. I've got no idea, so the wheel of randomness fell on Oliver ;-)
Comment 24 Oliver Neukum 2007-05-14 09:24:45 UTC
According to your comment in
https://bugs.freedesktop.org/show_bug.cgi?id=10525#c25

setting SmartScheduleIdle to 1

void SmartScheduleTimer (int sig)
{
    int olderrno = errno;

    SmartScheduleTime += SmartScheduleInterval;
    if (SmartScheduleIdle)
    {
        SmartScheduleStopTimer ();
    }
    errno = olderrno;
}

thus causing SmartScheduleStopTimer() to run.

Therefore it is clear that a signal is delivered correctly. After that the syscall
is restarted. This seems entirely correct to me. The only question is why it is called again and again.

What does "SmartScheduleStopTimer ();" do?
Comment 25 Jason Philbrook 2007-05-14 15:27:42 UTC
(In reply to comment #0)
> Sometimes OpenOffice.org freezes X11 which means that the mouse cursor moves
> but the windows are not updated so you cannot do anything. SSH logon from
> another host is possible. X.org has 100 % CPU usage. Only kill -9 works. Reboot
> is not necessary. The behaviour is not reproducible. I'll attach my X.org
> configuration file and hwinfo.
> 

I had a similar problem when trying to open documents using OpenOffice.org. I used yast/software management to re-install OpenOffice.org and the problem has been fixed. Probably bad media in my case.
Comment 26 Egbert Eich 2007-05-17 12:08:10 UTC
SmartScheduleStopTimer() stops the itimer that periodically sends a SIGALRM. It looks like calling this signal handler causes the process to be stalled in the fork syscall (that's why we see it stalled in the instruction after the syscall, which is the address on the stack).
The interval for the itimer is 20 ms by default; this value can be changed with a command line option: -schedInterval <#ms>.
Maybe we should block all signals before we call fork - not only while we are reading from stdio.
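
The idea of blocking signals around fork() can be sketched in isolation. The following is only an illustration under stated assumptions, not the attached patch and not X.Org code: the helper name fork_with_signals_blocked is hypothetical, and a real server would use its own block/unblock routines and exec a helper such as xkbcomp in the child.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std= modes */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not X.Org code): block every blockable signal,
 * fork, restore the parent's mask, and return the child's exit status
 * (or -1 on error). */
int fork_with_signals_blocked(void)
{
    sigset_t all, saved;
    pid_t pid;
    int status;

    sigfillset(&all);               /* SIGKILL and SIGSTOP cannot be blocked */
    if (sigprocmask(SIG_BLOCK, &all, &saved) == -1)
        return -1;

    pid = fork();                   /* no signal handler can run during the fork */
    if (pid == 0) {
        /* Child inherits the blocked mask; a real caller would restore
         * the mask and exec a helper program here. */
        _exit(0);
    }

    /* Parent: restore the old mask. Any signal that became pending while
     * blocked is delivered once it is unblocked again. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) == -1 || pid == -1)
        return -1;

    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)         /* retry if a signal interrupted the wait */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would expect a return value of 0 here, since the child in this sketch exits immediately with status 0.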
Comment 27 Egbert Eich 2007-05-17 12:12:17 UTC
Created attachment 140768
Block all signals before calling fork().

This patch may help to work around the problem. The real problem is still not fully understood: could it be that the SIGALRMs from the itimer are thrashing the system during fork?
Comment 28 Egbert Eich 2007-05-17 12:15:21 UTC
Miguel, you've done a lot of testing, would you be able to try this patch? The ticket is still NEEDINFO assigned to you.
Comment 29 Egbert Eich 2007-05-17 16:17:21 UTC
Arrgh! What a coincidence: right after i was done with this I hit this problem myself. Thanks to the detailed analysis in this ticket I was able to recover my server (no I didn't have my patch installed :( ).
It seems as if the process is hanging in the syscall as it never gets scheduled. Instead the itimer gets called at quite a high rate, producing the CPU load.
Disabling the timer immediately recovers the system.
This was the first time I have hit this problem and I really don't like to reproduce it on my production system.
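
For readers unfamiliar with interval timers, disabling one amounts to setting it to all zeroes. The sketch below assumes the timer in question is ITIMER_REAL (the POSIX timer that raises SIGALRM); the function names are hypothetical and this is not the server's own SmartScheduleStopTimer() implementation.

```c
#define _XOPEN_SOURCE 700   /* expose setitimer()/getitimer() under strict -std= modes */
#include <string.h>
#include <sys/time.h>

/* Arm ITIMER_REAL so that SIGALRM fires every `ms` milliseconds (ms > 0). */
int arm_interval_timer(long ms)
{
    struct itimerval t;

    t.it_interval.tv_sec = ms / 1000;
    t.it_interval.tv_usec = (ms % 1000) * 1000;
    t.it_value = t.it_interval;     /* first expiry after one full interval */
    return setitimer(ITIMER_REAL, &t, NULL);
}

/* Disarm ITIMER_REAL: an all-zero it_value stops the timer. */
int stop_interval_timer(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}

/* Returns 1 if the timer is still counting down, 0 if stopped, -1 on error. */
int interval_timer_is_armed(void)
{
    struct itimerval cur;

    if (getitimer(ITIMER_REAL, &cur) == -1)
        return -1;
    return cur.it_value.tv_sec != 0 || cur.it_value.tv_usec != 0;
}
```

Once the timer is disarmed no further SIGALRM is generated, which would be consistent with the immediate recovery described above.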
Comment 30 Oliver Neukum 2007-05-22 07:04:00 UTC
It seems to me then, that this is not a kernel problem. You are simply requesting signals faster than you can process them. I suggest you use a static counter in the signal handler and increase the interval if you get called too often.
Comment 31 Stefan Dirsch 2007-05-29 10:52:41 UTC
Reassigning to Egbert. I think he knows this problem best.
Comment 32 Egbert Eich 2007-05-29 16:56:35 UTC
Why not just try the attached patch?
I don't know who to assign this to, though. I've seen this problem once - on a machine that was rather busy. I don't think it will be easy to reproduce - at least not on purpose.
Comment 33 Stefan Dirsch 2007-05-29 17:11:29 UTC
Sorry, I overlooked the patch. Will create an RPM for testing.
Comment 34 Stefan Dirsch 2007-05-29 19:53:56 UTC
ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711

- update the xorg-x11-server package.
Comment 35 Bernhard Walle 2007-06-01 16:34:25 UTC
After installing those RPMs, I sometimes have redraw problems, for example with xosd.
Comment 36 Miguel Freitas 2007-06-01 20:13:37 UTC
> Miguel, you've done a lot of testing, would you be able to try this patch? The
> ticket is still NEEDINFO assigned to you.

Egbert, sorry for the delay, i have just returned from vacation.
I will try the patch next week, but it is not any easier for me to reproduce it, so i think we might need more testers.

still i'm not sure disabling the signals will help. i haven't tried to fully understand how this "SmartScheduler" works in xorg but people said there could be a race somewhere (iirc the flag was supposed to be set but it is not)
Comment 37 Bernhard Walle 2007-06-04 20:37:27 UTC
(In reply to comment #35)
> After I installed that RPMs, I have sometimes redraw problems, for example with
> xosd.

After having crashes with OpenGL screensavers I re-installed the proprietary
nVidia driver (I don't use RPMs but the installer), and this problem disappeared.

Comment 38 Bernhard Walle 2007-06-04 20:37:57 UTC
(In reply to comment #36)
> I will try the patch next week but it is not any easier for me to reproduce it
> so i think we might need more testers.

I will test as well. ;-)

Comment 39 Dmitry Golub 2007-06-22 17:02:37 UTC
For NVIDIA based machines:
Updating video driver from to:
NVIDIA-Linux-x86_64-100.14.09-pkg2.run, depending on hardware architecture, seems to fix the problem. 
It has been a few days so far; I am still testing and have had no issues.
Comment 40 Eric Ward 2007-06-22 20:50:15 UTC
*** Bug 286848 has been marked as a duplicate of this bug. ***
Comment 41 Dmitry Golub 2007-06-25 16:37:48 UTC
Open Office freezes X11... Updating the NV driver didn't work in my tests; Office still freezes X11.
Comment 42 Dmitry Golub 2007-07-03 16:41:18 UTC
Tried to update xorg-x11-server from ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711, but it didn't solve the problem; Open Office still sometimes freezes X11.
Comment 43 Egbert Eich 2007-07-04 12:51:19 UTC
I have succeeded to trigger this bug at will. I have been observing it - usually it triggers when it's least needed.
I've added the fix to the server I'm running and added some debugging output to help me shed some light on this issue should it surface again.
Comment 44 Andreas Pfaller 2007-07-29 15:23:47 UTC
I have been running a self compiled xorg-x11-server-7.2-30.6 with
the patch from comment #27 applied since 2007-05-27 and I experienced
no freezes since.

In addition to the freezes I originally observed when using kxkb
(see comment #18), I later also observed freezes with OpenOffice
and NoMachine's nxclient on an unpatched X11 server, and none
with the patch applied.
Comment 45 Dmitry Golub 2007-07-30 17:54:59 UTC
(In reply to comment #44 from Andreas Pfaller)
> I have been running a self compiled xorg-x11-server-7.2-30.6 with
> the patch from comment #27 applied since 2007-05-27 and I experienced
> no freezes since.
> 
> In addition to my original freezes I did observe when using kxkb
> (see comment #18) I later observed also freezes with OpenOffice
> and NoMachine's nxclient with an unpatched X11 server and none
> with the patch applied.
> 

Hello,
Is there a patch available?
Comment 46 Dmitry Golub 2007-08-01 19:56:28 UTC
(In reply to comment #34 from Stefan Dirsch)
> ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711
> 
> - update the xorg-x11-server package.
> 

Installing the xorg-x11-server package from ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711
didn't solve the problem.
Neither the binary RPM nor the src.rpm worked; I tried both.
Open Office still freezes X sometimes.
Comment 47 Stefan Dirsch 2007-08-01 21:20:03 UTC
Date: Wed, 01 Aug 2007 11:52:20 -0500
From: machine owner <nucleo701@yahoo.com>
To: sndirsch@novell.com
Subject: Open Office freezes X11, bug 245711

Hello Stefan,

Updated xorg-x11-server from:
ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711

But it didn't solve my problem; the Open Office File menu still freezes
X11.
Maybe there are some additional steps that should be done?
I am running SUSE 10.2 on an AMD64 dual core with Nvidia, and Xgl with Beryl.
Could Xgl be the problem?
When the system freezes I see that Xgl is taking 100% of CPU.

Thanks

Regards, Dmitry
Comment 48 Stefan Dirsch 2007-08-01 21:22:12 UTC
Date: Wed, 01 Aug 2007 15:37:18 -0500
From: machine owner <nucleo701@yahoo.com>
To: sndirsch@novell.com
Subject: Open Office freezes X

Hello Stefan,

Installing xorg-x11-server package from
ftp:/ftp.suse.com/pub/people/sndirsch/RPMS/bug245711
didn't solve the problem.
Neither the binary RPM nor the src.rpm worked; I tried both.
Open Office still freezes X sometimes.

Regards, Dmitry
Comment 49 Pavel Nemec 2007-08-10 08:05:27 UTC
I have a very similar bug, #292646, not in OpenOffice.org but probably in klipper (a KDE application). I can reproduce it with the "nvidia" driver and also with the "nv" driver.
Comment 50 Miguel Freitas 2007-08-10 10:44:59 UTC
novell seems to be kind of dormant these days... i don't understand why you don't just apply the Redhat patch (which is reported to work) and provide new rpm packages to users.

http://cvs.fedora.redhat.com/viewcvs/devel/xorg-x11-server/xserver-1.3.0-xkb-and-loathing.patch?view=markup

imho this is a kind of bug which should not be allowed to appear on a release ever again. it is so frustrating.
Comment 51 Stefan Dirsch 2007-08-10 11:04:05 UTC
Since you are aware of this patch, I'm sure you can tell me why this patch has never been committed to the X.Org git repository, right?
Comment 52 Miguel Freitas 2007-08-10 11:46:08 UTC
no idea.

maybe because it is such a "/* XXX horrible awful hack */"?

if i were the x.org maintainer i would not commit it either: it is definitely not a proper fix, just an ugly workaround.

if i were the package maintainer i would probably do the same as redhat and debian guys and apply it. of course, i'm not aware of any side effects or problems. i just read about this patch from freedesktop's bugzilla.
Comment 53 Egbert Eich 2007-08-14 07:15:54 UTC
The fedora patch does a similar thing to my patch in attachment #140768. My patch uses the standard X signal blocking/unblocking mechanism instead of calling signal().
The difference in semantics is that the patch in attachment #140768 blocks the signal while the other one ignores it. This means that with blocking, a signal generated between the start and the end of the fork gets delivered once on unblock, whereas with ignoring it is discarded.
I don't see right off hand why the one patch should work better than the other.
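
The semantic difference between blocking and ignoring can be demonstrated with a small stand-alone sketch. It is not taken from either patch: the function names are hypothetical, raise() stands in for the itimer, and sigaction() is used here instead of signal() purely for predictable behaviour.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction()/sigprocmask() under strict -std= modes */
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t deliveries;

static void count_alarm(int sig)
{
    (void)sig;
    deliveries++;
}

/* Install `handler` for SIGALRM, optionally saving the old disposition. */
static void set_alarm_handler(void (*handler)(int), struct sigaction *old)
{
    struct sigaction sa;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, old);
}

/* Raise SIGALRM twice while it is blocked; returns how often the handler ran. */
int deliveries_when_blocked(void)
{
    struct sigaction old;
    sigset_t alrm, saved;

    set_alarm_handler(count_alarm, &old);
    deliveries = 0;

    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    sigprocmask(SIG_BLOCK, &alrm, &saved);
    raise(SIGALRM);
    raise(SIGALRM);                          /* standard signals do not queue */
    sigprocmask(SIG_UNBLOCK, &alrm, NULL);   /* the pending signal is delivered here */
    sigprocmask(SIG_SETMASK, &saved, NULL);

    sigaction(SIGALRM, &old, NULL);
    return deliveries;
}

/* Raise SIGALRM while it is ignored; returns how often the handler ran. */
int deliveries_when_ignored(void)
{
    struct sigaction old;

    set_alarm_handler(SIG_IGN, &old);
    deliveries = 0;
    raise(SIGALRM);                          /* discarded, never becomes pending */
    set_alarm_handler(count_alarm, NULL);    /* nothing is left to deliver */

    sigaction(SIGALRM, &old, NULL);
    return deliveries;
}
```

On Linux the blocked variant should report one delivery and the ignored variant none, which matches the description above but does not by itself explain why one workaround would succeed where the other fails.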

In any case, about the 'dormant state of novell': I've added debug code to my Xserver to shed some light on this; however, I have since been unable to reproduce the issue.

It may have gone away with a system update which included a new kernel.
Comment 54 Miguel Freitas 2007-08-19 15:06:26 UTC
Egbert, i agree both patches are quite similar and they are both just workarounds to a problem we don't fully understand. however a few details are worth noticing:

- fedora patch says explicitly "Ignore (not just block) SIGALRM", so i must assume they are somehow aware of the blocking alternative and chose the other. i can only guess they tried blocking and it didn't work, because saving/restoring a sighandler_t is certainly clumsier.

- comment #47 reports that your patch (blocking signals) didn't fix the problem.

- comment #28 on freedesktop's bug 10525 reports that the fedora patch (ignoring signals) fixes the problem, including the reduced test case.

https://bugs.freedesktop.org/show_bug.cgi?id=10525#c28

btw, the test case might help to reproduce the bug:

https://bugs.freedesktop.org/show_bug.cgi?id=10525#c20
Comment 55 Stefan Dirsch 2007-08-20 13:54:22 UTC
*** Bug 301813 has been marked as a duplicate of this bug. ***
Comment 56 Stefan Dirsch 2007-08-29 01:10:51 UTC
*** Bug 305690 has been marked as a duplicate of this bug. ***
Comment 57 Seth R Arnold 2007-08-29 01:31:12 UTC
[Incidentally, no nvidia, very little KDE running while reproducing this.]
Comment 58 Miguel Freitas 2007-09-21 21:29:40 UTC
I have just replaced my xorg-x11-server with a custom build that includes fedora's patch. I need at least 2 weeks to say whether it seems to fix the problem or not; i will keep you updated about any findings. A friend of mine will probably try it as well (his computer freezes about once a week and i always help him with remote gdb unlocking...).

In case anybody wants to give it a try (at your own risk):

http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/xorg-x11-server-7.2-308.1mf.x86_64.rpm

the source package is here (i586 users need to recompile):

http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/xorg-x11-server-7.2-308.1mf.src.rpm

PLEASE NOTE that i didn't use the stock package from opensuse 10.2, but rather the xorg73 repository. you might need to update the other xorg packages as well - i don't know. don't blame me if it breaks your system ;-)

here is xorg73 repo for reference:

http://download.opensuse.org/repositories/xorg73/openSUSE_10.2/

Comment 59 Bernhard Walle 2007-09-27 17:35:11 UTC
Just for reference: I ran into this problem again with 10.3 (RC2, xorg-x11-server-7.2-138).
Comment 60 Miguel Freitas 2007-09-27 17:54:06 UTC
Just for reference: running fedora's patch since the 21st, no problem so far. (not conclusive though)
Comment 61 Marcelo Jimenez 2007-10-01 14:01:04 UTC
Just for more reference :)
I had a way to reproduce the bug 100% of the time on my machine, which consisted of running setxkbmap to switch keyboard layouts. After installing Miguel's xorg-x11-server-7.2-308.1mf.x86_64.rpm, I have no more locks, I can switch layouts as I please, and OpenOffice no longer hangs my system.

I agree with him when he says that distributions should apply the patch, while obviously Xorg must *solve* the problem and not commit a workaround.

Not everyone is able to do that gdb trick, not to mention that it is extremely error prone. I am pretty sure that there must be lots of frustrated Suse users out there that have simply no idea of what is happening. They must be thinking "hum, linux is just as bad as my other operating system."
Comment 64 Stefan Dirsch 2007-10-03 13:42:32 UTC
submitted xorg-x11-server package with patch from comment #50 to

- 10.2 (update together with security update)
- 10.3 (maybe update later)
- STABLE
Comment 65 Stefan Dirsch 2008-01-08 15:14:30 UTC
*** Bug 292646 has been marked as a duplicate of this bug. ***