Bug 603316

Summary: OpenOffice_org seems to run out of file descriptors
Product: [openSUSE] openSUSE 11.3 Reporter: Thomas Biege <thomas>
Component: OpenOffice.org Assignee: Michal Vyskocil <mvyskocil>
Status: RESOLVED FIXED QA Contact: Chao Wei <cwei>
Severity: Critical    
Priority: P1 - Urgent CC: forgotten_DBWoND-zrO, hemathor, msvec, pmladek
Version: Milestone 6   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Community User Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: strace from my crash
thomas strace file gzipped
backtrace from the crash on hope.suse.cz

Description Thomas Biege 2010-05-06 13:49:46 UTC
Dann, a community user, reports OOo failing while saving documents. I was able to reproduce it on my 11.3m6 installation.

- start oowriter (or another component)
- enter something
- try to save it
- error message about an i/o error

The strace output indicates that oowriter runs out of file descriptors:

5850  open("/proc/self/maps", O_RDONLY) = 1023
5850  fstat64(1023, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
5850  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x83216000
5850  read(1023, "08048000-0804a000 r-xp 00000000 "..., 1024) = 1024
5850  read(1023, "m/libspellli.so\n835c6000-835c700"..., 1024) = 1024
5850  read(1023, ".11.1.so\n83612000-83613000 r--p "..., 1024) = 1024
[...]
5850  munmap(0xbf8a4000, 12288)         = 0
5850  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
5850  gettid()                          = 5850
5850  rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
5850  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
5850  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
5850  rt_sigprocmask(SIG_UNBLOCK, [HUP INT ILL BUS FPE SEGV USR2 TERM], NULL, 8) = 0
5850  rt_sigprocmask(SIG_BLOCK, [QUIT], NULL, 8) = 0
5850  open("/proc/self/maps", O_RDONLY) = -1 EMFILE (Too many open files)

thomas> grep EMFILE ~/w.trc  | wc -l
1299
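The grep count above only says how often EMFILE occurred. A minimal sketch for breaking the failures down by syscall and path could look like this; it assumes the strace line layout seen in this report (PID, optional timestamp, call), and the names `CALL_RE` and `count_emfile` are made up for illustration:

```python
import re
from collections import Counter

# Start of an strace line as it appears in this report: PID, an optional
# timestamp, the syscall name, then the first quoted argument (if any).
CALL_RE = re.compile(r'^\s*(\d+)\s+(?:[\d:.]+\s+)?(\w+)\((?:"([^"]*)")?')

def count_emfile(lines):
    """Count EMFILE failures per (syscall, first quoted argument)."""
    counts = Counter()
    for line in lines:
        if "EMFILE" not in line:
            continue
        m = CALL_RE.match(line)
        if m:
            counts[(m.group(2), m.group(3))] += 1
    return counts

# Sample lines copied from the traces in this report.
sample = [
    '5850  open("/proc/self/maps", O_RDONLY) = 1023',
    '5850  open("/proc/self/maps", O_RDONLY) = -1 EMFILE (Too many open files)',
    '32748 16:15:19.272428 open("/proc/self/maps", O_RDONLY) = -1 EMFILE (Too many open files)',
]
emfile_counts = count_emfile(sample)
print(emfile_counts)
```

To run it on a real trace, pass an open file object instead of the sample list, for example `count_emfile(open(path))` with `path` pointing at the strace output.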
Comment 1 Petr Mladek 2010-05-07 14:28:16 UTC
Hmm, OOo crashes heavily on openSUSE-11.3-m6-x86_64 when the OpenOffice_org-LanguageTool package is installed. I do not see the problem when I remove it.

Thomas, does it help you to remove the OpenOffice_org-LanguageTool package?
Comment 2 Petr Mladek 2010-05-07 16:40:48 UTC
Hmm, Donn's problems were somehow related to SELinux and too permissive access rights to ~/.ooo3.

Thomas, do you have something similar by chance?
Comment 3 Thomas Biege 2010-05-08 10:05:12 UTC
After removing the languagetool rpms OOo saving works again.

I doubt that the problem is caused by SELinux because it also happens if SELinux is disabled. But to be really sure we need hard facts.
Comment 4 Petr Mladek 2010-05-10 08:11:48 UTC
The LanguageTool makes OOo very unstable on openSUSE-11.3-m6. I do not see this problem in SLED11. I am going to debug it.

Thanks a lot for feedback.
Comment 5 Petr Mladek 2010-05-10 13:49:20 UTC
BTW: the packages junit4 and jgroups have not been rebuilt in Factory for 50 days, see https://build.opensuse.org/stage/project/status?project=openSUSE%3AFactory&filter_devel=All+Packages&ignore_pending=true&limit_to_fails=false&limit_to_fails=true&include_versions=false&commit=Filter+results

The build failed because of "Too many open files" => it is similar to this problem with too many open descriptors.

I wonder if these problems are really related and the updated openJDK has a crazy regression.


Note that the LanguageTool was not installed by default before openSUSE-11.3-m6, so the problem was hidden. It might have been there the whole 50 days.

Also, I do not see it on SLED11 with OOo and LanguageTool built from the same sources => I tend to think that the problem is related to some other package (openJDK).

Michal, does it ring any bell?
Comment 6 Michal Vyskocil 2010-05-11 11:30:19 UTC
Hi Petr, I already know about the "Too many open files" problems, but I was not able to reproduce them locally - the local build (using chroot, or directly in the unpacked source directory) works well. I have never heard of a similar problem upstream.

Adrian asked me to use the kvm based osc build, but I failed to set that up - it ends with a weird qemu error. So I have no idea what is going wrong.

I'll try to reproduce it (once another 2 GB zypper up has finished).
Comment 7 Michal Vyskocil 2010-05-11 13:08:36 UTC
Hi all, so I wrote a simple reproducer. I am not sure why the new openjdk opens /proc/PID/maps so many times; the previous one did not do that so extensively.

Thomas: can you attach the strace file please?
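The reproducer itself is not attached to this report. As a rough, Linux-only sketch of how one could watch a process for piling-up descriptors, the snippet below snapshots the targets of `/proc/<pid>/fd` and reports which ones grew. The helper names `fd_targets` and `leaked_since` are hypothetical, and the demo leaks descriptors to a temporary file rather than to `/proc/self/maps`:

```python
import os
import tempfile
from collections import Counter

def fd_targets(pid="self"):
    """Count what each open descriptor of `pid` points to (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = Counter()
    for name in os.listdir(fd_dir):
        try:
            targets[os.readlink(os.path.join(fd_dir, name))] += 1
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets

def leaked_since(before, after):
    """Targets whose descriptor count grew between two snapshots."""
    return {t: n - before[t] for t, n in after.items() if n > before[t]}

# Demo: open the same file five times without closing it, then compare.
tmp = tempfile.NamedTemporaryFile()
demo_path = os.path.realpath(tmp.name)
snapshot_before = fd_targets()
demo_fds = [os.open(demo_path, os.O_RDONLY) for _ in range(5)]
growth = leaked_since(snapshot_before, fd_targets())
print(growth)

# Clean up the descriptors opened for the demo.
for fd in demo_fds:
    os.close(fd)
tmp.close()
```

Pointing `fd_targets` at the PID of a running soffice or java process (with sufficient permissions) and taking snapshots a few seconds apart should show whether the number of descriptors referring to its maps file keeps growing.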
Comment 8 Michal Vyskocil 2010-05-11 13:22:42 UTC
I was not able to reproduce it using 

$ rpm -q OpenOffice_org OpenOffice_org-LanguageTool java-1_6_0-openjdk
OpenOffice_org-3.2.0.99.3-1.1.x86_64
OpenOffice_org-LanguageTool-1.0.0-4.6.noarch
java-1_6_0-openjdk-1.6.0.0_b17-3.2.x86_64

Saving and opening an arbitrary document works without a crash.
Comment 9 Petr Mladek 2010-05-11 15:36:57 UTC
Created attachment 361376 [details]
strace from my crash

I opened writer and typed a few letters. OOo froze for a moment and then crashed.

Also this crash is caused by too many open files:

--- cut ---
32748 16:15:19.272428 open("/proc/self/maps", O_RDONLY) = -1 EMFILE (Too many open files)
32748 16:15:19.272587 mmap(0x7fa029e5b000, 12288, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fa029e5b000
32748 16:15:19.272697 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
32748 16:15:19.272935 sched_getaffinity(32748, 32, {1, 0, 0, 0}) = 32
32748 16:15:19.273044 sched_getaffinity(32748, 32, {1, 0, 0, 0}) = 32
32748 16:15:19.273146 gettid()          = 32748
32748 16:15:19.273234 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
32748 16:15:19.273324 rt_sigprocmask(SIG_UNBLOCK, [HUP INT ILL BUS FPE SEGV USR2 TERM], NULL, 8) = 0
32748 16:15:19.273420 rt_sigprocmask(SIG_BLOCK, [QUIT], NULL, 8) = 0
32748 16:15:19.273591 open("/proc/self/maps", O_RDONLY) = -1 EMFILE (Too many open files)
32748 16:15:19.273704 mmap(0x7fa029e5b000, 12288, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa029e5b000
32748 16:15:19.273802 mprotect(0x7fa029e5b000, 12288, PROT_NONE) = 0
32748 16:15:19.274327 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
--- cut ---
Comment 10 Michal Vyskocil 2010-05-11 16:54:44 UTC
So from the information I already have - the new openjdk opens /proc/self/maps and does not close it, which seems to be the root of all those problems.

A simple grep tells me that there are only two places which open it:

static bool find_vma(address addr, address* vma_low, address* vma_high) in hotspot/src/os/linux/vm/os_linux.cpp [1]

static bool read_lib_info(struct ps_prochandle* ph) in hotspot/agent/src/os/linux/ps_proc.c [2]

However, those functions 1) are simple and 2) properly call fclose after fopen.

[1] http://hg.openjdk.java.net/jdk6/jdk6-gate/hotspot/file/587f774a3e70/src/os/linux/vm/os_linux.cpp
[2] http://hg.openjdk.java.net/jdk6/jdk6-gate/hotspot/file/587f774a3e70/agent/src/os/linux/ps_proc.c

But it seems the bug is in JVM.

Petr: I see a segfault in your strace. Does that mean it is crashing? If so, can I have a stack trace?
Comment 11 Thomas Biege 2010-05-12 08:20:20 UTC
Created attachment 361609 [details]
thomas strace file gzipped
Comment 12 Petr Mladek 2010-05-12 13:02:11 UTC
Created attachment 361695 [details]
backtrace from the crash on hope.suse.cz

The stack of the main thread is somehow malformed. I hope that the stack from the other threads might be more useful.
Comment 13 Michal Vyskocil 2010-05-13 07:05:43 UTC
So I see the same problem in your strace outputs as I have during a run of my simple test program. The problem is that the JVM opens /proc/self/maps, but doesn't close it. Not sure why ...
Comment 14 Michal Vyskocil 2010-05-13 09:02:06 UTC
So, looking closely at all the patches applied against the openjdk sources, I found the most probable place - the stack protector patch [1] requested by the OpenOffice guys in bug#589021 seems to be wrong.

It opens /proc/self/maps, but it returns false without an fclose call, so it's the most probable place where the problem comes from. Lesson: never blindly trust upstream; always check their patches.

Fixed by sr#39886

[1] http://cr.openjdk.java.net/~aph/6929067-jdk7-webrev-4/
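The shape of the bug described above (open, early return on failure, no close) can be sketched as follows. This is a Python analogy, not the code of the actual patch: the function names are hypothetical, `os.open`/`os.close` stand in for fopen/fclose, and the descriptor count is read from `/proc/self/fd`, so it is Linux-only.

```python
import os

def find_in_maps_leaky(needle, path="/proc/self/maps"):
    """Buggy shape: the early return skips the close, leaking one fd per miss."""
    fd = os.open(path, os.O_RDONLY)
    data = os.read(fd, 65536)
    if needle not in data:
        return False          # descriptor is never closed on this path
    os.close(fd)
    return True

def find_in_maps_fixed(needle, path="/proc/self/maps"):
    """Fixed shape: try/finally closes the descriptor on every exit path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return needle in os.read(fd, 65536)
    finally:
        os.close(fd)

def open_fd_count():
    """Number of entries in /proc/self/fd (includes the listing's own fd)."""
    return len(os.listdir("/proc/self/fd"))

MISSING = b"\x00no-such-mapping\x00"   # cannot appear in a maps file

base = open_fd_count()
for _ in range(10):
    find_in_maps_fixed(MISSING)
after_fixed = open_fd_count()

for _ in range(10):
    find_in_maps_leaky(MISSING)
after_leaky = open_fd_count()

print(base, after_fixed, after_leaky)
```

With the leaky variant each failed lookup costs one descriptor, so a caller that runs it often enough eventually hits the per-process limit and every later open fails with EMFILE, which matches what the straces in this report show.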
Comment 15 Michal Vyskocil 2010-05-13 13:47:44 UTC
OK, the right sr is 39894; the previous fix was not complete. Thanks to Petr for testing.
Comment 16 Petr Mladek 2010-05-14 17:51:09 UTC
*** Bug 604094 has been marked as a duplicate of this bug. ***
Comment 17 Petr Mladek 2010-08-20 16:49:29 UTC
*** Bug 616991 has been marked as a duplicate of this bug. ***
Comment 18 Petr Mladek 2010-08-26 16:14:46 UTC
JFYI, the fixed openJDK has appeared in the update channel for openSUSE 11.1, 11.2, and 11.3. See bug #623905 and bug #601243.
Comment 19 Petr Mladek 2010-09-13 14:03:29 UTC
*** Bug 628761 has been marked as a duplicate of this bug. ***
Comment 20 Bernhard Wiedemann 2016-04-15 11:42:46 UTC
This is an autogenerated message for OBS integration:
This bug (603316) was mentioned in
https://build.opensuse.org/request/show/39894 Factory / java-1_6_0-openjdk