Bug 256363

Summary: Kbabel has a huge memory hole (2 GiB)
Product: [openSUSE] openSUSE 10.2
Reporter: Carlos Robinson <carlos.e.r>
Component: KDE
Assignee: E-mail List <kde-maintainers>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
Version: Final
Target Milestone: ---
Hardware: PC
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: strace -ff log of an aborted run

Description Carlos Robinson 2007-03-21 12:53:08 UTC
After generating a database (settings / configure dictionary / database / scan single PO file ... close), the system became unresponsive and kbabel started to eat memory like mad:

top - 13:04:46 up 8 days,  2:06, 32 users,  load average: 10.21, 7.32, 4.10
Tasks: 214 total,   1 running, 209 sleeping,   0 stopped,   4 zombie
Cpu(s): 31.5%us, 14.9%sy,  0.0%ni,  0.0%id, 51.7%wa,  1.3%hi,  0.7%si,  0.0%st
Mem:   1035824k total,  1021136k used,    14688k free,    19320k buffers
Swap:  6297440k total,  1542152k used,  4755288k free,    26860k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25046 cer       18   0 1358m 763m 1972 D 22.5 75.5   0:27.36 kbabel

Notice the memory and swap usage: by the time I killed kbabel, used swap had gone up to two gibibytes in two or three minutes!
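The figures can be read straight off top's header, where the `k` suffix means KiB. A minimal Python sketch (the `parse_top_header` helper is hypothetical and only handles the `Mem:`/`Swap:` layout shown above) converts this snapshot: swap use was about 1.47 GiB system-wide and RAM about 98.6% used when it was taken, consistent with swap still climbing toward the two gibibytes seen at kill time.

```python
import re

KIB_PER_GIB = 1024 * 1024  # top's "k" suffix is KiB

# Header lines copied from the top snapshot in the description.
TOP_HEADER = """\
Mem:   1035824k total,  1021136k used,    14688k free,    19320k buffers
Swap:  6297440k total,  1542152k used,  4755288k free,    26860k cached
"""


def parse_top_header(text):
    """Map "Mem"/"Swap" to {"total": KiB, "used": KiB, ...} for top's summary lines."""
    result = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        # Each field looks like "1542152k used"; capture the number and its name.
        fields = re.findall(r"(\d+)k (\w+)", rest)
        if fields:
            result[label.strip()] = {name: int(value) for value, name in fields}
    return result


usage = parse_top_header(TOP_HEADER)
swap_used_gib = usage["Swap"]["used"] / KIB_PER_GIB
mem_used_pct = 100 * usage["Mem"]["used"] / usage["Mem"]["total"]
print(f"swap used: {swap_used_gib:.2f} GiB, RAM used: {mem_used_pct:.1f}%")
# prints: swap used: 1.47 GiB, RAM used: 98.6%
```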

I think this is reproducible, but I don't know what info you may need.
Comment 1 Lubos Lunak 2007-03-21 15:32:20 UTC
Can you reproduce the problem? What was the .po file you used?
Comment 2 Carlos Robinson 2007-03-21 15:53:32 UTC
Yes, it reproduces. In fact, I triggered it again just by telling it to translate a string using the compendium or the database (I think; I didn't take notes this time).

Note that the .po files were added to the database; it hung afterwards, when I clicked to close the dialog. The files in "~/.kde/share/apps/kbabeldict/dbsearchengine" were already complete (I think). I did a rough translation later and it seemed to work.

Right now I have it doing a rough translation (800 messages), with fuzzy enabled (slow), using the es.messages file and the database, and it has been working happily for over an hour (95 minutes, 5% done). I expect it to be busy until tomorrow or so, so I can't run more tests until then.

The .po files I used to teach the database are my own translations of xine-ui, xine-lib and gxine. I can forward them to you, or the database itself, if you want.

Or I can run it with tracing enabled, or under valgrind (if you tell me how). I suppose (hope) there is a tool for tracking down this kind of problem.
Comment 3 Lubos Lunak 2007-03-22 11:00:21 UTC
So, if I understand correctly, you can reproduce it, but not intentionally; it happens at random times? Please open Konsole, run "ulimit -m 300000" to limit memory to 300M, and then start KBabel from that Konsole. This will prevent KBabel from eating all the memory, so your machine will stay usable; KBabel itself should crash instead. Please provide a backtrace from the crash.


Comment 4 Carlos Robinson 2007-03-22 11:26:52 UTC
Will do.
But we'll have to wait: kbabel is currently running a rough translation (860 minutes, 61% done). Nine hours to go...

Meanwhile, should I install the corresponding -debug rpm?

P.S.: I think I can reproduce it intentionally; I did it twice. I will recheck.
Comment 5 Lubos Lunak 2007-03-22 14:31:04 UTC
Yes, of course. The needed packages should be the qt3, kdelibs3 and kdesdk3 debuginfo packages.
Comment 6 Carlos Robinson 2007-03-22 20:46:13 UTC
I have a problem.

After adding a debug installation source to YaST (it was so slow that I finally decided to remove zmd completely), I told YaST to install kdelibs3-debuginfo, kdesdk3-debuginfo, and qt3-debuginfo. While checking dependencies, it complains:

#### YaST2 conflicts list - generated 2007-03-22 21:38:25 ####

This would invalidate atom:kdelibs3-3.5.5-45.2.i586.
    atom:kdelibs3-3.5.5-45.2.i586 has unfulfilled requirements
    Conflict Resolution:
        ( ) delete kdelibs3
            delete atom:kdelibs3-3.5.5-45.2.i586

#### YaST2 conflicts list END ###

What should I do? I aborted that install.

   [kbabel is still at 76% of the same rough translation: 
   7 hours more to go at least - it is slower than zmd!]
Comment 7 Carlos Robinson 2007-03-23 16:12:01 UTC
"ulimit -m 300000" does not make kbabel stop. See:

top - 16:16:42 up 10 days,  5:18, 35 users,  load average: 4.66, 3.59, 3.11
Tasks: 226 total,   2 running, 220 sleeping,   0 stopped,   4 zombie
Cpu(s): 21.2%us,  4.4%sy,  0.0%ni,  0.0%id, 73.0%wa,  1.2%hi,  0.2%si,  0.0%st
Mem:   1035824k total,  1023632k used,    12192k free,    18508k buffers
Swap:  6297440k total,   564312k used,  5733128k free,    69604k cached
Kill PID 31206 with signal [15]: 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31206 cer       18   0  469m 441m  14m D  4.4 43.7   1:28.59 kbabel

On the xterm I started kbabel from:

cer@nimrodel:~/Documents/babel> ulimit -m
300000


I have installed kdesdk3-debuginfo and qt3-debuginfo, but not kdelibs3-debuginfo: version 3.5.5-45.2 is required, but only 3.5.5-45 is available in the repo (see previous comment and Bug 257168).

I have strace -ff log files. I'll attach them later, if possible (0.5M tar.bz2).
Comment 8 Carlos Robinson 2007-03-23 16:16:16 UTC
Created attachment 126259
strace -ff log of an aborted run

Contains:

   33808 2007-03-23 16:31 strace.log
 8169791 2007-03-23 16:33 strace.log.31645
   35053 2007-03-23 16:31 strace.log.31646
   44083 2007-03-23 16:33 strace.log.31647
    6631 2007-03-23 16:31 strace.log.31648
     197 2007-03-23 16:31 strace.log.31649
   69894 2007-03-23 16:34 strace.log.31650
   15455 2007-03-23 16:31 strace.log.31651
   62447 2007-03-23 16:33 strace.log.31652
   27369 2007-03-23 16:31 strace.log.31653
  717592 2007-03-23 16:33 strace.log.31654
   58854 2007-03-23 16:31 strace.log.31655
   35314 2007-03-23 16:31 strace.log.31656
    7389 2007-03-23 16:31 strace.log.31657
    7867 2007-03-23 16:32 strace.log.31658
    8626 2007-03-23 16:32 strace.log.31664
    9518 2007-03-23 16:33 strace.log.31665
  444853 2007-03-23 16:32 strace.log.31666
   13814 2007-03-23 16:33 strace.log.31674
   15389 2007-03-23 16:33 strace.log.31675
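With `-o strace.log`, strace's `-ff` option writes each traced process to its own `strace.log.PID` file, which is why the attachment holds one log per PID. To see which of them was allocating, a rough sketch could total the successful `mmap`/`mmap2` sizes (minus `munmap`) and the growth of the `brk` heap in each file. Assumptions: the logs were written without timestamp options such as `-t`, `mremap` and other growth paths are ignored, and the helper names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Successful calls only, e.g.
#   mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7a00000
# Failed calls return "-1 ENOMEM (...)" instead of an address and are skipped.
MMAP_RE = re.compile(r"^mmap2?\([^,]*,\s*(\d+),.*\)\s*=\s*0x[0-9a-f]+")
MUNMAP_RE = re.compile(r"^munmap\(0x[0-9a-f]+,\s*(\d+)\)\s*=\s*0\b")
BRK_RE = re.compile(r"^brk\([^)]*\)\s*=\s*(0x[0-9a-f]+)")


def summarize(lines):
    """Return (net bytes mapped via mmap/mmap2 minus munmap, heap growth via brk)."""
    net_mapped = 0
    breaks = []
    for line in lines:
        if m := MMAP_RE.match(line):
            net_mapped += int(m.group(1))
        elif m := MUNMAP_RE.match(line):
            net_mapped -= int(m.group(1))
        elif m := BRK_RE.match(line):
            breaks.append(int(m.group(1), 16))  # brk returns the new program break
    heap_growth = max(breaks) - min(breaks) if breaks else 0
    return net_mapped, heap_growth


def rank_logs(directory, prefix="strace.log"):
    """Summarize every <prefix>.<pid> file written by strace -ff, biggest grower first."""
    rows = []
    for path in Path(directory).glob(prefix + ".*"):
        pid = path.suffix.lstrip(".")
        if not pid.isdigit():
            continue  # skip e.g. a .tar.bz2 archive in the same directory
        with path.open(errors="replace") as f:
            net, heap = summarize(f)
        rows.append((int(pid), net, heap))
    return sorted(rows, key=lambda r: r[1] + r[2], reverse=True)


if __name__ == "__main__":
    for pid, net, heap in rank_logs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{pid:>6}  mmap net {net / 2**20:8.1f} MiB  brk growth {heap / 2**20:8.1f} MiB")
```

Run in the directory where the attachment was unpacked, the PID at the top of the list is the first candidate to inspect by hand.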
Comment 9 Carlos Robinson 2007-03-24 00:09:34 UTC
Bug 257168 was closed as a duplicate of bug 233122, which is restricted (this should not happen!).

In the end I had to bring this up on the mailing list to get it opened, and now I know that debuginfo packages are not being provided because of their size. There is no solution for this yet.

So... we have two stoppers for this bug:

 1) Unless you can somehow provide me with kdelibs3-debuginfo, I won't be able to get a full backtrace for this bug.
 2) I need something other than "ulimit -m 300000" to provoke the crash, because that doesn't do it.

I'll wait for your response, but in a day or so I will probably clear the database; I don't know whether the bug will disappear then.
Comment 10 Lubos Lunak 2007-03-26 13:17:20 UTC
Use "ulimit -v 300000" instead; it seems ulimit -m is broken.
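Why `-v` works where `-m` did not: in bash, `ulimit -m` sets RLIMIT_RSS and `ulimit -v` sets RLIMIT_AS, both in KiB. The setrlimit(2) man page notes that RLIMIT_RSS only had an effect in Linux 2.4.x (x < 30), so a 2.6 kernel accepts the value but never enforces it, whereas exceeding RLIMIT_AS makes `brk`/`mmap` fail with ENOMEM, which C++ `operator new` reports as `std::bad_alloc` (the mangled `St9bad_alloc` in the next comment). The Python sketch below, assuming Linux, shows both behaviors in a child process; Python's `MemoryError` stands in for `bad_alloc`, and the limit (100000 KiB) and allocation (150 MiB) are scaled down from the report's 300000 KiB to keep the demo light.

```python
import resource
import subprocess
import sys

# Child program: allocate (and touch) argv[1] MiB, then report what happened.
CHILD = """
import sys
try:
    block = b"x" * (int(sys.argv[1]) * 1024 * 1024)
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def run_with_limit(which, limit_kib=100_000, alloc_mib=150):
    """Run CHILD with soft limit `which` (a resource.RLIMIT_* constant) set to limit_kib KiB."""
    def set_limit():
        # Runs in the child between fork and exec, so the parent is unaffected.
        _, hard = resource.getrlimit(which)
        soft = limit_kib * 1024  # ulimit takes KiB; setrlimit takes bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        resource.setrlimit(which, (soft, hard))

    result = subprocess.run([sys.executable, "-c", CHILD, str(alloc_mib)],
                            preexec_fn=set_limit, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("RLIMIT_RSS (ulimit -m):", run_with_limit(resource.RLIMIT_RSS))
    print("RLIMIT_AS  (ulimit -v):", run_with_limit(resource.RLIMIT_AS))
```

On a current Linux kernel this should report `allocated` for RLIMIT_RSS and `MemoryError` for RLIMIT_AS, mirroring what happened with kbabel.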
Comment 11 Carlos Robinson 2007-04-01 14:57:46 UTC
Ok.
I found that the crash depends on the database: I had deleted and recreated it, and then I couldn't make kbabel eat memory.

The good news is that I had kept a backup copy, just in case; I restored it, and now I can make kbabel crash any time I want (select project, select PO file, select package, go to the next untranslated message, press the button to search in the dictionary of the translation database (equivalent to ^2)).

The bad news is that I don't get a backtrace:

cer@nimrodel:~> ulimit -v 300000
cer@nimrodel:~> kbabel
cer@nimrodel:~> terminate called after throwing an instance of 'St9bad_alloc'
  what():  St9bad_alloc
KCrash: Application 'kbabel' crashing...
Error: Could not determine display.
KCrash cannot reach kdeinit, launching directly.

cer@nimrodel:~>

I can't find a "KCrash" application on my system, nor one to install via YaST. If it wants kdeinit, that is indeed running:

24488 ?        S      0:00 start_kdeinit --new-startup +kcminit_startup
24489 ?        Ss     0:00 kdeinit Running...
24494 ?        S      0:00  \_ klauncher [kdeinit] --new-startup
24504 ?        S      0:01  \_ kwin [kdeinit] -session 10ddcdd2de000117278085700000308410000_1175437963_469787
24509 ?        S      0:00  \_ kio_file [kdeinit] file /tmp/ksocket-cer/klauncherTeRNKb.slave-socket /tmp/ksocket-cer/kdesktopMq64nc.slave-socket
24526 ?        S      0:00  \_ konqueror [kdeinit] --preload
24653 ?        S      0:01  \_ konsole [kdeinit]
24654 pts/1    Ss+    0:00  |   \_ /bin/bash
24711 pts/2    Ss     0:00  |   \_ /bin/bash
24817 pts/2    R+     0:00  |       \_ ps afx
24818 pts/2    S+     0:00  |       \_ less
24770 ?        S      0:00  \_ /bin/sh /usr/bin/firefox
24775 ?        Sl     0:19      \_ /usr/lib/firefox/firefox-bin
24784 ?        Z      0:00          \_ [netstat] <defunct>


If it is trying to start something directly, wouldn't that be affected by ulimit too? Can I start some program beforehand to handle the backtrace? Or can I make ulimit affect only kbabel and nothing else? Do you want me to run strace or ltrace and repeat the crash?
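On these questions: resource limits are inherited across fork and exec, so anything kbabel (or KCrash) launches directly runs under the same limit, and a subshell such as `(ulimit -v 300000; kbabel)` confines the limit to kbabel and its children while leaving the rest of the session alone. A hypothetical Python helper can do the same by setting the limits in the child just before exec; this sketch also raises the core-file soft limit to the hard limit, so that the abort after the uncaught `std::bad_alloc` can leave a core dump to inspect later.

```python
import resource
import subprocess
import sys


def limited_child(as_kib=300000, enable_core=True):
    """Return a preexec_fn that sets limits in the child process only.

    as_kib mirrors the "ulimit -v 300000" value from this report (KiB).
    """
    def apply_limits():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = as_kib * 1024  # setrlimit works in bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
        if enable_core:
            # Raise the core-size soft limit as far as the hard limit allows.
            _, core_hard = resource.getrlimit(resource.RLIMIT_CORE)
            resource.setrlimit(resource.RLIMIT_CORE, (core_hard, core_hard))
    return apply_limits


def run_limited(argv, as_kib=300000, enable_core=True, **run_kwargs):
    """Run argv with the limits applied; the calling process keeps its own limits."""
    return subprocess.run(argv, preexec_fn=limited_child(as_kib, enable_core), **run_kwargs)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_limited(sys.argv[1:]).returncode)
```

For example, `python3 run_limited.py kbabel --nocrashhandler` (the script name is hypothetical). KDE 3's generic `--nocrashhandler` option, listed by `kbabel --help-kde`, keeps KCrash from catching the signal so the kernel can write the core; `gdb kbabel core` followed by `bt` then prints the backtrace (the core file's name depends on the kernel's core_pattern setting, and symbols appear only for libraries whose debuginfo packages are installed).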

I'm willing to help trace the bug, but my knowledge is limited.

I can attach the backup of the database that triggers the memory hole, if you want it. But even if the database is corrupt, memory holes should not happen, and databases should have a way to clean themselves, IMO.
Comment 12 Lubos Lunak 2007-05-03 17:06:09 UTC
If you can avoid the problem by recreating the database, then I'm afraid I consider this problem sorted out: the problem is solved for you, and we don't develop KBabel. You can try reporting the problem at bugs.kde.org, but KBabel is unmaintained upstream.