Bugzilla – Full Text Bug Listing
Can you get the output of `cat /proc/interrupts` to see whether an interrupt is triggered very often?

Hi, here it is (I have been running nearly the entire day and ksoftirqd is normal for now).
I'll provide another one (maybe also a graph) when the ksoftirqd problem occurs (I suppose that is what is really needed).
> cat /proc/interrupts
CPU0 CPU1
0: 26863620 0 IO-APIC-edge timer
1: 254 0 IO-APIC-edge i8042
6: 0 0 IO-APIC-edge lirc_ite8709
8: 1 0 IO-APIC-edge rtc0
9: 8972 0 IO-APIC-fasteoi acpi
12: 564 0 IO-APIC-edge i8042
16: 224750 0 IO-APIC-fasteoi uhci_hcd:usb2, nvidia
18: 1191186 0 IO-APIC-fasteoi uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
19: 1121921 0 IO-APIC-fasteoi ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb4, uhci_hcd:usb7
21: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
22: 3042821 0 IO-APIC-fasteoi HDA Intel
23: 2 0 IO-APIC-fasteoi ehci_hcd:usb5, uhci_hcd:usb6
216: 2369317 0 PCI-MSI-edge iwl3945
217: 6094614 0 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 4722382 16643134 Local timer interrupts
RES: 1373132 3216824 Rescheduling interrupts
CAL: 1511391 2413558 function call interrupts
TLB: 32209 34157 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
Hi again. Here are the latest interrupts from before I rebooted.
0: 37799577 0 IO-APIC-edge timer
1: 309 0 IO-APIC-edge i8042
6: 47784 0 IO-APIC-edge lirc_ite8709
8: 1 0 IO-APIC-edge rtc0
9: 11984 0 IO-APIC-fasteoi acpi
12: 1074 0 IO-APIC-edge i8042
16: 276919 0 IO-APIC-fasteoi uhci_hcd:usb2, nvidia
18: 1315728 0 IO-APIC-fasteoi uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
19: 1579638 0 IO-APIC-fasteoi ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb4, uhci_hcd:usb7
21: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
22: 3305783 0 IO-APIC-fasteoi HDA Intel
23: 2 0 IO-APIC-fasteoi ehci_hcd:usb5, uhci_hcd:usb6
216: 3033526 0 PCI-MSI-edge iwl3945
217: 9101163 0 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 6434802 22944923 Local timer interrupts
RES: 1790955 4134157 Rescheduling interrupts
CAL: 1564485 2469460 function call interrupts
TLB: 43600 45062 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
I'm also attaching a chart of /proc/interrupts collected each minute for ~870 minutes (with a simple script), created with OOo. Basically I see nothing very unusual in it. Just to note: the 870th minute is around 18:30. The resume from s2ram is at the 375th minute (9:15 in the morning). I closed all browsers and went out skiing at around 12:30 (I left only Azureus open), so the ksoftirqd problem appeared while I was out. There is little change in slope at minute 450 (11:30), except for the "eth0" curve (3rd from top). In /var/log/messages there is nothing unusual, only the classic DHCPREQUEST stuff (same as in the previous post). I also realized now that the minutes aren't very accurate (just a sleep 60), which can add some error. Regards.
Created attachment 264360 [details]
chart of /proc/interrupts
Temporal chart of /proc/interrupts
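The per-minute collection script mentioned above was not attached to the bug, so the following is only a minimal Python sketch of how such snapshots could be parsed and compared. The function names (`parse_interrupts`, `delta`, `read_snapshot`) are made up for illustration, and the parser assumes the usual layout of a CPU header row followed by `label: count count ... description` rows.

```python
import time

def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: count summed over all CPUs}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Only the first ncpus fields are counters; rows such as ERR/MIS have fewer.
        per_cpu = [int(f) for f in fields[:ncpus] if f.isdigit()]
        counts[label.strip()] = sum(per_cpu)
    return counts

def delta(before, after):
    """Interrupts fired between two snapshots, busiest source first."""
    diff = {k: after[k] - before.get(k, 0) for k in after}
    return sorted(diff.items(), key=lambda kv: kv[1], reverse=True)

def read_snapshot(path="/proc/interrupts"):
    """Return (timestamp, counts). Recording the real timestamp avoids the
    drift of a plain 'sleep 60' loop when rates are computed later."""
    with open(path) as f:
        return time.time(), parse_interrupts(f.read())
```

Dividing each delta by the difference of the two timestamps gives a rate per second, which is what the slope of the attached chart represents.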
Hello. I have some good news. I downloaded, compiled and installed the newest kernel from kernel.org (2.6.28.2) and have been running it for a week or so now without any problem. In fact, there were lots of changes concerning softirqd in the 2.6.28 release. Maybe an upgrade of the current openSUSE kernel (2.6.27.7-9.1) in the repositories would fix this for everybody else. Regards.
Created attachment 279389 [details]
proc.interrupts
Created attachment 279390 [details]
var.log.messages
Created attachment 279486 [details]
/var/log/messages file
I also suffer from this bug.
uname -ir:
2.6.27.19-3.2-default x86_64
/proc/interrupts:
CPU0 CPU1
0: 71832 72828 IO-APIC-edge timer
1: 5 7 IO-APIC-edge i8042
8: 1 0 IO-APIC-edge rtc0
9: 0 1 IO-APIC-fasteoi acpi
12: 72 64 IO-APIC-edge i8042
14: 1690 1616 IO-APIC-edge ata_piix
15: 0 0 IO-APIC-edge ata_piix
16: 581 168 IO-APIC-fasteoi nvidia
17: 26663 10586 IO-APIC-fasteoi ata_piix, eth0, b43
18: 0 0 IO-APIC-fasteoi mmc0
19: 1 1 IO-APIC-fasteoi ohci1394
20: 3236 2221 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb4, ehci_hcd:usb7
21: 8821 2932 IO-APIC-fasteoi uhci_hcd:usb2, uhci_hcd:usb5, HDA Intel
22: 0 0 IO-APIC-fasteoi ehci_hcd:usb3, uhci_hcd:usb6
NMI: 0 0 Non-maskable interrupts
LOC: 53608 57620 Local timer interrupts
RES: 13893 16887 Rescheduling interrupts
CAL: 1241 296 function call interrupts
TLB: 216 229 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
lspci:
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 8400M GS (rev a1)
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
03:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
03:01.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
03:01.3 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)
0c:00.0 Network controller: Broadcom Corporation BCM4311 802.11b/g WLAN (rev 01)
I attached the /var/log/messages file.
I have a dual core Intel Core2 CPU, and one of the cores was totally used by ksoftirqd and the network died after some time (`ping` also had an error with some sort of sendmessage buffer).
I can reproduce this bug with Eclipse and Aptana (and probably with Last.fm). I have Eclipse installed with Aptana, and the latter wants to download some MBs of updates. At 21% it stops, as Last.fm also does. After closing Eclipse and Last.fm, the CPU usage caused by ksoftirqd/1 decreases to 0-1%.

Hi. Did you try updating to the newest kernel (2.6.28.x)? It solved this for me (at least I haven't had any problems so far).

Indeed, kernel 2.6.28-next-20090107-20090107.18-default from http://download.opensuse.org/repositories/Kernel:/linux-next/openSUSE_11.1/ seems to resolve the issue. If I start the self-compiled partgui (/usr/sbin/piguicqt), it simply returns an error with the new kernel, whereas with the old kernel it always caused a light variant of the 100%-CPU ksoftirqd bug, fortunately without triggering any disk access (which makes things much worse). However, linux-next is not an option for me since it does not wake up from s2ram on my machine, as pm-suspend.log revealed.

Created attachment 279654 [details]
erroneous partgui that kann trigger the ksoftirqd bug
Here I have uploaded an erroneously self-compiled version of partgui that can trigger a light version of the ksoftirqd bug, featuring 100% CPU load but no disk access. Note that the cause of the ksoftirqd overload during normal operation will be different from this artificially triggered one (and more severe because of the HDD-access overload). To test with it, type:
> make install (on a 64bit machine)
> /usr/sbin/piguicqt (as root)
Created attachment 279655 [details]
proc.interrupts for partgui triggered overhang
Created attachment 287170 [details]
still far from being resolved
This time it is a permanent hangup (sometimes it goes away by itself). It occurs on both platforms, i586 and x86_64, and should perhaps have been a shipment blocker. Why don't they offer us a downgrade? Of what use will the 'next' version be if it comes with its very own set of unacceptable bugs? I believe it should be resolved for OpenSuse11.1.

Only Novell is allowed to set the priority.

If no one can duplicate this without the nvidia driver loaded, there is not going to be anything that we can do about this. So, can someone run without the nvidia driver and still see this?

Hi. I have now been using the vanilla 2.6.28.2 for 2 months without problems. I could try booting 2.6.27 (the original kernel from openSUSE 11.1, I still have it in grub) without the nvidia driver. Fairly easy, but it will take some time (the bug appears after a few minutes, but sometimes only after several hours or a day).

Perhaps I forgot to mention that this occurs with the ati radeonhd driver on x86_64 platforms as well. Unfortunately linux-next (2.6.28) is not an option as long as the s2ram problems are not resolved there (Bug 496954), though the issue does not seem to apply to linux-next (2.6.28). Has anyone tried to trigger the overload with the 2.6.27 kernel and my partgui test compilation?

(In reply to comment #16)
> Of what use will the 'next' version be if it comes with its very own set of
> unacceptable bugs? I believe it should be resolved for OpenSuse11.1.
The linux-next kernel isn't an official openSUSE release. The description itself indicates where to report bugs while using it. If you're comfortable building and testing kernels, I can give you some tips on how to track down the bug more quickly. Once you've identified the upstream fix, then we can backport it to the openSUSE 11.1 kernel.

Could you give me some advice on how to activate AppArmor for the 2.6.30 kernel provided at ftp.suse.com/pub/projects/kernel/kotd/master?
There is still no replacement for the 2.6.27 kernel series, which keeps suffering from the ksoftirqd bug! For me 2.6.30 is now working best, better than linux-next (no s2ram) and of course better than 2.6.27.

AppArmor hasn't been forward-ported to 2.6.30 yet.

AppArmor has since been forward-ported to the master kernel and has been available since 11.2 M4. I still haven't been able to reproduce on 11.1.

*** Bug 540550 has been marked as a duplicate of this bug. ***

Moving to 11.2 as Elmar sees it there too.

*** Bug 543235 has been marked as a duplicate of this bug. ***

Created attachment 323402 [details]
another /proc/interrupts table
Created attachment 324468 [details]
subsequent /proc/interrupts snapshots, os11.2 RC1
Hi all. I can confirm that this ksoftirqd bug also happens on 11.2 RC1. On 11.1 with a vanilla kernel (2.6.28, 2.6.29) it did not happen anymore; I suppose it will be the same here. I am now monitoring 2 versions of /proc/interrupts, so I'll look at whether I can see any difference between them (I'll reply then). Could it be that the openSUSE patches to the kernel cause this?
Created attachment 324507 [details]
subsequent snapshots (delay:none, 0.5s), os11.2 RC1; nicy cpuirqd
There are many different types of ksoftirqd problems:
* full unnicy cpu usage (user load, no disk a.)
* full cpu usage as nice load (no disk a.)
* half cpu usage on dual core systems (no disk a.)
* massive disk access issues
Created attachment 324847 [details]
2 proc interrupts evolutions during 8min
Attaching /proc/interrupts chart measured each minute during approx 8 minutes.
col1 and col2 are 1st and 2nd columns from /proc/interrupts - but I had to call them rather cpu0 and cpu1.
Case OK is the normal state of PC, case BAD is when ksoftirqd takes 100% of one cpu.
This was taken on opensuse 11.2 RC1 with kernel: Linux linux-7bt6 2.6.31.3-1-desktop #1 SMP PREEMPT 2009-10-08 00:27:25 +0200 i686 i686 i386 GNU/Linux
You can clearly see that LOC(cpu0), LOC(cpu1) and 0(cpu0) go crazy during the ksoftirqd madness.
I see there is a new kernel update to 2.6.31.5. I'll check whether it still occurs there.
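For readers without the attachment, the OK-versus-BAD comparison described above can be expressed as a ratio per interrupt source. This is only an illustrative Python sketch: `rate_ratios` is a hypothetical helper, and the rates in the usage comment are invented, not the values from the attached chart.

```python
def rate_ratios(ok_rates, bad_rates, floor=1.0):
    """Rank interrupt sources by how much their rate in the BAD case exceeds
    the rate in the OK case. Rates are interrupts per second; 'floor' avoids
    dividing by zero for sources that were idle in the OK case."""
    ranked = []
    for label, bad in bad_rates.items():
        ok = ok_rates.get(label, 0.0)
        ranked.append((label, bad / max(ok, floor)))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)

# Usage with invented numbers: the source with the largest ratio comes first.
# rate_ratios({"LOC": 100.0, "eth0": 50.0}, {"LOC": 1000.0, "eth0": 55.0})
```

A source that ranks far above the rest (as LOC and the timer apparently do in the chart) is the first candidate to look at, although a high timer rate may be a symptom rather than the cause.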
Created attachment 328648 [details]
three 20x snapshots a 0.5s
Horrible, it is just horrible. Instead of being resolved after two full releases, this problem has worsened, and openSUSE 11.2 has failed. I will have to look for another distro or cease to use Linux. Things are just unacceptable as they are now. It happens very often, although such a thing is supposed never to happen at all ... and no one seems to feel responsible for it.
P5-None is a provocation; it needs to be P1. Accept it, or never see me again at openSUSE! Why can't we simply drop ksoftirqd and leave all softirqs unhandled?

Sorry for the harsh criticism. There have been some quality issues with openSUSE 11.2 which I have not addressed in time (although this is not the right place to complain). The problem is not openSUSE-specific, as it occurs on Debian, Ubuntu, RedHat/Fedora and Mandriva as well. Nonetheless it would be really great if you could do anything about it! Kernel downgrading does not seem to be an option, since kernel 2.6.25 (no ksoftirqd problem) seems to inhibit some powersave scripts, with the root cause on my side. Isn't it simply possible to diff all changes from kernel 2.6.25 to 2.6.27? This is a lot of work, sure, but it needs to be done, as the problem has not been resolved over time. The ksoftirqd hangs may often be bearable on a dual-core machine, though they sometimes cause such massive disk access that a hard reset is the only escape. Perhaps kernel developers of other distros could help us. It should be possible through a joint effort.

Please stop touching the priorities. At least look up what the priorities MEAN before setting them. Realistically, we are not going to go through every single change between 2.6.25 and 2.6.27. There are over *21000* of them, and the fact remains that we are still unable to reproduce it. Since you seem to be able to reproduce it reliably, we can show you how to bisect the vanilla kernel down to the exact change that causes the issue. This is really the only way we're going to be able to track this down. The good news is that you should only need to test it a maximum of 15 times.

The problem is that I will most likely only be able to tell you the lowest version that still shows the ksoftirqd problem, but not the highest that does not. It occurs very irregularly: sometimes multiple times a day, and sometimes not a single time in a whole month (perhaps depending on the kernel version).
Isn't there really any possibility of finding out which changes are most likely to affect ksoftirqd, i.e. changes that take place in certain modules, refer to certain variables, or call procedures of a certain module? If not, could you please provide me with ready-to-install builds of the kernel sub-versions via the build service? I will at least try to run the posted partgui compilation that caused the ksoftirqd problem on my machine. I would personally prefer to run the kernels from a USB stick (you never know; the posted compilation could contain a backdoor, as I am not sure whether my system had been cracked at that time; I should perhaps have mentioned that here). Do you have a link on booting from USB sticks for me? ... If the posted compilation does not do its job (for a certain kernel and system setup it certainly did), can you imagine anything else that could trigger the ksoftirqd problem? We could try different things out; perhaps we can find another program that can trigger it. That would make tracking down the problem considerably easier. Any ideas? We should ask multiple developers! Please provide me with the respective kernel versions so that I can start testing! The situation is intolerable as it is now. My whole system always slows down so much that I cannot work with it, and I cannot reboot all the time either.

The important bit is "system setup." I haven't been able to reproduce it and I'm not going to try to reproduce your setup. If you think your system has been compromised, it's a good idea to reinstall it anyway. Since there have been other reports of this, I don't expect that's the root cause. That said, openSUSE is community supported with best-effort support from Novell engineers. I understand that your problem is making it difficult for you to get work done, but if you look at how many other open kernel bugs there are (for openSUSE and other community distros), you'll understand why I don't have time to create a bisect tree for you.
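The "maximum of 15 times" quoted earlier for roughly 21000 changes follows from halving the candidate range at every test. The short Python sketch below works through that arithmetic; `bisect_steps` and `first_bad` are illustrative names, and the search assumes a clean history in which every change before the culprit is good and every change from it onward is bad.

```python
import math

def bisect_steps(n_changes):
    """Upper bound on test builds when each test halves the candidate range."""
    return math.ceil(math.log2(n_changes))

def first_bad(n_changes, is_bad):
    """Binary search for the first bad change index in 0..n_changes-1.
    Returns (index, number_of_tests_performed)."""
    lo, hi, tests = 0, n_changes - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(mid):
            hi = mid       # culprit is mid or earlier
        else:
            lo = mid + 1   # culprit is after mid
    return lo, tests

if __name__ == "__main__":
    print(bisect_steps(21000))  # 15, since 2**14 < 21000 <= 2**15
```

The bound only holds if each verdict is trustworthy. With a bug that may stay away for a month, a "good" verdict is only as reliable as the time spent waiting for the hang, which is the concern raised in the comment above.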
The problem has been observed across distributions and you've narrowed it down to differences between two releases. I can give you an RPM containing 2.6.26-vanilla, but beyond that you're going to need to build test kernels yourself.

Well, I am already working on a fresh installation (which could already be compromised again; you never know). If I had links to the source tar.bz2s for all versions to test, I could simply copy the kernel package with osc and swap in the .tar.bz2 myself to get the respective versions built.

So here I am, back again. The new 2.6.31.5 SUSE kernel also hangs in ksoftirqd. I downloaded the vanilla kernel and took the .config from /usr/src/linux-2.6.31.5-0.1-obj/i386/desktop/.config, recompiled and booted. The result is very good: after 20 days, not a single ksoftirqd problem. I looked at the two .config files and there are a few differences. Here is roughly what is activated in the openSUSE kernel (compared to vanilla):
CONFIG_SUSE_KERNEL=y
CONFIG_SPLIT_PACKAGE=y
CONFIG_NF_CONNTRACK_SLP=m
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_CIPHER_TWOFISH=m
CONFIG_DM_RAID=m
CONFIG_DM_RAID45=m
CONFIG_TOUCHSCREEN_ELOUSB=m
CONFIG_CRASHER=m
CONFIG_BOOTSPLASH=y
CONFIG_SND_HDA_PATCH_LOADER=y
CONFIG_SND_HDA_CODEC_CIRRUS=y
CONFIG_SAMSUNG_LAPTOP=m
CONFIG_EXT3_DEFAULTS_TO_BARRIERS_ENABLED=y
CONFIG_EXT3_FS_NFS4ACL=y
CONFIG_REISERFS_DEFAULTS_TO_BARRIERS_ENABLED=y
CONFIG_FS_NFS4ACL=y
CONFIG_XFS_DMAPI=m
CONFIG_DMAPI=m
CONFIG_NOVFS=m
CONFIG_UNWIND_INFO=y
CONFIG_STACK_UNWIND=y
CONFIG_KDB=y
CONFIG_KDB_MODULES=m
CONFIG_KDB_OFF=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_KDB_USB=y
CONFIG_KDB_KDUMP=y
CONFIG_SECURITY_DEFAULT="apparmor"
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_NETWORK=y
CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
CONFIG_SECURITY_APPARMOR_DISABLE=y
CONFIG_KVM_KMP=y
What I found in my config:
CONFIG_SCHED_OMIT_FRAME_POINTER=y
No XEN is present in either config. I just copied the .config file to the vanilla kernel, started menuconfig and exited.
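A comparison like the one above can be reproduced mechanically. The following is a minimal Python sketch, not the method actually used in this comment; `enabled_options` and `only_in_first` are hypothetical helper names, and it assumes the usual .config syntax in which active options appear as `CONFIG_NAME=value` and disabled ones as `# CONFIG_NAME is not set` comments.

```python
def enabled_options(config_text):
    """Return {option: value} for every CONFIG_NAME=value line.
    Comment lines such as '# CONFIG_XEN is not set' are skipped."""
    opts = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value
    return opts

def only_in_first(a_text, b_text):
    """Option names that are set in the first config but not in the second."""
    a, b = enabled_options(a_text), enabled_options(b_text)
    return sorted(k for k in a if k not in b)

def load(path):
    """Read a .config file from disk (the path is supplied by the caller)."""
    with open(path) as f:
        return f.read()
```

Calling `only_in_first(load(suse_config), load(vanilla_config))` and then swapping the arguments gives both directions of the difference. Options that exist in both files with different values (for example `=m` versus `=y`) are not reported by this sketch.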
Could it be that some SUSE patches cause the ksoftirqd problems? I'm a bit concerned about the many new FS options and anything that contains DMA in the name. I also see there is something new about SND_HDA (Intel HDA? which I have). I think I'll try to compile the SUSE kernel directly with the i386/desktop/.config and possibly try disabling all the options listed above, and see what happens. Regards.

> ... but if you look at how many other open kernel bugs there are ...
The ksoftirqd problem is clearly the worst, most annoying and by now the most frequently appearing bug of all. It applies to all Linux users and should therefore take precedence over other minor issues. Please do push a resolution forward! Please provide me with the respective kernel RPMs or tell me how to create them (ideally with the build service). Where can I download the sources and SUSE patches? It is really a shame that kernel developers are simply unwilling to care about this problem! The fact that it is hard to reproduce is no excuse.

Better with 2.6.32.1. However, only time can tell whether the problem has gone completely. Perhaps we should mark it as resolved and re-open it as soon as it is seen again.

I want to mark this as resolved since it has not occurred for a while now. However, please do have a look at another nasty property of current kernels: Bug 566391, s2disk fails. I am using 2.6.32.3-0.0.15.68cba77-desktop in the meantime.

Well, it is really embarrassing, but this bug also persists in the newest openSUSE 11.3 with kernel 2.6.34.7. My current uname -a:
Linux linux-wew7.site 2.6.34.7-0.5-desktop #1 SMP PREEMPT 2010-10-25 08:40:12 +0200 i686 i686 i386 GNU/Linux

Things have actually already improved for many users! Luckily the problem hasn't plagued me lately. Michael, what kind of ksoftirqd problem was it? CPU usage only, or with massive disk access and a totally unresponsive system? 100% CPU usage on both CPUs or only on one? Did the problem go away by itself, or was a reboot the only escape?
How often and at what frequency has it occurred so far? What kind of system are you using (hardware, modules)? Perhaps someone can tell us what to look at.

Ouch, oops! The problem just hasn't occurred for me because I was using the clocksource=jiffies boot option. However, this isn't ideal.

Bumping product to 11.3 since it still exists.

I'm tossing this one back into the open bug queue because it's not my area of expertise.

Created attachment 409384 [details]
clocksource=jiffies, 2.6.37-8.99.14-desktop, 10x a 1s + stacktraces
Help! Now not even clocksource=jiffies helps. I just got 100% CPU usage on both cores with a 2.6.37-8.99.14.138eeaa-desktop kernel. A short while after the snapshots (/proc/interrupts + stack dumps) were taken, massive disk access followed.
** novelty ** For the first time, several stack dumps were taken during the ksoftirqd 100% CPU usage problem (with Alt-PrnScr-L) to let you see which execution state the CPU was in. So just have a look at this.

Can you please try the Kernel of the Day? http://en.opensuse.org/openSUSE:Kernel_of_the_day If it still happens, we should report it upstream so that it can get upstream attention. Also, can you attach the output from `hwinfo --all` to this bug? Thanks, Brandon

Created attachment 415179 [details]
tasklet debug patch
(In reply to comment #54)
> ** novelty ** The first time for the ksoftirqd 100% cpu usage problem several
> stack dumps were taken (by Alt-PrnScr-L) to let you see in which execution
> state the CPU was. So just have a look at this.
The 2.6.37 traces are useless. The 2.6.34.7 ones are helpful though. Also /proc/softirqs clearly shows that some kind of shit schedules a tasklet way too often. I'm attaching a patch to track that down. Also I'm building a kernel to test and it will appear at: http://labs.suse.cz/jslaby/bug-465039 Watch for tasklet_action in the logs when this happens. Maybe there will be false positives. Then I'll increase the limit. Let's see.
Created attachment 415180 [details]
tasklet debug patch
s/time_after/time_before/ indeed. Rebuilding.
Michal- Can you please test Jiri's Kernel?

(In reply to comment #58)
> Michal- Can you please test Jiri's Kernel?
Or maybe Elmar?

Well, this is increasingly hard to test nowadays. I may run the patched kernel for three months without actually being able to tell whether the ksoftirqd bug has vanished, because it occurs so rarely and irregularly. Unfortunately I have been away and thus was not able to test. What we need is something that can trigger the ksoftirqd bug. Michael, could you try to run partgui as provided by attachment 5, "erroneous partgui that kann trigger the ksoftirqd bug"? Then let us see whether we can still trigger it.

(In reply to comment #59)
> (In reply to comment #58)
> > Michal- Can you please test Jiri's Kernel?
>
> Or maybe Elmar?
I'll try to find some time for it. Even for me it was hard to reproduce, but it occurred for me at least once on SUSE 11.3. From my observations, it can happen under high and long-lasting network load. More precisely, it happened when I left Azureus running for several hours (with the desktop locked), but it also happened while I was working on the computer.

Closing due to lack of response. If this is still an issue, please reopen with the requested information.
After a while of using my notebook, ksoftirqd begins to use all my remaining CPU time. The only way to return to a normal state is to reboot. But the reboot also fails (it hangs somewhere, and I cannot see the console: I'm using nvidia's video driver and have a black console as soon as X starts; I could set the nv or vesa driver in xorg.conf and then see). The worst part is that it hangs during shutdown before unmounting the HDDs. Neither alt+ctrl+del nor sysrq works, so I need to do a hard reboot (power off). (Therefore I set this bug as critical; otherwise it could be major.)
I have no idea what causes it. Maybe the network? Around that time /var/log/messages shows only dhcpd doing its stuff, and knetworkmanager shows up somewhat in the top list. It hangs on both wlan and lan.
I searched for a while. Some guy has a similar problem with a Clevo notebook with very similar specs. I had no such problem with the previous SUSE (11.0).
Any ideas? My ideas for now are to try these:
- use only the vesa/nv driver for a while
- install vanilla kernel - download & install
- disable dhcp daemon (easiest I think)
Here is the top of the top processes:
top - 22:35:21 up 1 day, 4:05, 7 users, load average: 1.66, 1.76, 1.52
Tasks: 142 total, 3 running, 139 sleeping, 0 stopped, 0 zombie
Cpu0 : 2.0%us, 6.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 92.0%si, 0.0%st
Cpu1 : 4.0%us, 0.0%sy, 0.3%ni, 95.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4088564k total, 3940616k used, 147948k free, 66132k buffers
Swap: 4409800k total, 28k used, 4409772k free, 3204660k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4 root 15 -5 0 0 0 R 99 0.0 24:24.67 ksoftirqd/0
3049 root 20 0 360m 69m 9772 S 4 1.7 31:08.22 X
9720 micho 39 19 128m 47m 14m S 4 1.2 14:14.83 operapluginwrap
9628 micho 20 0 511m 382m 17m S 2 9.6 12:52.75 opera
10710 micho 20 0 156m 56m 29m S 2 1.4 11:58.14 amarokapp
3853 micho 20 0 35884 13m 8840 S 1 0.3 2:55.25 knetworkmanager
8838 root 15 -5 0 0 0 S 1 0.0 0:00.96 events/1
3984 micho 20 0 37716 16m 10m R 0 0.4 1:51.73 konsole
11588 root 20 0 99112 57m 24m S 0 1.4 0:51.00 y2base
1 root 20 0 1008 356 308 S 0 0.0 0:01.20 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.14 migration/0
7 root 15 -5 0 0 0 S 0 0.0 0:05.14 events/0
/var/log/messages around the fatal time (I think it occurred somewhere after the "MARK"):
Jan 9 21:46:30 linux-6vsc dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Jan 9 21:46:31 linux-6vsc dhclient: DHCPACK from 192.168.1.1
Jan 9 21:46:31 linux-6vsc dhclient: bound to 192.168.1.3 -- renewal in 1625 seconds.
Jan 9 22:06:31 linux-6vsc -- MARK --
Jan 9 22:13:36 linux-6vsc dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Jan 9 22:13:36 linux-6vsc dhclient: DHCPACK from 192.168.1.1
Jan 9 22:13:36 linux-6vsc dhclient: bound to 192.168.1.3 -- renewal in 1726 seconds.
My system:
Linux linux-6vsc 2.6.27.7-9-pae #1 SMP 2008-12-04 18:10:04 +0100 i686 i686 i386 GNU/Linux
I used kde4. Now I have kde3.
sys_vendor = "CLEVO CO."
sys_product = "M570TU"
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz
4GB ddr3 ram
nvidia 9800gt
Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
Hope somebody will help. I have no idea what ksoftirqd is for. Thanks in advance.
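The `92.0%si` figure for Cpu0 in the top output above is the share of CPU time spent servicing softirqs. As a rough illustration of where that number comes from, here is a minimal Python sketch that derives it from two samples of a per-CPU line in /proc/stat. The helper names are invented, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the standard one for such lines on Linux.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Split a /proc/stat line such as 'cpu0 100 0 100 ...' into (name, {field: ticks})."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))

def softirq_share(before_line, after_line):
    """Percentage of CPU time spent in softirq between two samples of the same CPU line."""
    _, before = parse_cpu_line(before_line)
    _, after = parse_cpu_line(after_line)
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    return 100.0 * deltas["softirq"] / total if total else 0.0
```

Logging this value periodically for each CPU would record when the softirq load starts, which could then be lined up against the /proc/interrupts snapshots and /var/log/messages.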