Bug 334170

Summary: Too many Wakeups-from-idle per second after suspend2ram
Product: [openSUSE] openSUSE 10.3 Reporter: Péter Kerékfy <kerekfyp>
Component: KernelAssignee: E-mail List <kernel-maintainers>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Enhancement    
Priority: P5 - None CC: rjwysocki, suse, trenn
Version: Final   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: dmesg including boot and s2ram
dmesg (2.6.23.1-20071013190126-vanilla)
powertop before s2ram (2.6.23.1-20071013190126-vanilla)
powertop after s2ram (2.6.23.1-20071013190126-vanilla)

Description Péter Kerékfy 2007-10-16 07:25:13 UTC
After resume from suspend to ram the cpu wakes up several thousand times per second according to powertop 1.8.

(After a clean reboot the number of wakeups per second is about 200.)

# uname -a
Linux kpeter 2.6.22.9-0.4-default #1 SMP 2007/10/05 21:32:04 UTC i686 i686 i386 GNU/Linux


powertop output:
================
Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (15,6%)         2,00 Ghz    21,7%
C1                0,0ms ( 0,0%)         1,67 Ghz     0,0%
C2                0,4ms (84,4%)         1333 Mhz    11,7%
                                        1000 Mhz    66,6%


Wakeups-from-idle per second : 2274,7   interval: 10,0s
no ACPI power usage estimate available

Top causes for wakeups:
  83,0% (2346,9)       <interrupt> : extra timer interrupt
   5,4% (151,5)       <interrupt> : HDA Intel, i915@pci:0000:00:02.0
   3,5% (100,1)       <interrupt> : uhci_hcd:usb1, ehci_hcd:usb5
   1,9% ( 53,5)         amarokapp : schedule_timeout (process_timeout)
   1,3% ( 36,2)              Xorg : do_setitimer (it_real_fn)
   0,8% ( 22,7)            compiz : schedule_timeout (process_timeout)
   0,4% ( 12,5)                sh : queue_delayed_work_on (delayed_work_timer_fn
   0,4% ( 11,9)   <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
   0,4% ( 10,0)       <interrupt> : uhci_hcd:usb3, tifm_7xx1, yenta
   0,3% (  9,7)       <interrupt> : iwl3945
   0,2% (  6,5)        postmaster : schedule_timeout (process_timeout)
   0,2% (  6,4)       <interrupt> : acpi
   0,2% (  6,2)            kicker : futex_wait (hrtimer_wakeup)
   0,2% (  5,1)   thunderbird-bin : futex_wait (hrtimer_wakeup)
   0,2% (  4,3)            kicker : schedule_timeout (process_timeout)
   0,1% (  4,0)       <interrupt> : PS/2 keyboard/mouse/touchpad
   0,1% (  4,0)       soffice.bin : schedule_timeout (process_timeout)
   0,1% (  3,6)            oracle : schedule_timeout (process_timeout)
   0,1% (  3,0)       <interrupt> : libata
   0,1% (  2,5)       <interrupt> : ahci
   0,1% (  2,4)           emerald : schedule_timeout (process_timeout)
   0,1% (  2,3)           konsole : schedule_timeout (process_timeout)
   0,1% (  1,9)               top : schedule_timeout (process_timeout)
   0,1% (  1,7)              Xorg : schedule_timeout (process_timeout)
   0,1% (  1,5)             s2ram : queue_delayed_work_on (delayed_work_timer_fn
   0,1% (  1,5)     <kernel core> : queue_delayed_work_on (delayed_work_timer_fn
   0,0% (  1,4)       firefox-bin : futex_wait (hrtimer_wakeup)

top output:
===========
top - 09:18:22 up 48 min,  4 users,  load average: 0.34, 0.52, 0.78
Tasks: 180 total,   1 running, 179 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.6%sy,  0.0%ni, 98.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   2068028k total,  1283460k used,   784568k free,    31800k buffers
Swap:  2104472k total,        0k used,  2104472k free,   766108k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2815 root      15   0  426m  79m 5724 S    2  3.9   2:15.79 Xorg
 7729 kerekfyp  18   0  180m  82m  20m S    0  4.1   0:57.18 firefox-bin
 4889 kerekfyp  15   0 35696  21m  12m S    8  1.0   0:41.63 compiz
 4939 kerekfyp  15   0  124m  40m  24m S    0  2.0   0:39.90 amarokapp
 6473 kerekfyp  15   0  2328 1044  772 R    4  0.1   0:34.37 top
 4900 kerekfyp  15   0  104m  19m  14m S    4  1.0   0:25.24 kicker
 6403 kerekfyp  20   0  168m  46m  20m S    0  2.3   0:17.49 thunderbird-bin
 4894 kerekfyp  15   0 17884 9700 7080 S    0  0.5   0:05.48 emerald
 6472 kerekfyp  15   0 32208  16m  13m S    2  0.8   0:04.94 konsole
 5036 kerekfyp  15   0  180m  39m  30m S    0  2.0   0:03.44 soffice.bin
 6939 root      10  -5     0    0    0 S    0  0.0   0:03.42 events/1
 4936 kerekfyp  15   0 69080  21m  17m S    0  1.1   0:02.94 kopete
 3673 postgres  15   0 46920 1212  720 S    0  0.1   0:02.93 postmaster
 2635 haldaemo  15   0  5624 3912 3176 S    0  0.2   0:02.77 hald
 2577 messageb  15   0  2328  992  660 S    0  0.0   0:02.56 dbus-daemon
  463 root      10  -5     0    0    0 S    0  0.0   0:01.88 kondemand/0
 2968 root      15   0 28908 2204 1828 S    0  0.1   0:01.55 NetworkManager
 4960 kerekfyp  15   0 33224  13m  10m S    0  0.7   0:01.50 knetworkmanager
 7538 kerekfyp  15   0 33316  14m  11m S    0  0.7   0:01.49 konsole
 4880 kerekfyp  15   0 36708  16m  12m S    0  0.8   0:01.46 kded
    1 root      15   0   744  288  240 S    0  0.0   0:01.40 init
 3920 root      15   0 93596 8580 5316 S    0  0.4   0:01.39 httpd2-prefork
 4898 kerekfyp  15   0 33928  13m  10m S    0  0.7   0:01.38 kdesktop
 4953 kerekfyp  15   0 48520  12m 9268 S    0  0.6   0:01.31 knotify
 3656 oracle    15   0  659m  13m  12m S    0  0.7   0:01.14 oracle
 6932 root      34  19     0    0    0 S    2  0.0   0:01.07 ksoftirqd/1
 3654 oracle    15   0  661m  41m  38m S    0  2.1   0:00.97 oracle
 3527 root      15   0  6024 1956 1420 S    0  0.1   0:00.94 cupsd
   32 root      10  -5     0    0    0 S    0  0.0   0:00.93 kacpi_notify
 2926 root      15   0  2644 1032  836 S    0  0.0   0:00.93 dhcdbd
 4976 kerekfyp  15   0 29300  13m  11m S    0  0.7   0:00.88 opensuseupdater
 3410 root      15   0  139m 1088  792 S    0  0.1   0:00.87 nscd
 3526 root      15   0  4376 1696 1440 S    0  0.1   0:00.74 powersaved
 2578 root      17   0  2124  696  448 S    2  0.0   0:00.72 syslog-ng
 4885 kerekfyp  15   0  1584  368  300 S    0  0.0   0:00.68 kwrapper
 4929 kerekfyp  15   0 34372  18m  13m S    0  0.9   0:00.59 oooqs
 4828 kerekfyp  15   0  3544  480  300 S    0  0.0   0:00.53 gpg-agent
Comment 1 Timo Hoenig 2007-10-16 09:12:07 UTC
Péter, can you please provide the output of powertop before suspending the machine (e.g. when you only see 200 w/s).

Thanks! 
Comment 2 Péter Kerékfy 2007-10-16 09:46:31 UTC
After reboot and before any suspend powertop prints:

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 7,2%)         2,00 Ghz     0,6%
C1                0,0ms ( 0,0%)         1,67 Ghz     0,0%
C2                1,4ms (92,8%)         1333 Mhz     0,0%
                                        1000 Mhz    99,4%


Wakeups-from-idle per second : 653,5    interval: 10,0s
no ACPI power usage estimate available

Top causes for wakeups:
  35,1% (242,4)       <interrupt> : extra timer interrupt
  18,5% (127,6)       <interrupt> : HDA Intel, i915@pci:0000:00:02.0
  14,5% (100,0)       <interrupt> : uhci_hcd:usb1, ehci_hcd:usb5
  10,5% ( 72,8)         amarokapp : schedule_timeout (process_timeout)
   4,2% ( 29,2)              Xorg : do_setitimer (it_real_fn)
   3,3% ( 22,9)            compiz : schedule_timeout (process_timeout)
   1,8% ( 12,5)   <kernel module> : queue_delayed_work_on (delayed_work_timer_f
   1,7% ( 11,7)   <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
   1,6% ( 11,0)       <interrupt> : libata
   0,9% (  6,5)        postmaster : schedule_timeout (process_timeout)
   0,9% (  6,4)       <interrupt> : iwl3945
   0,9% (  6,2)            kicker : futex_wait (hrtimer_wakeup)
   0,9% (  6,0)       <interrupt> : acpi
   0,7% (  4,8)       soffice.bin : schedule_timeout (process_timeout)
   0,6% (  3,9)            oracle : schedule_timeout (process_timeout)
   0,4% (  3,0)     <kernel core> : queue_delayed_work_on (delayed_work_timer_f
   0,4% (  2,5)           emerald : schedule_timeout (process_timeout)
   0,3% (  2,0)     <kernel core> : clocksource_register (clocksource_watchdog)
   0,3% (  1,9)            kicker : schedule_timeout (process_timeout)
   0,3% (  1,8)       <interrupt> : ahci
   0,2% (  1,3)              Xorg : schedule_timeout (process_timeout)
   0,2% (  1,1)            kopete : schedule_timeout (process_timeout)
   0,1% (  1,0)          kwrapper : do_nanosleep (hrtimer_wakeup)
   0,1% (  1,0)    NetworkManager : tg3_open (tg3_timer)
   0,1% (  1,0)    httpd2-prefork : schedule_timeout (process_timeout)
   0,1% (  1,0)              ntpd : do_setitimer (it_real_fn)
   0,1% (  1,0)          slmodemd : schedule_timeout (process_timeout)

(Note: as you probably see, this and the previous output was taken while running a full kde desktop environment with compiz.)

When no X application (except kdm) is running the number of extra timer interrupts decreases to ~30 before suspend and to ~300 after suspend.
Comment 3 Timo Hoenig 2007-10-16 10:48:38 UTC
Not sure what triggers the high amount of extra timer interrupts.  Maybe you can investigate by killing processes (one by one) and also unloading modules in order to narrow down the cause.
Comment 4 Thomas Renninger 2007-10-16 11:18:10 UTC
I don't think that has to do with processes.
I expect the timer set up got screwed somehow. In nohz mode the timer
interrupts are switched off or lowered. For some reason the system seems to fall back to normal HZ mode an timer interrupts are fired (not sure whether we are at 250 or 1000 timers per second -> it normally doesn't matter anymore..).

If you do a:
watch -n1 cat /proc/interrupts
you should see the timer interrupts increase regularly by 250 or 1000 per second after s2ram, but not before?

Maybe setting different clocksources help?
/sys/devices/system/clocksource/clocksource0/

However, this is rather complex code written by Ingo Molnar and Thomas Gleixner.
IMO this bug should get verified in latest kernel and reported upstream, e.g. via http://bugzilla.kernel.org
To verify whether the latest mainline kernel is also affected, you can take this one:
ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64/kernel-vanilla-2.6.23.1-20071013190126.x86_64.rpm
(the timestamp might vary of course).

Adding Pavel and Rafael, they may already know about this or might want to know about it.
Comment 5 Péter Kerékfy 2007-10-16 12:11:11 UTC
If I do 'watch -n1 cat /proc/interrupts' I see the same number of timer interrupts fired per second as with powertop. Before suspend it is about 300, after suspend it is about 3000.

Setting different clocksources makes no difference except that clocksource 'jiffies' locked up the system.

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm jiffies tsc


What about the executeable called "s2ram"? Please take a look at the first powertop output (which was taken after the suspend). There is a line contaning the word "s2ram". In the second powertop output (which was taken before the suspend) no such line exists.
Comment 6 Thomas Renninger 2007-10-16 12:41:38 UTC
This has nothing to do with a process (maybe triggered by one indirectly, no idea), this is about a wrongly programmed timer after s2ram.
Hmm, you mean the s2ram executable is executed all the time after suspend? You can double check with ps aux |grep s2ram, this indeed sounds strange...

Can you attach dmesg, pls.
Best is you also try the latest kernel as said in comment #4.
Comment 7 Péter Kerékfy 2007-10-16 13:00:59 UTC
Created attachment 178782 [details]
dmesg including boot and s2ram
Comment 8 Péter Kerékfy 2007-10-16 13:02:01 UTC
No, the s2ram executeable is not executed according to 'ps aux | grep s2ram'.
I just wanted to point out this slight difference between the above two powertop outputs.

dmesg attached.

Anyway, I'll do a fresh install with latest vanilla kernel as soon as I can.
Comment 9 Thomas Renninger 2007-10-16 13:09:14 UTC
Maybe hpet=disable helps, AFAIK this one got some extra suspend/resume code and made some problems on suspend recently.
Comment 10 Péter Kerékfy 2007-10-16 19:34:28 UTC
Fresh 10.3 GM install also has this issue with both vanilla and default kernel.

hpet=disable kernel parameter does not change anything.

See the attached outputs.
Comment 11 Péter Kerékfy 2007-10-16 19:35:48 UTC
Created attachment 178840 [details]
dmesg (2.6.23.1-20071013190126-vanilla)
Comment 12 Péter Kerékfy 2007-10-16 19:37:51 UTC
Created attachment 178842 [details]
powertop before s2ram (2.6.23.1-20071013190126-vanilla)
Comment 13 Péter Kerékfy 2007-10-16 19:41:43 UTC
Created attachment 178843 [details]
powertop after s2ram (2.6.23.1-20071013190126-vanilla)

This output also contains that strange "s2ram" executeable which is in fact not being executed....
Comment 14 Pavel Machek 2007-10-17 08:21:46 UTC
Yes, powertop is confused, and it happens in mainline, too. Eating too much power is ugly, but given newness of nohz subsystem, I guess this should better be solved on mainline.
Comment 16 Pavel Machek 2007-10-24 10:05:21 UTC
Ok, lets keep the bug here so that people can see it was already reported...
Comment 17 Thomas Renninger 2007-10-27 23:51:42 UTC
There is another bug where this happens:
bug #335086
This one is even more sever as you don't even have to s2ram for it.
There processor.max_cstate=1 workarounds the problem.
Comment 18 Péter Kerékfy 2007-10-28 07:25:30 UTC
It seems that 'echo 1 > /sys/module/processor/parameters/max_cstate' workarounds this problem too. It normalizes the number of wakeups per second but also makes the cpu run always in C0.

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (100,0%)        2,00 Ghz     0,0%
C1                0,0ms ( 0,0%)         1,67 Ghz     0,0%
C2                0,0ms ( 0,0%)         1333 Mhz     0,0%
                                        1000 Mhz   100,0%

Comment 19 Pavel Machek 2007-10-28 10:09:50 UTC
Disabling higher Cstates is "throw baby out with bathwater" situation. It will make your power consumption _big_. Anyway, it also hints that problem is with cpuidle framework. Adam?
Comment 20 Thomas Renninger 2007-10-28 12:33:39 UTC
We do not have cpuidle framework yet in 10.3 (We have them in -rt kernel only).
My suspicion is that this gets introduced with nohz/clockevents/highres patches.
Maybe it's the way the timer's are (re-)programmed. In this case maybe the way the resume functions of timer reprogramming for these patches..., just a guess.
But as all this is complicated stuff...
Peter: It will probably be some work, but I'd start to try a kernel shortly before nohz/clockevents/highres patches, not sure anymore, when exactly they came in, 2.6.21 should be ok.
If this one works, a git bisect should bring you to the offending patch.
If it's the above mentioned patches, open a bug on bugzilla.kernel.org and assign them to Ingo Molnar and/or Thomas Gleixner, those guys are to blame for these huge patches...
If it's something else and it worked before, we have a good chance to fix it if it's a smaller patch.
Comment 21 Thomas Renninger 2007-10-28 12:53:54 UTC
Powertop won't be available, you probably can use:
watch -n1 cat /proc/interrupts
to prove whether things are working or not.
Depending on the CONFIG_HZ value used (in SUSE 250 per default), the timer interrupt should increase by this value (250) per second.
If you get some thousands per second, something goes wrong (you can also try this with the current system).
If this also happens with older kernels, before nohz patches, we might have a general timer programming/C-state problem...

zcat /proc/config.gz |less
then search for CONFIG_HZ will show you the CONFIG_HZ value of your kernel (could also be 1000...).
Comment 22 Jan Ritzerfeld 2007-12-23 16:35:13 UTC
I added a solution for my problem with too many wake-ups as Comment #7 of Bug #335086.
Comment 24 Pavel Machek 2008-09-29 10:35:00 UTC
Ok, chance to fix this for 10.3 is less than 0; if it still happens with 11.0 / 11.1 alpha, please open a new bug.