Bug 809473

Summary: Laptop sucking power like crazy - HW: Specific Intel graphics cards do not enter RC6
Product: [openSUSE] openSUSE 12.3
Reporter: Klaus Kämpf <kkaempf>
Component: Kernel
Assignee: Thomas Renninger <trenn>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: jeffm, jloeser, jnelson-suse, kolAflash
Version: Final
Target Milestone: ---
Hardware: Other
OS: openSUSE 12.3
Whiteboard:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Klaus Kämpf 2013-03-14 20:15:05 UTC
Did an upgrade to 12.3 today. Now the Lenovo x220 laptop is sucking power like crazy, powertop shows:

Summary: 451.9 wakeups/second,  26.9 GPU ops/seconds, 0.0 VFS ops/sec and 4.5% CPU use

Power est.              Usage       Events/s    Category       Description
  18.0 W     3880 rpm                   Device         Laptop fan
  6.23 W     80.0%                      Device         Display backlight
  685 mW      6.8 ms/s     111.4        Process        /usr/bin/gnome-shell
  662 mW     21.3 ms/s      92.3        Process        /usr/lib64/chromium/chromium --password-store=gnome
  489 mW      7.1 ms/s      65.3        Process        /usr/lib64/chromium/chromium --type=renderer --lang=en-US --force-fieldtrials=ForceCompositingMode/disable/Infini
  403 mW    592.8 µs/s      53.9        Interrupt      [43] i915
  221 mW    255.6 µs/s      29.6        Interrupt      PS/2 Touchpad / Keyboard / Mouse
  183 mW    399.5 µs/s      24.4        Interrupt      [6] tasklet(softirq)
....
Comment 1 Klaus Kämpf 2013-03-14 20:16:35 UTC
The CPU/Fan area on the bottom side is HOT. The power supply unit is even HOTTER, I can hardly touch it.

None of this was a problem with the 12.2 kernel!
Comment 2 Klaus Kämpf 2013-03-14 20:17:40 UTC
Powertop output when running on battery:

The battery reports a discharge rate of 26.2 W
The estimated remaining time is 3 hours, 2 minutes

Summary: 406.7 wakeups/second,  36.6 GPU ops/seconds, 0.0 VFS ops/sec and 5.8% CPU use

Power est.              Usage       Events/s    Category       Description
  18.1 W     3882 rpm                   Device         Laptop fan
  6.21 W     80.0%                      Device         Display backlight
  458 mW    461.6 µs/s      60.9        Interrupt      PS/2 Touchpad / Keyboard / Mouse
  457 mW      5.5 ms/s      74.6        Process        /usr/lib64/chromium/chromium --password-store=gnome
  425 mW    655.4 µs/s      56.4        Interrupt      [43] i915
  387 mW     18.1 ms/s      64.5        Process        /usr/bin/gnome-shell
  283 mW     11.0 ms/s      37.5        Process        /usr/lib64/chromium/chromium --type=renderer --lang=en-US --force-fieldtrials=ForceCompositingMode/disable/Infini
  170 mW      1.2 ms/s      22.6        Process        /usr/lib64/chromium/chromium --type=renderer --lang=en-US --force-fieldtrials=ForceCompositingMode/disable/Infini
  115 mW      0.8 ms/s      15.2        Interrupt      [6] tasklet(softirq)
 62.4 mW     58.9 µs/s       8.3        kWork          ieee80211_iface_work
Comment 3 Klaus Kämpf 2013-03-14 20:18:26 UTC
(In reply to comment #2)
> Powertop output when running on battery:
> 
> The battery reports a discharge rate of 26.2 W

Power usage with 12.2 kernel was in the 6 to 9 W range.
Comment 4 Jon Nelson 2013-03-15 15:26:28 UTC
I can report something very similar. Thinkpad T520.

The battery reports a discharge rate of 31.3 W
The estimated remaining time is 1 hours, 46 minutes

Summary: 901.5 wakeups/second,  1.5 GPU ops/seconds, 0.0 VFS ops/sec and 5.0% CPU use


I used to get 10-13W on openSUSE 12.2. Now I'm seeing 25-30W.

Thinking it might be helped by a BIOS upgrade, I tried that. No change.
Here are the contents of my /etc/modprobe.d/99-local.conf file:

blacklist mei
blacklist firewire_ohci
# power_save=60 means after being idle for 60s go into power savings, and
#  when in that power saving mode, also reset the controller
options snd_hda_intel enable_msi=1 power_save=60 power_save_controller=1 
options iwlwifi power_save=1
options e1000e SmartPowerDownEnable=1




Where can I look for more info?
Comment 5 Jon Nelson 2013-03-15 15:57:39 UTC
Booting with:

acpi_osi=Linux

drops things to:

The battery reports a discharge rate of 18.2 W
The estimated remaining time is 2 hours, 47 minutes

Summary: 427.7 wakeups/second,  1.1 GPU ops/seconds, 0.0 VFS ops/sec and 7.3% CPU use


(I should note that this and the previous measurements were taken with kwin effects turned off).
Comment 6 Jon Nelson 2013-03-15 16:00:19 UTC
Possibly relevant: bug 801341 -- my laptop says "RC6 issue" when I run the script.
Comment 7 Klaus Kämpf 2013-03-17 20:07:40 UTC
FWIW, here is my /proc/cmdline:

BOOT_IMAGE=dev000:\efi\SuSE\vmlinuz-3.7.10-1.1-default root=/dev/disk/by-id/ata-ST320LT007-9ZV142_W0Q2ZTWC-part6  resume=/dev/disk/by-id/ata-ST320LT007-9ZV142_W0Q2ZTWC-part5 splash=silent quiet  showopts pcie_aspm=powersave i915.i915_enable_rc6=7 i915.i915_enable_fbc=1 i915.lvds_downclock=1
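
For comparing command lines between reporters, the i915 options can be pulled out programmatically. This is only a minimal sketch: the helper names parse_cmdline and i915_params are made up for illustration, and it does nothing more than split the line on whitespace and "=".

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so values such as paths stay intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def i915_params(cmdline: str) -> dict:
    """Return only the options addressed to the i915 module (prefix 'i915.')."""
    prefix = "i915."
    return {
        name[len(prefix):]: value
        for name, value in parse_cmdline(cmdline).items()
        if name.startswith(prefix)
    }
```

On a running system the input would be the contents of /proc/cmdline, for example i915_params(open("/proc/cmdline").read()).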


powertop shows laptop fan at 20 WATTS !

The battery reports a discharge rate of 24.5 W
The estimated remaining time is 1 hours, 28 minutes

Summary: 402.0 wakeups/second,  79.7 GPU ops/seconds, 0.0 VFS ops/sec and 3.0% CPU use

Power est.              Usage       Events/s    Category       Description
  18.3 W     3868 rpm                   Device         Laptop fan
  6.04 W     80.0%                      Device         Display backlight


top shows cpus idle

%Cpu(s):  2.9 us,  0.7 sy,  0.0 ni, 96.1 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
Comment 8 kolA flash 2013-03-19 10:20:29 UTC
To see if rc6 is still working after resume from s2ram (standby) check the "Idle stats" page of
sudo /usr/sbin/powertop
Look at section "GPU". If "Powered On" is at 100% rc6 probably isn't enabled.
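
The same check can probably be scripted without powertop. The sketch below assumes the i915 driver exposes a cumulative RC6 counter at /sys/class/drm/card0/power/rc6_residency_ms; that path is an assumption and may be missing or named differently on a given kernel. The function names are hypothetical.

```python
import time

# Assumed sysfs location of the cumulative RC6 residency counter (milliseconds).
RC6_PATH = "/sys/class/drm/card0/power/rc6_residency_ms"


def rc6_percent(before_ms: int, after_ms: int, interval_s: float) -> float:
    """Share of the sampling interval the GPU spent in RC6, in percent."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * (after_ms - before_ms) / (interval_s * 1000.0)


def read_ms(path: str = RC6_PATH) -> int:
    """Read the counter once; raises OSError if the file does not exist."""
    with open(path) as f:
        return int(f.read().strip())


def sample(interval_s: float = 5.0, path: str = RC6_PATH) -> float:
    """Take two readings interval_s apart and return the RC6 percentage."""
    start = time.monotonic()
    before = read_ms(path)
    time.sleep(interval_s)
    after = read_ms(path)
    return rc6_percent(before, after, time.monotonic() - start)
```

A result near 0% on an idle desktop would match the "Powered On 100%" symptom described above.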

A workaround that helped on my notebook Thinkpad x220 (Intel i7-2620M, Sandy Bridge): Use the kernel from here
http://download.opensuse.org/repositories/Kernel:/stable/standard/
I installed: kernel-desktop-3.8.3-1.1.x86_64.rpm 16-Mar-2013 12:15
Comment 9 Klaus Kämpf 2013-03-19 11:10:20 UTC
Ah, thanks, that's helpful. rc6 is indeed disabled :-/
Comment 10 Jon Nelson 2013-03-21 14:18:06 UTC
I'm trying 3.8.2 from factory.
Without firefox running (firefox + flash is a big CPU hog) or desktop effects enabled:

The battery reports a discharge rate of 12.6 W
The estimated remaining time is 4 hours, 13 minutes

Summary: 298.9 wakeups/second,  1.2 GPU ops/seconds, 0.0 VFS ops/sec and 8.1% CPU use

and from "GPU":

                   |             GPU     |
                    |                     |
                    | Powered On 37.3%    |
                    | RC6        62.7%    |
                    | RC6p        0.0%    |
                    | RC6pp       0.0%    |

The overwhelming cost on my laptop now is the backlight.
Comment 11 Klaus Kämpf 2013-03-27 08:54:13 UTC
After installing kernel-default-3.8.3-1.1.x86_64 from Kernel:stable the problem is *NOT* fully solved. 
I still get situations where the GPU is on 100% load, with all the RC6 states at 0%
Comment 12 Klaus Kämpf 2013-04-03 12:06:17 UTC
Updated to 3.8.5-1-default, GPU at 100% :-(
Comment 13 Jon Nelson 2013-06-23 10:41:13 UTC
Still a big problem for me. 3.9.4 was good, 3.9.7 is bad for sure.
For me, it's the rc6 issue.
Comment 14 Klaus Kämpf 2013-07-04 13:18:16 UTC
Wow, just updated to 3.10 and battery went from 100% to 0% within 2hrs. That's bad.
Comment 15 kolA flash 2013-07-04 14:01:58 UTC
(In reply to comment #14)
> Wow, just updated to 3.10 and battery went from 100% to 0% within 2hrs. That's
> bad.

Same problem. Used this kernel:
http://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/kernel-default-3.10.0-1.1.g3dcd746.x86_64.rpm

BUT: The reason for this was that my CPU was running at full speed (2.70GHz) all the time. It wasn't slowing down to any lower level. I don't think it had anything to do with the GPU (I checked the GPU sleep level using powertop).

cpufreq-governor was set to "powersave". The only alternative offered was "performance"; this has been the case for several kernel versions on my Intel Core i7-2620M (already in kernel 3.9.8 and maybe before). "conservative" and "ondemand" are no longer available.
https://wiki.archlinux.org/index.php/CPU_Frequency_Scaling

Maybe this is related:
http://www.golem.de/news/linux-kernel-p-states-verringern-leistungsaufnahme-auf-intel-cpus-1305-99336.html


So this seems to be a DIFFERENT bug for kernel 3.10
Comment 16 Thomas Renninger 2013-07-16 09:49:47 UTC
Sigh, I expect this is about several issues:

- I expect that in 12.3 we have an Intel graphics RC6 issue, but I have not had time
  to look at this one, and I am also not that familiar with the Intel graphics driver.
  (This is a guess, but it is what I would look out for in 12.3.)

> So this seems to be a DIFFERENT bug for kernel 3.10
There I saw a very critical bug: One CPU was polling instead of entering sleep states. You can verify with:
cpupower monitor
when idle.
One CPU never enters a deeper sleep state, but gets woken up really often (double check with the interrupt count via powertop or watch -n1 cat /proc/interrupts).
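
The interrupt double check can be sketched as a diff of two /proc/interrupts snapshots, summed per CPU. This is a rough illustration only; parse_interrupts and per_cpu_delta are hypothetical helper names, and rows without one count per CPU (ERR, MIS) are skipped.

```python
def parse_interrupts(text: str) -> dict:
    """Map each IRQ label to its list of per-CPU counts."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Keep only rows that carry one counter per CPU.
        if len(counts) == ncpu:
            table[label.strip()] = counts
    return table


def per_cpu_delta(before: dict, after: dict) -> list:
    """Total interrupts each CPU received between two snapshots."""
    ncpu = max((len(v) for v in after.values()), default=0)
    totals = [0] * ncpu
    for label, counts in after.items():
        old = before.get(label, [0] * ncpu)
        for cpu, count in enumerate(counts):
            totals[cpu] += count - old[cpu]
    return totals
```

Reading /proc/interrupts twice a few seconds apart and passing both parsed snapshots to per_cpu_delta shows whether one CPU takes nearly all the wakeups.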

Also, for specific recent Intel CPUs the new Intel pstate driver is used. These are models 2a, 2d and 3a; compare with /proc/cpuinfo:
cpu family      : 6
model           : 42, 45, 58
This is why one reports:
> BUT: The reason for this was that my cpu was running at full speed ( 2.70GHz )
But in fact this may not affect power consumption on a recent CPU. Check whether it enters the deepest sleep/idle states using these tools:
cpupower monitor
powertop
turbostat
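
As a rough illustration of the /proc/cpuinfo comparison mentioned above: the sketch below only tests against the three models named in this comment (42, 45, 58 in family 6). It is not the driver's real matching logic, and the function names are hypothetical.

```python
# Models named in this comment only (0x2a, 0x2d, 0x3a); not an authoritative list.
PSTATE_MODELS = {42, 45, 58}


def family_and_model(cpuinfo: str):
    """Return (family, model) of the first processor entry in /proc/cpuinfo text."""
    family = model = None
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "cpu family" and family is None:
            family = int(value)
        elif key == "model" and model is None:  # "model name" does not match
            model = int(value)
        if family is not None and model is not None:
            break
    return family, model


def listed_for_pstate(cpuinfo: str) -> bool:
    """True if the CPU is one of the models listed in the comment."""
    family, model = family_and_model(cpuinfo)
    return family == 6 and model in PSTATE_MODELS
```

Usage would be listed_for_pstate(open("/proc/cpuinfo").read()).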
Comment 17 Jon Nelson 2013-07-16 13:27:36 UTC
cpupower monitor w/3.10.1-1.g19a2fe9-desktop:


    |Nehalem                    || SandyBridge        || Mperf              || Idle_Stats                              
CPU | C3   | C6   | PC3  | PC6  || C7   | PC2  | PC7  || C0   | Cx   | Freq || POLL | C1-S | C1E- | C3-S | C6-S | C7-S 
   0|  0.00|  0.00|  0.00|  0.00||  0.00|  0.00|  0.00|| 99.69|  0.31|  3379|| 98.81|  0.00|  0.00|  0.00|  0.00|  0.00
   1|  0.00|  0.00|  0.00|  0.00||  0.00|  0.00|  0.00||  0.77| 99.23|  3315||  0.00|  0.00|  0.00|  0.00|  0.00| 99.17
   2|  0.28|  0.10|  0.00|  0.00|| 96.11|  0.00|  0.00||  1.34| 98.66|  3188||  0.00|  0.00|  0.21|  0.17|  0.10| 98.17
   3|  0.28|  0.10|  0.00|  0.00|| 96.11|  0.00|  0.00||  1.30| 98.70|  3188||  0.00|  0.13|  0.07|  0.13|  0.00| 98.38



/proc/interrupts  6 minutes after booting:

           CPU0       CPU1       CPU2       CPU3       
  0:         24          0          0          0  IR-IO-APIC-edge      timer
  1:         12          0          0          0  IR-IO-APIC-edge      i8042
  8:          1          0          0          0  IR-IO-APIC-edge      rtc0
  9:       6333          0          0          0  IR-IO-APIC-fasteoi   acpi
 12:       1962          0          0          0  IR-IO-APIC-edge      i8042
 16:       3170          0          0          0  IR-IO-APIC-fasteoi   ehci_hcd:usb1, mmc0
 18:          0          0          0          0  IR-IO-APIC-fasteoi   i801_smbus
 23:       6470          0          0          0  IR-IO-APIC-fasteoi   ehci_hcd:usb2
 40:          0          0          0          0  DMAR_MSI-edge      dmar0
 41:          0          0          0          0  DMAR_MSI-edge      dmar1
 42:         22          0          0          0  IR-PCI-MSI-edge      mei_me
 43:     101131          0          0          0  IR-PCI-MSI-edge      ahci
 44:      33722          0          0          0  IR-PCI-MSI-edge      i915
 45:      30785          0          0          0  IR-PCI-MSI-edge      eth0
 46:       1104          0          0          0  IR-PCI-MSI-edge      iwlwifi
 47:       1534          0          0          0  IR-PCI-MSI-edge      snd_hda_intel
NMI:        609         60         46         47   Non-maskable interrupts
LOC:     389427      50298      54103      44780   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:        609         60         46         47   Performance monitoring interrupts
IWI:       1130       1773       1645       1792   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:     104272     121121     150597     160098   Rescheduling interrupts
CAL:       1074       1829       1764       1945   Function call interrupts
TLB:       1208        495        655        798   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          3          3          3          3   Machine check polls
ERR:          0
MIS:          0
Comment 18 Thomas Renninger 2013-07-16 17:19:30 UTC
The first CPU shows the problem:
C0   | Cx   | Freq ||
99.69|  0.31|  3379||

POLL | C1-S | C1E- | C3-S | C6-S | C7-S
98.81|  0.00|  0.00|  0.00|  0.00|  0.00

There are so many interrupts happening on this CPU that the CPU does not even enter C1, but is kept in a polling loop for very low latency.
I already asked the mainline maintainers, but they either had no idea for a concrete fix or could not reproduce it, and I have not had time to bisect it yet.
AFAIK this came in with 3.9 at the latest.
Looks like a timer programming issue, probably only on specific HW.
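
Spotting a CPU stuck in POLL, as in the table quoted above, can be automated. The sketch below parses cpupower monitor output in the layout shown in comment 17. The column names and layout are taken from that paste and may differ on other CPUs or cpupower versions; the 50% threshold is an arbitrary assumption.

```python
def parse_cpupower_monitor(text: str) -> list:
    """Parse 'cpupower monitor' output into one {column: value} dict per CPU."""
    def cells(line):
        # Columns are separated by "|" or "||"; drop the empty pieces.
        return [c.strip() for c in line.split("|") if c.strip()]

    header, rows = None, []
    for line in text.splitlines():
        parts = cells(line)
        if not parts:
            continue
        if parts[0] == "CPU":
            header = parts
        elif header and len(parts) == len(header) and parts[0].isdigit():
            row = {"CPU": int(parts[0])}
            row.update({name: float(v) for name, v in zip(header[1:], parts[1:])})
            rows.append(row)
    return rows


def polling_cpus(rows: list, threshold: float = 50.0) -> list:
    """CPUs whose POLL residency is at or above the threshold (percent)."""
    return [r["CPU"] for r in rows if r.get("POLL", 0.0) >= threshold]
```

Fed the output from comment 17, polling_cpus should single out CPU 0.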

Stay tuned and ping me at the end of the week if I have not answered by then.
Comment 19 Thomas Renninger 2013-07-23 12:15:27 UTC
For info:
About the graphics RC6 mode not entered:
Jan (Loeser) also has this issue and 3.9.3 did not work for him.
The latest 3.10.2 kernel does work and RC6 is entered, which in turn lets the CPU package states (PC7 and others) be entered, and you can see the temperature drop consistently.

He is not affected by the timer interrupt problem mentioned above, so he is fine with this kernel.
Ah yes, and things still work, also after suspend.
Comment 20 Jon Nelson 2013-07-23 13:18:57 UTC
I'm now on  3.7.10-44.g57b6816-desktop  and - so far - everything is working very well, even after multiple suspend-resume cycles.

Current temperature is 113F.

Bug 801341 is also fixed by this kernel for me.
Comment 21 Jan Loeser 2013-07-24 10:22:52 UTC
(In reply to comment #19)
> For info:
> About the graphics RC6 mode not entered:
> Jan (Loeser) also has this issue and 3.9.3 did not work for him.
> Latest 3.10.2 kernel does work and RC6 got entered. Which makes the CPU Package
> state (PC7 and others) enter and you see how the temperature is lowered
> consistently.
> 
> He is not affected by the timer interrupt problem mentioned above, so he is
> fine with this kernel.
> Ah yes, and things still work, also after suspend.

After the last resume (with kernel 3.10.2), still the same problem. RC6 is _not_ entered and the temperature is at a high peak.
Comment 22 Thomas Renninger 2014-11-11 16:17:28 UTC
I am closing this one as WONTFIX for 12.3.
If this is still a problem, please open a new, short bug with Egbert Eich and Takashi in CC. This is something for the old HW enablement team...