Bugzilla – Full Text Bug Listing
| Summary: | Laptop sucking power like crazy - HW: Specific Intel graphics card do not enter RC6 | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.3 | Reporter: | Klaus Kämpf <kkaempf> |
| Component: | Kernel | Assignee: | Thomas Renninger <trenn> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | jeffm, jloeser, jnelson-suse, kolAflash |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | openSUSE 12.3 | ||
| Whiteboard: | |||
| Found By: | Development | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Klaus Kämpf
2013-03-14 20:15:05 UTC
The CPU/Fan area on the bottom side is HOT. The power supply unit is even HOTTER, I can hardly touch it. None of this was a problem with the 12.2 kernel!

Powertop output when running on battery:

The battery reports a discharge rate of 26.2 W
The estimated remaining time is 3 hours, 2 minutes
Summary: 406.7 wakeups/second, 36.6 GPU ops/seconds, 0.0 VFS ops/sec and 5.8% CPU use

| Power est. | Usage | Events/s | Category | Description |
|---|---|---|---|---|
| 18.1 W | 3882 rpm | | Device | Laptop fan |
| 6.21 W | 80.0% | | Device | Display backlight |
| 458 mW | 461.6 µs/s | 60.9 | Interrupt | PS/2 Touchpad / Keyboard / Mouse |
| 457 mW | 5.5 ms/s | 74.6 | Process | /usr/lib64/chromium/chromium --password-store=gnome |
| 425 mW | 655.4 µs/s | 56.4 | Interrupt | [43] i915 |
| 387 mW | 18.1 ms/s | 64.5 | Process | /usr/bin/gnome-shell |
| 283 mW | 11.0 ms/s | 37.5 | Process | /usr/lib64/chromium/chromium --type=renderer --lang=en-US --force-fieldtrials=ForceCompositingMode/disable/Infini |
| 170 mW | 1.2 ms/s | 22.6 | Process | /usr/lib64/chromium/chromium --type=renderer --lang=en-US --force-fieldtrials=ForceCompositingMode/disable/Infini |
| 115 mW | 0.8 ms/s | 15.2 | Interrupt | [6] tasklet(softirq) |
| 62.4 mW | 58.9 µs/s | 8.3 | kWork | ieee80211_iface_work |

(In reply to comment #2)
> Powertop output when running on battery:
>
> The battery reports a discharge rate of 26.2 W

Power usage with 12.2 kernel was in the 6 to 9 W range.

I can report something very similar. Thinkpad T520.

The battery reports a discharge rate of 31.3 W
The estimated remaining time is 1 hours, 46 minutes
Summary: 901.5 wakeups/second, 1.5 GPU ops/seconds, 0.0 VFS ops/sec and 5.0% CPU use

I used to get 10-13W on openSUSE 12.2. Now I'm seeing 25-30W.

Thinking it might be helped by a BIOS upgrade, I tried that. No change.
Here are the contents of my /etc/modprobe.d/99-local.conf file:

    blacklist mei
    blacklist firewire_ohci
    # power_save=60 means after being idle for 60s go into power savings, and
    # when in that power saving mode, also reset the controller
    options snd_hda_intel enable_msi=1 power_save=60 power_save_controller=1
    options iwlwifi power_save=1
    options e1000e SmartPowerDownEnable=1

Where can I look for more info?

Booting with acpi_osi=Linux drops things to:

The battery reports a discharge rate of 18.2 W
The estimated remaining time is 2 hours, 47 minutes
Summary: 427.7 wakeups/second, 1.1 GPU ops/seconds, 0.0 VFS ops/sec and 7.3% CPU use

(I should note that this and the previous measurements were taken with kwin effects turned off.)

Possibly relevant: bug 801341 -- my laptop says "RC6 issue" when I run the script.

FWIW, here is my /proc/cmdline:

BOOT_IMAGE=dev000:\efi\SuSE\vmlinuz-3.7.10-1.1-default root=/dev/disk/by-id/ata-ST320LT007-9ZV142_W0Q2ZTWC-part6 resume=/dev/disk/by-id/ata-ST320LT007-9ZV142_W0Q2ZTWC-part5 splash=silent quiet showopts pcie_aspm=powersave i915.i915_enable_rc6=7 i915.i915_enable_fbc=1 i915.lvds_downclock=1

powertop shows laptop fan at 20 WATTS!

The battery reports a discharge rate of 24.5 W
The estimated remaining time is 1 hours, 28 minutes
Summary: 402.0 wakeups/second, 79.7 GPU ops/seconds, 0.0 VFS ops/sec and 3.0% CPU use

| Power est. | Usage | Events/s | Category | Description |
|---|---|---|---|---|
| 18.3 W | 3868 rpm | | Device | Laptop fan |
| 6.04 W | 80.0% | | Device | Display backlight |

top shows the CPUs idle:

%Cpu(s): 2.9 us, 0.7 sy, 0.0 ni, 96.1 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st

To see if rc6 is still working after resume from s2ram (standby), check the "Idle stats" page of
sudo /usr/sbin/powertop
Look at section "GPU". If "Powered On" is at 100%, rc6 probably isn't enabled.
A workaround that helped on my notebook Thinkpad x220 (Intel i7-2620M, Sandy Bridge):
Use the kernel from here
http://download.opensuse.org/repositories/Kernel:/stable/standard/
I installed:
kernel-desktop-3.8.3-1.1.x86_64.rpm 16-Mar-2013 12:15

Ah, thanks, that's helpful. rc6 is indeed disabled :-/

I'm trying 3.8.2 from factory.
Without firefox running (firefox + flash is a big CPU hog) or desktop effects enabled:
The battery reports a discharge rate of 12.6 W
The estimated remaining time is 4 hours, 13 minutes
Summary: 298.9 wakeups/second, 1.2 GPU ops/seconds, 0.0 VFS ops/sec and 8.1% CPU use
and from "GPU":
| GPU |
| |
| Powered On 37.3% |
| RC6 62.7% |
| RC6p 0.0% |
| RC6pp 0.0% |
The overwhelming cost on my laptop now is the backlight.
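
The check described in this thread (look at the "GPU" section of powertop's "Idle stats" page and see whether "Powered On" sits at 100%) can be scripted against a saved copy of that block. The following is only a sketch: the function names and the 99% threshold are hypothetical choices for illustration, and the input is assumed to look like the pasted block above rather than any guaranteed powertop output format.

```python
import re


def parse_gpu_residency(text):
    """Return {state: percent} for lines such as 'RC6 62.7%'."""
    residency = {}
    for line in text.splitlines():
        # Drop the surrounding pipes left over from the pasted box.
        cleaned = line.strip().strip("|").strip()
        match = re.fullmatch(r"(.+?)\s+(\d+(?:\.\d+)?)%", cleaned)
        if match:
            residency[match.group(1)] = float(match.group(2))
    return residency


def rc6_looks_disabled(residency, threshold=99.0):
    # Hypothetical rule of thumb from the thread: "Powered On" near 100%
    # suggests the GPU never enters RC6. The threshold is an assumption.
    return residency.get("Powered On", 0.0) >= threshold


# Sample copied from the GPU block quoted in this report.
sample = """\
| GPU |
| |
| Powered On 37.3% |
| RC6 62.7% |
| RC6p 0.0% |
| RC6pp 0.0% |
"""

stats = parse_gpu_residency(sample)
print(stats)
print("RC6 looks disabled:", rc6_looks_disabled(stats))
```

With the numbers above the GPU spends most of its time in RC6, so the check reports that RC6 is working. A block showing "Powered On 100.0%" would be flagged instead.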
After installing kernel-default-3.8.3-1.1.x86_64 from Kernel:stable the problem is *NOT* fully solved. I still get situations where the GPU is on 100% load, with all the RC6 states at 0%.

Updated to 3.8.5-1-default, GPU at 100% :-(

Still a big problem for me. 3.9.4 was good, 3.9.7 is bad for sure. For me, it's the rc6 issue.

Wow, just updated to 3.10 and battery went from 100% to 0% within 2hrs. That's bad.

(In reply to comment #14)
> Wow, just updated to 3.10 and battery went from 100% to 0% within 2hrs. That's
> bad.

Same problem. Used this kernel:
http://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/kernel-default-3.10.0-1.1.g3dcd746.x86_64.rpm

BUT: The reason for this was that my cpu was running at full speed (2.70GHz) all the time. It wasn't slowing down to any lower level. I don't think it had anything to do with the gpu (checked gpu sleep-level using powertop).

cpufreq-governor was set to "powersave". The only alternative seemed to be "performance", and that has been the case for some kernel versions on my Intel Core i7-2620M (already in kernel 3.9.8 and maybe before). No more "conservative" or "ondemand".
https://wiki.archlinux.org/index.php/CPU_Frequency_Scaling

Maybe this is related:
http://www.golem.de/news/linux-kernel-p-states-verringern-leistungsaufnahme-auf-intel-cpus-1305-99336.html

So this seems to be a DIFFERENT bug for kernel 3.10

Sigh, I expect this is about several issues:
- I expect in 12.3 we have an Intel graphics RC6 issue, but I did not have time to look at this one, and I am also not that familiar with the Intel graphics driver. (A guess, but this is what I would look out for in 12.3.)

> So this seems to be a DIFFERENT bug for kernel 3.10

There I saw a very critical bug: One CPU was polling instead of entering sleep states. You can verify with:
cpupower monitor
when idle. One CPU never enters a deeper sleep state, but gets woken up really often (double check with the interrupt count via powertop or watch -n1 cat /proc/interrupts).
Also, for specific recent Intel CPUs (model 2a, 2d, 3a; compare with /proc/cpuinfo: cpu family : 6, model : 42, 45, 58), the new Intel pstate driver is used. This is why one reports:
> BUT: The reason for this was that my cpu was running at full speed ( 2.70GHz )
But in fact this may not affect power consumption on a recent CPU. Check whether it enters the deepest sleep/idle states using these tools:
cpupower monitor
powertop
turbostat

cpupower monitor w/3.10.1-1.g19a2fe9-desktop:
|Nehalem || SandyBridge || Mperf || Idle_Stats
CPU | C3 | C6 | PC3 | PC6 || C7 | PC2 | PC7 || C0 | Cx | Freq || POLL | C1-S | C1E- | C3-S | C6-S | C7-S
0| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 99.69| 0.31| 3379|| 98.81| 0.00| 0.00| 0.00| 0.00| 0.00
1| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 0.77| 99.23| 3315|| 0.00| 0.00| 0.00| 0.00| 0.00| 99.17
2| 0.28| 0.10| 0.00| 0.00|| 96.11| 0.00| 0.00|| 1.34| 98.66| 3188|| 0.00| 0.00| 0.21| 0.17| 0.10| 98.17
3| 0.28| 0.10| 0.00| 0.00|| 96.11| 0.00| 0.00|| 1.30| 98.70| 3188|| 0.00| 0.13| 0.07| 0.13| 0.00| 98.38
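
The "one CPU is polling instead of sleeping" check can be automated by reading the POLL column of a saved cpupower monitor table. This is a minimal sketch, assuming the table has the same column layout as the one pasted above; the function names and the 50% threshold are hypothetical and not part of cpupower itself.

```python
import re


def parse_cpupower_monitor(text):
    """Parse pasted 'cpupower monitor' rows into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        # Columns are separated by '|' and monitor groups by '||'.
        fields = [f.strip() for f in re.split(r"\|+", line)]
        if fields and fields[0] == "CPU":
            header = fields
        elif header and len(fields) == len(header) and fields[0].isdigit():
            row = {"CPU": int(fields[0])}
            for name, value in zip(header[1:], fields[1:]):
                row[name] = float(value)
            rows.append(row)
    return rows


def polling_cpus(rows, threshold=50.0):
    # Assumption: a CPU spending more than `threshold` percent in POLL
    # while the machine is idle is the symptom described in this bug.
    return [row["CPU"] for row in rows if row.get("POLL", 0.0) > threshold]


# Sample copied from the table in this report.
sample = """\
|Nehalem || SandyBridge || Mperf || Idle_Stats
CPU | C3 | C6 | PC3 | PC6 || C7 | PC2 | PC7 || C0 | Cx | Freq || POLL | C1-S | C1E- | C3-S | C6-S | C7-S
0| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 99.69| 0.31| 3379|| 98.81| 0.00| 0.00| 0.00| 0.00| 0.00
1| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 0.77| 99.23| 3315|| 0.00| 0.00| 0.00| 0.00| 0.00| 99.17
2| 0.28| 0.10| 0.00| 0.00|| 96.11| 0.00| 0.00|| 1.34| 98.66| 3188|| 0.00| 0.00| 0.21| 0.17| 0.10| 98.17
3| 0.28| 0.10| 0.00| 0.00|| 96.11| 0.00| 0.00|| 1.30| 98.70| 3188|| 0.00| 0.13| 0.07| 0.13| 0.00| 98.38
"""

rows = parse_cpupower_monitor(sample)
print("CPUs stuck in POLL:", polling_cpus(rows))
```

On the data above only CPU 0 is reported, which matches its 98.81% POLL residency.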
/proc/interrupts 6 minutes after booting:
CPU0 CPU1 CPU2 CPU3
0: 24 0 0 0 IR-IO-APIC-edge timer
1: 12 0 0 0 IR-IO-APIC-edge i8042
8: 1 0 0 0 IR-IO-APIC-edge rtc0
9: 6333 0 0 0 IR-IO-APIC-fasteoi acpi
12: 1962 0 0 0 IR-IO-APIC-edge i8042
16: 3170 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, mmc0
18: 0 0 0 0 IR-IO-APIC-fasteoi i801_smbus
23: 6470 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2
40: 0 0 0 0 DMAR_MSI-edge dmar0
41: 0 0 0 0 DMAR_MSI-edge dmar1
42: 22 0 0 0 IR-PCI-MSI-edge mei_me
43: 101131 0 0 0 IR-PCI-MSI-edge ahci
44: 33722 0 0 0 IR-PCI-MSI-edge i915
45: 30785 0 0 0 IR-PCI-MSI-edge eth0
46: 1104 0 0 0 IR-PCI-MSI-edge iwlwifi
47: 1534 0 0 0 IR-PCI-MSI-edge snd_hda_intel
NMI: 609 60 46 47 Non-maskable interrupts
LOC: 389427 50298 54103 44780 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 609 60 46 47 Performance monitoring interrupts
IWI: 1130 1773 1645 1792 IRQ work interrupts
RTR: 0 0 0 0 APIC ICR read retries
RES: 104272 121121 150597 160098 Rescheduling interrupts
CAL: 1074 1829 1764 1945 Function call interrupts
TLB: 1208 495 655 798 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 3 3 3 3 Machine check polls
ERR: 0
MIS: 0
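
To double check the interrupt count per CPU without eyeballing watch -n1 cat /proc/interrupts, the device IRQ rows can be summed per CPU. This is a rough sketch: the function name is hypothetical, only rows with a numeric IRQ label are counted, and the sample is a subset of the listing above.

```python
def device_irq_totals(text):
    """Sum the numbered (device) IRQ rows of /proc/interrupts per CPU."""
    lines = text.splitlines()
    cpus = lines[0].split()  # e.g. ['CPU0', 'CPU1', 'CPU2', 'CPU3']
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip summary rows such as NMI, LOC, RES
        counts = rest.split()[: len(cpus)]
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


# Subset of the /proc/interrupts listing quoted in this report.
sample = """\
CPU0 CPU1 CPU2 CPU3
0: 24 0 0 0 IR-IO-APIC-edge timer
9: 6333 0 0 0 IR-IO-APIC-fasteoi acpi
43: 101131 0 0 0 IR-PCI-MSI-edge ahci
44: 33722 0 0 0 IR-PCI-MSI-edge i915
45: 30785 0 0 0 IR-PCI-MSI-edge eth0
LOC: 389427 50298 54103 44780 Local timer interrupts
"""

print(device_irq_totals(sample))
```

On a live system the same function can be fed `open("/proc/interrupts").read()`. In the sample every device interrupt lands on CPU0, which fits the observation that CPU0 is the one that never leaves the polling loop.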
The first CPU shows the problem:
C0 | Cx | Freq ||
99.69| 0.31| 3379||
POLL | C1-S | C1E- | C3-S | C6-S | C7-S
98.81| 0.00| 0.00| 0.00| 0.00| 0.00

There are so many interrupts happening on this CPU that the CPU does not even enter C1, but is kept in a polling loop for very low latency.

I have already asked the mainline maintainers, but nobody had an idea for a concrete modification, or they could not reproduce it, and I have not had time to bisect this yet. Afaik this came in with at least 3.9 already.

Looks like a timer programming issue, probably only on specific HW.
Stay tuned and ping me at the end of the week if I have not answered by then.

For info:
About the graphics RC6 mode not entered:
Jan (Loeser) also has this issue and 3.9.3 did not work for him.
Latest 3.10.2 kernel does work and RC6 got entered. Which makes the CPU Package state (PC7 and others) enter and you see how the temperature is lowered consistently.

He is not affected by the timer interrupt problem mentioned above, so he is fine with this kernel.
Ah yes, and things still work, also after suspend.

I'm now on 3.7.10-44.g57b6816-desktop and - so far - everything is working very well, even after multiple suspend-resume cycles. Current temperature is 113F.

Bug 801341 is also fixed by this kernel for me.

(In reply to comment #19)
> For info:
> About the graphics RC6 mode not entered:
> Jan (Loeser) also has this issue and 3.9.3 did not work for him.
> Latest 3.10.2 kernel does work and RC6 got entered. Which makes the CPU Package
> state (PC7 and others) enter and you see how the temperature is lowered
> consistently.
>
> He is not affected by the timer interrupt problem mentioned above, so he is
> fine with this kernel.
> Ah yes, and things still work, also after suspend.

After the last resume (with kernel 3.10.2), still the same problem. RC6 is _not_ entered and the temperature is at a high peak.

I am closing this one as WONTFIX for 12.3.
If this is still a problem, please open a new, short bug with Egbert Eich and Takashi in CC. This is something for the old HW enablement team...