Bug 216205

Summary: Machine does not wake up from interrupts from C2 sleeping state when compiled with CONFIG_SMP
Product: [openSUSE] openSUSE 10.2
Reporter: Magnus Boman <mboman>
Component: Mobile Devices
Assignee: Thomas Renninger <trenn>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: behlert, cbonner, felix.rommel, venkatesh.pallipadi
Version: RC 4
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: output
             acpidump and dmesg
             Modified, compiled DSDT. Let _PPC always return zero unconditionally
             output file
             dmesg and acpidump results
             before turnoff AC log
             after turnoff AC

Description Magnus Boman 2006-10-30 09:36:39 UTC
I've got openSUSE 10.2 Beta1 installed on an IBM ThinkPad T43P. I've added the CPU frequency scaling monitor applet to my panel. Whenever the laptop is "idle", the CPU runs at full speed (100%). When performing heavy operations, such as compiling GARNOME, it drops to 35%.
I've configured the system for maximum performance.
Let me know what I can collect for you to help troubleshoot this one.
Comment 1 Magnus Boman 2006-10-30 09:39:03 UTC
I should add that it's not a bug with the applet in itself. Checking with cpufreq-info gives the same result as the applet.
Comment 2 Stefan Behlert 2006-10-30 11:10:51 UTC
Holger, can you look at that or is this more in Thomas' field?
Comment 3 Holger Macht 2006-10-30 11:18:13 UTC
Yes, I will take care of it. I think I know what's going on; I just need to do the fix ;-)

To verify my suspicion, Magnus, can you please post the output of '/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies'? Thanks.
Comment 4 Magnus Boman 2006-10-30 11:48:19 UTC
mblxws01:/home/mboman # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2266000 1866000 1600000 1333000 1066000 800000 
Comment 5 Holger Macht 2006-10-30 12:01:29 UTC
Unfortunately it's not what I thought it is. We had an issue where scaling_available_frequencies was in reverse order, so the scaling mechanism got confused. Can you please try the userspace governor by running:

dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.CPUFreq.SetCPUFreqGovernor string:userspace

Thanks.
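Holger's suspicion about the ordering of scaling_available_frequencies can be checked mechanically. A minimal Python sketch, assuming only that the file holds kHz values separated by whitespace and that highest-first is the expected order on this kernel; parse_freqs and is_descending are hypothetical helper names, not part of any cpufreq tool:

```python
def parse_freqs(text: str) -> list[int]:
    """Parse a scaling_available_frequencies value (kHz, whitespace-separated)."""
    return [int(tok) for tok in text.split()]


def is_descending(freqs: list[int]) -> bool:
    """True if every entry is strictly lower than the one before it."""
    return all(a > b for a, b in zip(freqs, freqs[1:]))


# Value reported in comment #4 (the trailing space is part of the sysfs output).
t43p = parse_freqs("2266000 1866000 1600000 1333000 1066000 800000 ")
print(is_descending(t43p))  # True: highest frequency first
```

A table in reverse order would make is_descending return False, which is the case Holger had in mind.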
Comment 6 Magnus Boman 2006-10-30 12:25:05 UTC
mblxws01:/home/mboman # dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.CPUFreq.SetCPUFreqGovernor string:userspace
method return sender=:1.6 -> dest=:1.41


After issuing this command, it behaves as expected.
Comment 7 Holger Macht 2006-10-30 12:27:53 UTC
Sorry Thomas, ondemand bug ;-) Do you have any idea?
Comment 8 Magnus Boman 2006-10-30 20:22:10 UTC
Some more information. I've warm-booted my machine several times, and I'm getting different results when checking the files in /sys/devices/system/cpu/cpu0/cpufreq.
I also see a crash_notes file in /sys/devices/system/cpu/cpu0. Not sure if it's relevant.
I know for a fact that HAL wasn't running when I took the first output (where the CPU max is 800000).
Output of results (attaching a file as well in case it's difficult to read from here):

-- This output is taken when the frequency does not change at all. It always sits at 35%.

mblxws01:/sys/devices/system/cpu/cpu0/cpufreq # grep "" *
affected_cpus:0
cpuinfo_cur_freq:800000
cpuinfo_max_freq:2266000
cpuinfo_min_freq:800000
scaling_available_frequencies:2266000 1866000 1600000 1333000 1066000 800000 
scaling_available_governors:conservative ondemand userspace powersave performance 
scaling_cur_freq:800000
scaling_driver:centrino
scaling_governor:performance
scaling_max_freq:800000
scaling_min_freq:800000


mblxws01:/sys/devices/system/cpu/cpu0/cpufreq # cpufreq-info 
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to http://bugs.opensuse.org, please.
analyzing CPU 0:
  driver: centrino
  CPUs which need to switch frequency at the same time: 0
  hardware limits: 800 MHz - 2.27 GHz
  available frequency steps: 2.27 GHz, 1.87 GHz, 1.60 GHz, 1.33 GHz, 1.07 GHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 800 MHz and 800 MHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 800 MHz (asserted by call to hardware).



-- This output is taken after switching to user space and everything is working

mblxws01:/sys/devices/system/cpu/cpu0/cpufreq # grep "" *
affected_cpus:0
cpuinfo_cur_freq:800000
cpuinfo_max_freq:2266000
cpuinfo_min_freq:800000
scaling_available_frequencies:2266000 1866000 1600000 1333000 1066000 800000 
scaling_available_governors:conservative ondemand userspace powersave performance 
scaling_cur_freq:800000
scaling_driver:centrino
scaling_governor:userspace
scaling_max_freq:2266000
scaling_min_freq:800000
scaling_setspeed:800000


mblxws01:/sys/devices/system/cpu/cpu0/cpufreq # cpufreq-info 
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to http://bugs.opensuse.org, please.
analyzing CPU 0:
  driver: centrino
  CPUs which need to switch frequency at the same time: 0
  hardware limits: 800 MHz - 2.27 GHz
  available frequency steps: 2.27 GHz, 1.87 GHz, 1.60 GHz, 1.33 GHz, 1.07 GHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 800 MHz and 2.27 GHz.
                  The governor "userspace" may decide which speed to use
                  within this range.
  current CPU frequency is 800 MHz (asserted by call to hardware).


-- This output is taken when it's working the opposite way (ie: idle=100%, busy=35%)

mblxws01:/sys/devices/system/cpu/cpu0/cpufreq # grep "" *
affected_cpus:0
cpuinfo_cur_freq:2266000
cpuinfo_max_freq:2266000
cpuinfo_min_freq:800000
scaling_available_frequencies:2266000 1866000 1600000 1333000 1066000 800000 
scaling_available_governors:conservative ondemand userspace powersave performance 
scaling_cur_freq:2266000
scaling_driver:centrino
scaling_governor:performance
scaling_max_freq:2266000
scaling_min_freq:800000

mblxws01:/sys/devices/system/cpu/cpu0/cpufreq # cpufreq-info 
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to http://bugs.opensuse.org, please.
analyzing CPU 0:
  driver: centrino
  CPUs which need to switch frequency at the same time: 0
  hardware limits: 800 MHz - 2.27 GHz
  available frequency steps: 2.27 GHz, 1.87 GHz, 1.60 GHz, 1.33 GHz, 1.07 GHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 800 MHz and 2.27 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 2.27 GHz (asserted by call to hardware).


-- This file is created every time my computer starts. Not sure if it's relevant

mblxws01:/sys/devices/system/cpu/cpu0 # cat crash_notes 
1fc32800
Comment 9 Magnus Boman 2006-10-30 20:22:56 UTC
Created attachment 103088 [details]
output
Comment 10 Thomas Renninger 2006-10-31 12:26:55 UTC
The first output of comment #8 shows that something is broken in the kernel cpufreq code.
When running with the performance governor, the frequency must be the highest one.
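Thomas's rule can be phrased as a check on the sysfs values: with the performance governor, scaling_max_freq should equal cpuinfo_max_freq. A minimal Python sketch, assuming the `grep "" *` output format shown in comment #8; parse_grep_dump and performance_policy_capped are hypothetical names:

```python
def parse_grep_dump(text: str) -> dict[str, str]:
    """Parse `grep "" *` output from a cpufreq sysfs directory.

    Each line looks like "<file>:<contents>"; only the first colon separates
    the file name from its value.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def performance_policy_capped(values: dict[str, str]) -> bool:
    """True if the performance governor is active but the policy maximum
    (scaling_max_freq) is below the hardware maximum (cpuinfo_max_freq)."""
    if values.get("scaling_governor") != "performance":
        return False
    return int(values["scaling_max_freq"]) < int(values["cpuinfo_max_freq"])


# Excerpt of the first dump in comment #8.
broken = parse_grep_dump("""\
cpuinfo_max_freq:2266000
cpuinfo_min_freq:800000
scaling_governor:performance
scaling_max_freq:800000
scaling_min_freq:800000
""")
print(performance_policy_capped(broken))  # True
```

The third dump in comment #8 (scaling_max_freq:2266000) would return False, matching the state where the policy is not capped.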

Can you add CPUFREQ_ENABLED="no" in /etc/sysconfig/powersave/cpufreq and try some things manually, pls.

First, is this phenomenon always happening or does it only happen after some time?

Try to load the cpufreq driver by hand (modprobe speedstep-centrino; if this does not work, it should be modprobe acpi-cpufreq).

Venkatesh, do you know about this one?
Load governors by hand: modprobe cpufreq_userspace; modprobe cpufreq_ondemand; modprobe cpufreq_performance

cd /sys/devices/system/cpu/cpu0/cpufreq
echo ondemand >scaling_governor (does cpu get switched up and down on CPU load?)
Let it stay down and try:
echo performance >scaling_governor

Is the machine on lowest freq now?
If yes, we hit the bug.
Does echo userspace >scaling_governor;echo 2266000 >scaling_setspeed switch up to the highest freq now?
If yes, do echo ondemand >scaling_governor  (does cpu get switched up and down?)
Comment 11 Thomas Renninger 2006-10-31 12:38:29 UTC
You may want to use:
watch -n1 cat /sys/devices/system/cpu/cpu0/cpufreq/{scaling_cur_freq,scaling_max_freq}
to watch the frequency changing...
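For longer observations the same two files can be polled from a script, e.g. to log timestamps next to the values. A minimal Python sketch, assuming the standard cpu0 sysfs path from comment #11; read_freqs and watch_freqs are hypothetical names:

```python
import time
from pathlib import Path

# Standard cpufreq sysfs location for the first CPU.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_freqs(base: Path = CPUFREQ_DIR) -> tuple[int, int]:
    """Return (scaling_cur_freq, scaling_max_freq) in kHz."""
    cur = int((base / "scaling_cur_freq").read_text())
    max_ = int((base / "scaling_max_freq").read_text())
    return cur, max_


def watch_freqs(base: Path = CPUFREQ_DIR, interval: float = 1.0, count: int = 10) -> None:
    """Print a timestamp plus current and maximum frequency every `interval` seconds."""
    for _ in range(count):
        cur, max_ = read_freqs(base)
        print(f"{time.strftime('%H:%M:%S')} cur={cur} kHz max={max_} kHz")
        time.sleep(interval)
```

Calling watch_freqs() on the affected machine while switching governors shows the same information as the watch command, with a timestamp per line.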
Comment 12 Magnus Boman 2006-10-31 21:22:10 UTC
>>Can you add CPUFREQ_ENABLED="no" in /etc/sysconfig/powersave/cpufreq and try
>>some things manually, pls.
This doesn't work. I had to manually comment out the loading of speedstep-centrino etc in /etc/init.d/acpid (lines 180 and 181)

>>First, is this phenomenon always happening or does it only happen after some
>>time?
It happens every time

>>Try to load the cpufreq driver by hand (modprobe speedstep-centrino, if this
>>does not work it should be modprobe acpi).
After modprobe, CPU freq. is on 100%

>>Load governors by hand: modprobe userspace;modprobe ondemand;modprobe
>>performance
mblxws01:/home/mboman # modprobe userspace;modprobe ondemand;modprobe performance
FATAL: Module userspace not found.
FATAL: Module ondemand not found.
FATAL: Module performance not found.

>>cd /sys/devices/system/cpu/cpu0/cpufreq
>>echo ondemand >scaling_governor (does cpu get switched up and down on CPU
>>load?)
Yes, CPU is switched up and down

>>Let it stay down and try:
>>echo performance >scaling_governor
>>Is the machine on lowest freq now?
>>If yes, we hit the bug.
No, CPU freq goes to 100% and stays there

>>Does echo userspace >scaling_governor;echo 2266000 >scaling_setspeed switch up
>>to highest freq now?
CPU is already at max freq. Echoing 800000 sets it to min freq. Then I can switch it back to max by echoing 2266000.

>>If yes, do echo ondemand >scaling_governor  (does cpu get switched up and
>>down?)
Setting it to ondemand makes it switch up and down


It seems that if I make one manual change to the governor, it will behave as it should.
Comment 13 Magnus Boman 2006-10-31 23:53:07 UTC
Just noticed the following behaviour:

Running on AC - Set the governor to ondemand
Unplug the AC - Scaling still works as expected
Plug in the AC - Governor changes to performance and freq is running constantly on 100%
Comment 14 Holger Macht 2006-11-15 12:24:22 UTC
Thomas, any further ideas? I cannot help you much here, so I'm reassigning the bug to you, sorry ;-)
Comment 15 Magnus Boman 2006-11-19 10:33:17 UTC
I still have this issue in Beta2Plus
Comment 16 Magnus Boman 2006-11-19 10:48:58 UTC
Hmm.. It seems that what is happening now is;

When I start a program that is cpu intensive, the cpu switches up to highest frequency for a little while (around 5 seconds), then it switches back to the lowest. The program may still be taking all resources, but the cpu remains on lowest frequency.
Comment 17 Venkatesh Pallipadi 2006-11-19 15:44:50 UTC
Does this happen only when you run some CPU intensive workload?

Can you get the output of 
#cat /proc/acpi/thermal_zone/*/*
before and after you have this state.
Output of
#cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
will also help.

We have seen a similar issue earlier. What happens is that the CPU's maximum frequency gets reduced because of a thermal condition, and no governor (ondemand, performance, ...) can use the higher frequencies until the thermal event goes away.

Comment 18 Magnus Boman 2006-11-19 19:32:57 UTC
Looks like you are correct. When I first turned on the laptop, it seemed to work fine. So I stressed it a bit and after a couple of minutes, it refused to go to highest frequency for more than a couple of seconds. The sad thing is that it only takes 5 minutes of cpu intensive work for it to come to this state.
I didn't have these issues with previous SUSE versions.

Thermal zone when it works fine:

mblxws01:/home/mboman # cat /proc/acpi/thermal_zone/*/*
<setting not supported>
cooling mode:   passive
polling frequency:       2 seconds
state:                   ok
temperature:             50 C
critical (S5):           99 C
passive:                 95 C: tc1=5 tc2=4 tsp=600 devices=0xdffec338 

mblxws01:/home/mboman # cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
2266000




Thermal zone when it does not work:

mblxws01:/home/mboman # cat /proc/acpi/thermal_zone/*/*
<setting not supported>
cooling mode:   passive
polling frequency:       2 seconds
state:                   ok
temperature:             56 C
critical (S5):           99 C
passive:                 95 C: tc1=5 tc2=4 tsp=600 devices=0xdffec338 

mblxws01:/home/mboman # cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
2266000
Comment 19 Magnus Boman 2006-11-19 19:44:25 UTC
Looking at it a bit more. It seems that the issue starts when the temperature is 50C or more. When I did the first cat /proc/acpi/thermal_zone/*/*, it must have been just before it started to happen.
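The readings in comment #18 can be compared against the passive trip point. A minimal Python sketch, assuming the ACPI specification's passive cooling equation (dP[%] = _TC1 * (Tn - Tn-1) + _TC2 * (Tn - Tt), with _TSP in tenths of seconds, so tsp=600 means a 60 s sampling period) and that passive throttling is only requested at or above the trip point. The helper names are hypothetical, and treating the 50 C and 56 C readings as consecutive samples is an assumption for illustration:

```python
def parse_thermal_zone(text: str) -> dict[str, int]:
    """Pick temperature, passive trip point and tc1/tc2/tsp out of the
    concatenated /proc/acpi/thermal_zone/*/* output."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        key, rest = key.strip(), rest.strip()
        if key == "temperature":
            info["temp"] = int(rest.split()[0])
        elif key == "passive":
            # e.g. "95 C: tc1=5 tc2=4 tsp=600 devices=0xdffec338"
            trip, _, params = rest.partition(":")
            info["passive_trip"] = int(trip.split()[0])
            for tok in params.split():
                name, _, val = tok.partition("=")
                if name in ("tc1", "tc2", "tsp"):
                    info[name] = int(val)
    return info


def passive_delta(t_now: int, t_prev: int, trip: int, tc1: int, tc2: int) -> int:
    """ACPI passive cooling equation: dP[%] = tc1*(Tn - Tn-1) + tc2*(Tn - Tt)."""
    return tc1 * (t_now - t_prev) + tc2 * (t_now - trip)


zone = parse_thermal_zone("""\
state:                   ok
temperature:             56 C
critical (S5):           99 C
passive:                 95 C: tc1=5 tc2=4 tsp=600 devices=0xdffec338
""")
print(zone["temp"] >= zone["passive_trip"])  # False: below the passive trip point
print(passive_delta(56, 50, zone["passive_trip"], zone["tc1"], zone["tc2"]))  # -126
```

At 50-56 C the zone is far below its 95 C passive trip point, which suggests that passive cooling in this thermal zone is not what caps the frequency here.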
Comment 20 Thomas Renninger 2006-11-19 19:52:20 UTC
A cpufreq debug kernel is the best way forward; it does not make much sense to search without one. Let's do it the same way as we did with the ATA ACPI errors: I put a kernel into my Export dir.
You will have to boot this kernel with: cpufreq.debug=7
Comment 23 Magnus Boman 2006-11-20 05:54:43 UTC
I added the debug kernel, did a reboot and saved acpidump and dmesg.
If you need me to start the machine with the debug kernel and do something else, please let me know.
Comment 24 Magnus Boman 2006-11-20 05:55:07 UTC
Created attachment 106182 [details]
acpidump and dmesg
Comment 25 Thomas Renninger 2006-11-21 13:43:10 UTC
This is very likely fixed in RC1.
Venkatesh: with the patch recently posted to the cpufreq list by Bruno...

Hmm, I already set it to fixed.
Please reopen if you still run into it, or pls set the bug to verified if it works for you.
Comment 26 Magnus Boman 2006-11-22 23:31:35 UTC
Reopened.
Unfortunately still the same issue with RC1.
Comment 27 Thomas Renninger 2006-12-01 12:43:45 UTC
Created attachment 107862 [details]
Modified, compiled DSDT. Let _PPC always return zero unconditionally

Can you pls try:
cp DSDT.aml /etc/DSDT.aml
Modify ACPI_DSDT="" to ACPI_DSDT="/etc/DSDT.aml" in /etc/sysconfig/kernel
invoke: mkinitrd

Reboot and check whether it works now (also check dmesg for a string like "DSDT replace by OS"; you should be able to grep for it easily with something like "dmesg | grep -i initrd" or "dmesg | grep -i dsdt").
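The /etc/sysconfig/kernel edit, and reverting it afterwards, can be scripted. A minimal Python sketch, assuming the file uses simple NAME="value" shell assignments at the start of a line; set_sysconfig_var is a hypothetical helper, and mkinitrd still has to be run by hand after each change:

```python
import re


def set_sysconfig_var(text: str, name: str, value: str) -> str:
    """Return `text` with the line NAME="..." set to `value`.

    Only an uncommented assignment at the start of a line is replaced; if
    none exists, a new assignment is appended.
    """
    line = f'{name}="{value}"'
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _m: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```

For example, set_sysconfig_var(old_text, "ACPI_DSDT", "/etc/DSDT.aml") yields the modified file contents, and passing "" as the value restores ACPI_DSDT="".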

Is the machine always at lowest speed as soon as powersaved or hal started and the cpufreq driver got loaded? (Can't see the other comments right now, maybe I already asked that..)

After testing you should better revert the change in:
/etc/sysconfig/kernel
and invoke mkinitrd again.
What is being done here: a BIOS table (one that was included in the acpidump and which I modified) is replaced by the kernel (only in RAM, so it is no problem to revert the change).

If this works, we can be sure that something is wrong with _PPC (the function the kernel invokes to check whether the BIOS has limited CPU frequencies). I only know of thermal limits and this one; theoretically it could be something else, but given the dmesg debug output you sent, chances are quite high that it's _PPC.
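The effect of _PPC on the frequency ceiling can be pictured as a table lookup. A minimal Python sketch, assuming (per the ACPI specification) that _PSS lists performance states from highest to lowest and that _PPC returns the index of the highest state the OS may currently use; it further assumes the _PSS order matches scaling_available_frequencies from comment #4. ppc_max_freq is a hypothetical name, and the kernel's real handling in the processor driver is more involved:

```python
def ppc_max_freq(pss_khz: list[int], ppc: int) -> int:
    """Highest frequency the OS may use for a given _PPC return value.

    pss_khz lists the _PSS performance states highest-first; _PPC is the index
    of the highest state currently allowed, so 0 means "no limit".
    """
    if not 0 <= ppc < len(pss_khz):
        raise ValueError(f"_PPC index {ppc} out of range")
    return pss_khz[ppc]


t43p = [2266000, 1866000, 1600000, 1333000, 1066000, 800000]
print(ppc_max_freq(t43p, 0))  # 2266000: what the modified DSDT always allows
print(ppc_max_freq(t43p, 5))  # 800000: same value as scaling_max_freq in comment #8
```

Under these assumptions, a _PPC value of 5 would produce exactly the 800000 kHz cap seen in the broken sysfs dump, while the modified DSDT (always returning 0) leaves the full range available.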
Comment 28 Magnus Boman 2006-12-02 20:49:32 UTC
Using DSDT.aml makes the CPU scale up and down properly. The temperature went up to 59C and it still worked fine.
Are there any negative impacts using this one until we have a final solution?
Comment 29 Thomas Renninger 2006-12-04 11:21:36 UTC
Can you check whether you have a BIOS option that could influence this? Some ThinkPads, for example, can be configured in the BIOS to force a lower CPU frequency on battery, on AC, always, or similar. That is what the _PPC function is for.
I think I already asked in another bug, whether you run the latest BIOS?
If not it might be worth to update.
If the bug still exists, we have to dig.

> Are there any negative impacts using this one until we have a final solution?
Not really, as long as you don't alter the hardware (e.g. add more memory) or update the BIOS.
If you update BIOS and it still does not work, can you attach acpidump again, pls.
Comment 30 sun mkti 2006-12-04 17:31:12 UTC
I am using an ASUS M9V laptop (Centrino 1.86 GHz). With AC plugged in, cpuinfo reports maximum speed and a temperature above 60C all the time, even with CPU usage below 5%. Why?

In battery mode, CPU scaling works perfectly, from 768 through 1.1 up to 1.9 GHz.
Something is wrong in AC (dynamic) mode. Is it a problem with the kernel or with KPowersave?

In 10.1 it worked fine on the same machine.
Comment 31 Magnus Boman 2006-12-04 20:08:44 UTC
Thomas,
I checked the BIOS. It's embarrassing but I found an option;

"Adaptive Thermal Management"
   "AC Power" -> Balanced
   "Battery Power" -> Balanced

I changed the settings to maximum performance and then removed the DSDT.aml from the kernel. The machine is working fine now.
What surprises me is that in SLED10/SUSE Linux 10.1, this wasn't an issue.
Anyway, this is a real solution for me.
Thanks for all your help. Wish I had checked the BIOS settings to start with :-(
Comment 32 Thomas Renninger 2006-12-05 09:55:46 UTC
We have an IBM ThinkPad T43P and a totally different model, the ASUS M9V laptop, showing the same issue, which did not occur with earlier kernels.
I'd say something is broken and we should find out what it is, at least for SLE11 and mainline.
10.2 is out now and backports are risky, so we cannot do much about this one there, and at least a workaround exists (hopefully also for the ASUS model).
Sun: Can you also do the checks described in comment #29, pls.

This looks like a problem deep in the ACPI interpreter code and could be hard to find. As there are more critical bugs for which no workaround exists, I'd like to keep this one for later. Magnus, can you keep an eye on it and try again with some kind of SLE11 preview using the default BIOS settings, pls? Not sure whether I should keep the bug open... let's wait for Sun's report first.
Comment 33 Thomas Renninger 2006-12-05 09:56:26 UTC
Sun: Can you also attach acpidump output, pls.
Comment 34 sun mkti 2006-12-05 19:20:22 UTC
Created attachment 108404 [details]
output file
Comment 35 sun mkti 2006-12-05 19:25:24 UTC
Created attachment 108405 [details]
dmesg and acpidump  results
Comment 36 sun mkti 2006-12-05 19:27:37 UTC
I checked the BIOS. There is no option that could change this; it only has settings for LCD power and a battery menu.

After using DSDT.aml nothing changed; the CPU still runs at maximum speed all the time.
I have attached an output file.
Comment 37 sun mkti 2006-12-06 06:04:20 UTC
Without AC supply (battery only):
echo conservative >scaling_governor   #789M  scales between 789M, 1.1G, 1.3G, 1.9G
echo ondemand >scaling_governor       #789M  scales between 789M and 1.9G
echo userspace >scaling_governor      #789M  constant
echo powersave >scaling_governor      #789M  constant
echo performance >scaling_governor    #1.9G  constant
Notable in this mode:
- the keyboard feels more responsive and key repeat is faster
- rebooting the machine also takes less time than in AC mode (3 times)
- everything works perfectly :)))

With AC plugged in:
echo conservative >scaling_governor   #1.9G  constant
echo ondemand >scaling_governor       #1.9G  constant
echo userspace >scaling_governor      #1.9G  constant
echo powersave >scaling_governor      #789M  constant
echo performance >scaling_governor    #1.9G  constant
Notable in this mode:
- key repeat in the terminal is slower than above
- while rebooting, some steps seem to stall longer than usual,
  such as before the line ACPI....................................[done]   8(((

OK, just some info; I have to study this more :)
                                    newbielinux,
                                   thanks anyway.
Comment 38 Thomas Renninger 2006-12-06 13:19:56 UTC
> After using DSDT.aml
Don't do that! This one was for mboman's machine and its BIOS. Better revert that change immediately and run mkinitrd afterwards.

Ahhh, Venkatesh, I think the patch to check for duplicate frequencies is missing. I remember there was something on the cpufreq list recently:
scaling_available_frequencies:1862000 1862000 1862000 1862000 1862000 1862000 1596000 1330000 1064000 798000

Hmm, but the machine should still be able to reach 798000?
I will check (give me some time; this one does not have the highest priority for me right now).
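A minimal Python sketch of removing such duplicates while keeping the highest-first order; it only illustrates the idea and is not the patch Thomas refers to, which is not quoted here. dedupe_freq_table is a hypothetical name. The last line confirms that the lowest state, 798000, is still in the quoted table:

```python
def dedupe_freq_table(freqs_khz: list[int]) -> list[int]:
    """Drop repeated frequencies, keeping the first occurrence and the order."""
    # dict preserves insertion order, so this keeps highest-first ordering.
    return list(dict.fromkeys(freqs_khz))


quoted = "1862000 1862000 1862000 1862000 1862000 1862000 1596000 1330000 1064000 798000"
table = [int(tok) for tok in quoted.split()]
print(dedupe_freq_table(table))  # [1862000, 1596000, 1330000, 1064000, 798000]
print(min(table))                # 798000
```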

If it's not that, you can check the following:
Can you run top when it does not switch down anymore, pls?
Is there a thread/process (e.g. kacpid, powersaved, hal, ...) that is running in some kind of loop and utilising 100% of the processor?
Comment 39 sun mkti 2006-12-08 20:12:59 UTC
My problem now is that I get some very high CPU load spikes over half an hour (using a system monitoring program).

But the top processes were as low as ever! (No process in top is above 5%.)

When I turn off the AC supply, the CPU load drops, everything works smoothly, and key repeat is faster.
I'm not quite sure whether this involves ACPI or not.

When I turn the power back on, the CPU load rises to 50%, with the same symptoms.
Comment 40 sun mkti 2006-12-08 20:36:18 UTC
I think the first problem (CPU frequency scaling up over time) and the second problem (CPU load > 50%) are related, because there are some ghost processes that are not reported in the 'top' list.
Comment 41 Thomas Renninger 2006-12-12 17:46:07 UTC
I just found:
http://bugzilla.kernel.org/show_bug.cgi?id=7060
... another phenomenon, still could be related.

Can you show the top summary line when you think the processor is under load. Something like:
Cpu(s):  0.0%us,  5.3%sy,  3.3%ni, 91.0%id,  0.1%wa,  0.0%hi,  0.2%si,  0.0%st

Comment 42 sun mkti 2006-12-13 11:58:32 UTC
I have captured top and ps output in 2 files,
without touching anything, just turning AC power on and off.
Comment 43 sun mkti 2006-12-13 11:59:42 UTC
Created attachment 109487 [details]
before turnoff AC  log
Comment 44 sun mkti 2006-12-13 12:00:20 UTC
Created attachment 109488 [details]
after turnoff AC
Comment 45 Thomas Renninger 2006-12-13 13:57:34 UTC
*** Bug 222513 has been marked as a duplicate of this bug. ***
Comment 46 Thomas Renninger 2006-12-13 14:15:09 UTC
This is indeed very strange!
(You forgot to set the MIME type of the attachment to a text document. Nothing serious, just for the future... it makes life easier for the reader.)

Short summary of the bad top output:
Cpu(s): 42.9%us, 17.9%sy,  0.0%ni, 39.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
 3410 root      15   0  177m  36m 6700 S  1.9  3.6  10:50.05 Xorg
24404 root      16   0 29344  15m  12m S  1.6  1.6   0:01.91 ksysguard
 4393 root      15   0 39716  23m  15m R  1.0  2.4   9:56.53 kicker
24405 root      15   0  4108 1476  996 S  1.0  0.1   0:00.75 ksysguardd
   10 root      10  -5     0    0    0 S  0.3  0.0   2:07.59 kacpid
    1 root      15   0   740  292  244 S  0.0  0.0   0:00.94 init
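The mismatch can be quantified from the snippet itself: the summary line says the CPU is about 61% busy (100% minus idle and I/O wait), while the listed processes account for only a few percent. A minimal Python sketch, assuming top's default column layout (%CPU as the 9th field) and keeping in mind that top only lists the processes that fit on screen; the helper names are hypothetical:

```python
import re


def parse_cpu_summary(line: str) -> dict[str, float]:
    """Parse top's 'Cpu(s):' line into {field: percent}, e.g. {'us': 42.9}."""
    return {field: float(value) for value, field in re.findall(r"([\d.]+)%(\w+)", line)}


def listed_process_cpu(rows: list[str]) -> float:
    """Sum the %CPU column (9th whitespace-separated field) of top's process rows."""
    return sum(float(row.split()[8]) for row in rows)


summary = parse_cpu_summary(
    "Cpu(s): 42.9%us, 17.9%sy,  0.0%ni, 39.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"
)
busy = 100.0 - summary["id"] - summary["wa"]
rows = [
    " 3410 root      15   0  177m  36m 6700 S  1.9  3.6  10:50.05 Xorg",
    "24404 root      16   0 29344  15m  12m S  1.6  1.6   0:01.91 ksysguard",
    " 4393 root      15   0 39716  23m  15m R  1.0  2.4   9:56.53 kicker",
    "24405 root      15   0  4108 1476  996 S  1.0  0.1   0:00.75 ksysguardd",
    "   10 root      10  -5     0    0    0 S  0.3  0.0   2:07.59 kacpid",
    "    1 root      15   0   740  292  244 S  0.0  0.0   0:00.94 init",
]
print(f"busy {busy:.1f}% vs listed processes {listed_process_cpu(rows):.1f}%")
# busy 60.7% vs listed processes 5.8%
```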

If this happens again, can you try the following (you must install the oprofile package first) and provide the output files of the opreport commands? Then we should have an idea of what is being processed:

# Unzip the (hopefully) provided /boot/vmlinux-kernel-ver.gz file
gunzip /boot/vmlinux-`uname -r`.gz

modprobe oprofile
opcontrol --vmlinux=/boot/vmlinux-`uname -r`
opcontrol --start;sleep 10; opcontrol --stop

opreport --symbols >function_call.txt
opreport --callgraph >callgraph.txt
opreport --long-filenames >longfiles.txt

# For reference, there is nice documentation here:
# /usr/share/doc/packages/oprofile/oprofile.html
Comment 47 sun mkti 2006-12-14 08:15:16 UTC
root@Sunse[~] #gunzip /boot/vmlinux-`uname -r`.gz
root@Sunse[~] #ls /boot                                                                                                      
System.map-2.6.18.2-33-default  config-2.6.18.2-33-default  message                             vmlinux-2.6.18.2-33-default
backup_mbr                      grub                        symsets-2.6.18.2-33-default.tar.gz  vmlinuz
bak                             initrd                      symtypes-2.6.18.2-33-default.gz     vmlinuz-2.6.18.2-33-default
boot                            initrd-2.6.18.2-33-default  symvers-2.6.18.2-33-default.gz

root@Sunse[~] #modprobe oprofile                                                                           
root@Sunse[~] #opcontrol vmlinux=/boot/vmlinux-`uname -r`                                                                    
Unknown option "vmlinux". See opcontrol --help
root@Sunse[~] #opcontrol --start;sleep 10; opcontrol --stop                                                                  
Profiler running.
Stopping profiling.
root@Sunse[~] #                                                                                                              
root@Sunse[~] #opreport --symbols >function_call.txt                                                                         
root@Sunse[~] #opreport --callgraph >callgraph.txt                                                                           
root@Sunse[~] #opreport --long-filenames >longfiles.txt                                                                      
root@Sunse[~] #cat function_call.txt                                                                                         
CPU: PIII, speed 1862 MHz (estimated)
Counted MMX_INSTR_RET events (number of MMX instructions retired) with a unit mask of 0x00 (No unit mask) count 931000
samples  %        symbol name
61       100.000  (no symbols)
root@Sunse[~] #cat callgraph.txt
CPU: PIII, speed 1862 MHz (estimated)
Counted MMX_INSTR_RET events (number of MMX instructions retired) with a unit mask of 0x00 (No unit mask) count 931000
samples  %        symbol name
-------------------------------------------------------------------------------
61       100.000  (no symbols)
  61       100.000  (no symbols) [self]
-------------------------------------------------------------------------------
root@Sunse[~] #cat longfiles.txt                                                                                             
CPU: PIII, speed 1862 MHz (estimated)
Counted MMX_INSTR_RET events (number of MMX instructions retired) with a unit mask of 0x00 (No unit mask) count 931000
MMX_INSTR_RET:...|
  samples|      %|
------------------
       61 100.000 /usr/lib/xorg/modules/libfb.so
root@Sunse[~] #  
Comment 48 sun mkti 2006-12-14 08:22:49 UTC
Ahhhh! I have plugged in a USB WLAN adapter, Edimax model FW-7318USg (using the Ralink chip driver rt73.ko).

After the following commands:
#insmod rt73.ko
#ifconfig rausb0 up
my system performs better, the CPU frequency drops to 1.6 GHz instead of the maximum 1.9 GHz, and the keyboard also feels more responsive :)

But after issuing ifconfig rausb0 down,
everything goes back to the same symptoms :(
Comment 49 Thomas Renninger 2006-12-14 09:05:22 UTC
> Counted MMX_INSTR_RET events
Hmm, only MMX instructions were counted? That's not really what I wanted. I don't know oprofile that well; maybe it's not possible to do better with this machine/CPU, or maybe some other parameter needs to be passed?

This gets even stranger... You can try to shut down all devices and remove all kinds of modules; maybe you'll find a bad one that produces the load? Hmm, Magnus probably does not have this card. Maybe it's the (uhci/ehci)_hcd driver?

Also try setting ACPI_MODULES="NONE" in /etc/sysconfig/powersave/common. After rebooting, also remove the thermal and fan modules. The only ACPI module still loaded is then processor, and cpufreq should still work? The system should no longer notice AC state changes. If it works then, it's one of the ACPI modules.

Hmm, I'll ask on the kernel list about the wrong process accounting...
Comment 50 Magnus Boman 2006-12-14 09:27:59 UTC
Thomas, that is correct that I don't have a ralink card.
If there is anything you want me to do/check that can help with this one (apart from the SLE11/10.3 updates), let me know.
Comment 51 Thomas Renninger 2006-12-14 12:01:58 UTC
Sun: Can you remove the vga= options in /boot/grub/menu.lst, pls.
This:
/usr/lib/xorg/modules/libfb.so
seems to be the framebuffer library. I don't think it's that, because oprofile seems to have profiled only MMX instructions, and I think libfb is one of the rare libraries that got MMX-optimised. Still... just to be sure it's not the framebuffer code that utilises the CPU.
Comment 52 sun mkti 2006-12-15 16:16:16 UTC
Ahhh! :)))) The problem is solved. Last night I went home to Samui island and just guessed at the APIC and SMP options in the boot menu. I changed the options in /boot/grub/menu.lst by adding these two: [nosmp noapic]

Edit the file menu.lst and modify the following line:

kernel /boot/vmlinuz root=/dev/hda6 vga=0x317 resume=/dev/hda7 nosmp noapic splash=silent  showopts

Everything works fine! I'm not quite sure this is the best solution, but it's OK for me now... thanks, Thomas and Magnus.
Sun-shine is back for me :))
Comment 53 Thomas Renninger 2007-01-19 10:54:38 UTC
I now have a Sony Vaio here showing the same problem.
Top often shows 100% CPU load peaks and the machine is very slow.
It seems to come from the APIC if the kernel is compiled with CONFIG_SMP.
If C2 is entered, the CPU does not get woken up by a reprogrammed timer, only by the regular timer interrupt.
The boot parameter that should fit best for you is processor.max_cstates=1.
Trying to finally solve this...

This seems to be a general issue (now reports from ThinkPad, Asus, Sony Vaio).
Comment 54 Venkatesh Pallipadi 2007-01-20 00:52:59 UTC
Thomas,

Can you attach the output of 
cat /proc/interrupts; cat /proc/stat; sleep 10; cat /proc/interrupts; cat /proc/stat

before and after
echo 1 > /sys/module/processor/parameters/max_cstate

with smp and apic enabled.

I don't quite understand why CPU is showing 100% if Local APIC interrupts missing is the problem.

If missing local apic timer in C2 is the problem, workaround should be simple. Enable timer_broadcast on C2 for these systems.

But, are these Core 2 based laptops? Then C2 should not have this local APIC issue.
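Comparing the snapshots Venkatesh asks for is simple arithmetic. A minimal Python sketch, assuming the 2.6-era /proc/interrupts layout (a CPU header line, then "label: per-CPU counts" followed by chip and handler names) and the aggregate "cpu" line of /proc/stat; the helper names are hypothetical. If the local APIC timer stops in C2, one would expect the LOC delta over the 10-second window to be noticeably lower with the default max_cstate than with max_cstate=1:

```python
def parse_interrupts(text: str) -> dict[str, int]:
    """Map each /proc/interrupts row label (e.g. '0', 'LOC') to its count
    summed over all CPU columns."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header line: "CPU0 CPU1 ..."
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for tok in rest.split()[:ncpu]:
            if not tok.isdigit():
                break  # stop at the first non-numeric field (chip/handler name)
            total += int(tok)
        counts[label.strip()] = total
    return counts


def interrupt_deltas(before: str, after: str) -> dict[str, int]:
    """Per-row increase between two /proc/interrupts snapshots."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return {label: a[label] - b.get(label, 0) for label in a}


def cpu_busy_percent(stat_before: str, stat_after: str) -> float:
    """Busy share of the aggregate 'cpu' line of /proc/stat between two snapshots.

    Fields are user nice system idle iowait irq softirq ...; idle and iowait
    count as not busy.
    """
    def totals(text: str) -> tuple[int, int]:
        for line in text.splitlines():
            if line.startswith("cpu "):
                vals = [int(x) for x in line.split()[1:]]
                return sum(vals), vals[3] + (vals[4] if len(vals) > 4 else 0)
        raise ValueError("no aggregate 'cpu' line found")

    total0, idle0 = totals(stat_before)
    total1, idle1 = totals(stat_after)
    elapsed = total1 - total0
    return 100.0 * (elapsed - (idle1 - idle0)) / elapsed if elapsed else 0.0
```

For example, interrupt_deltas(before, after)["LOC"] gives the number of local APIC timer interrupts that arrived during the sleep 10 window.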
Comment 55 Thomas Renninger 2007-01-20 12:42:39 UTC
> If missing local apic timer in C2 is the problem, workaround should be simple.
> Enable timer_broadcast on C2 for these systems.
Thanks. I first thought it happens much earlier, when the APIC is set up. If it's this: "when the timer to wake up from C2 is chosen, the kernel thinks it's an SMP machine because this one has two APICs (one disabled), and takes the wrong method for waking up from sleeping states", I should be able to fix this... I have access to the machine again on Mon/Tue and hopefully can come up with something then.

We use SMP kernels also on UP machines since 10.2 (without CONFIG_SMP everything is fine).

> But, are these Core 2 based laptops? Then C2 should not have this local APIC
> issue
No, these are Pentium M.
Comment 56 Thomas Renninger 2007-01-23 15:56:52 UTC
I think you hit two different bugs.
Magnus's bug should be related to:
http://bugzilla.kernel.org/show_bug.cgi?id=7859
I will add a fix to the 10.2 kernel.

As for the other one, I am still trying to figure out what is going on. I could not spend time on it the last two days, but hopefully I will be able to come up with something soon.
Comment 57 Petr Ostadal 2007-01-30 11:39:35 UTC
*** Bug 236723 has been marked as a duplicate of this bug. ***
Comment 58 Petr Ostadal 2007-01-30 11:44:08 UTC
*** Bug 216218 has been marked as a duplicate of this bug. ***
Comment 59 Petr Ostadal 2007-01-30 12:01:11 UTC
Fix typo in comment #53: the boot parameter should be without the 's' at the end; "processor.max_cstate=1" is correct and works for me.
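This workaround goes on the kernel line in /boot/grub/menu.lst, the same place where Sun added nosmp noapic in comment #52. A minimal Python sketch of adding or updating such a parameter, assuming a GRUB legacy kernel line of space-separated tokens; set_kernel_param is a hypothetical helper, and the example line is Sun's from comment #52 without nosmp noapic:

```python
def set_kernel_param(kernel_line: str, param: str) -> str:
    """Add `param` (e.g. "processor.max_cstate=1") to a GRUB legacy kernel line,
    replacing any existing setting with the same parameter name."""
    name = param.split("=", 1)[0]
    tokens = kernel_line.split()
    kept = [t for t in tokens if t.split("=", 1)[0] != name]
    return " ".join(kept + [param])


line = "kernel /boot/vmlinuz root=/dev/hda6 vga=0x317 resume=/dev/hda7 splash=silent showopts"
print(set_kernel_param(line, "processor.max_cstate=1"))
# kernel /boot/vmlinuz root=/dev/hda6 vga=0x317 resume=/dev/hda7 splash=silent showopts processor.max_cstate=1
```

Replacing an existing setting instead of appending a second copy keeps the line unambiguous if the helper is run more than once.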
Comment 60 Thomas Renninger 2007-02-28 09:25:33 UTC
Fixed in 10.2 branch.
An update kernel is coming out soon AFAIK.