Bug 678882

Summary: Strange Kernel NMI messages
Product: [openSUSE] openSUSE 11.4
Reporter: Forgotten User xRcrmyYBVX <forgotten_xRcrmyYBVX>
Component: Kernel
Assignee: Jiri Slaby <jslaby>
Status: RESOLVED NORESPONSE
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P3 - Medium
CC: jslaby, sbrabec
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: dmesg output, dmesg output, unzipped dmesg, hwinfo

Description Forgotten User xRcrmyYBVX 2011-03-11 13:15:14 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.127 Safari/534.16

Every so often, while working on my brand-new openSUSE 11.4 machine (2.6.37.1-1.2-desktop), I receive the following kernel messages:
 
kernel:[ 4811.322957] Uhhuh. NMI received for unknown reason 30 on CPU 0.
kernel:[ 4811.322971] Do you have a strange power saving mode enabled?
kernel:[ 4811.322976] Dazed and confused, but trying to continue

Is there anything to do about this? 
I don't see any problems aside from this message currently.


Reproducible: Couldn't Reproduce

Comment 1 Forgotten User xRcrmyYBVX 2011-03-11 13:16:27 UTC
Created attachment 418852 [details]
dmesg output
Comment 2 Forgotten User xRcrmyYBVX 2011-03-11 13:16:27 UTC
Created attachment 418853 [details]
dmesg output
Comment 3 Forgotten User xRcrmyYBVX 2011-03-11 13:17:41 UTC
Sorry, I double-clicked the upload button when uploading the dmesg output.
Comment 4 Jiri Slaby 2011-03-14 10:01:09 UTC
Created attachment 419092 [details]
unzipped dmesg

Never ever compress attachments like dmesg output.
Comment 5 Jiri Slaby 2011-03-14 10:13:33 UTC
(In reply to comment #0)
> kernel:[ 4811.322957] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> kernel:[ 4811.322971] Do you have a strange power saving mode enabled?
> kernel:[ 4811.322976] Dazed and confused, but trying to continue
> 
> Is there anything to do about this? 
> I don't see any problems aside from this message currently.

Hmm, hard to say, as nothing seems to announce itself as the source of the NMI. The bits set in 0x30 are just timer statuses.

Could you investigate whether this happens while you are doing some specific task, e.g. downloading over Wi-Fi/NIC or doing high-rate memory transfers (some computations etc.)? Or is it just random?
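
For reference, the "reason" value in the message is printed in hexadecimal and, as far as I understand the kernels of this era, is the byte read from I/O port 0x61 (the NMI status and control register). A minimal sketch that decodes such a value, assuming the bit layout documented for Intel ICH-style chipsets; the bit descriptions and the helper names `decode_nmi_reason` and `names_a_source` are illustrative and not taken from this report:

```python
# Assumed layout of I/O port 0x61 (NMI status and control register) on
# Intel ICH-style chipsets. The descriptions are informal, not kernel macros.
NMI_REASON_BITS = {
    0: "timer counter 2 gate enable",
    1: "speaker data enable",
    2: "PCI SERR# NMI disabled",
    3: "IOCHK# NMI disabled",
    4: "refresh cycle toggle",
    5: "timer counter 2 output status",
    6: "IOCHK# NMI source status",
    7: "SERR# NMI source status",
}


def decode_nmi_reason(reason_hex):
    """Return descriptions of the bits set in a hex reason string, lowest bit first."""
    value = int(reason_hex, 16)
    return [name for bit, name in sorted(NMI_REASON_BITS.items())
            if value & (1 << bit)]


def names_a_source(reason_hex):
    """True if bit 6 or 7 is set, i.e. the register itself points at a source."""
    return bool(int(reason_hex, 16) & 0xC0)


if __name__ == "__main__":
    reason = "30"  # value from the message in comment #0
    for description in decode_nmi_reason(reason):
        print(f"reason {reason}: {description}")
    print("source identified by register:", names_a_source(reason))
```

Under this assumed layout, only bits 6 and 7 identify an NMI source, which is consistent with the kernel reporting the reason as unknown when neither is set.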
Comment 6 Forgotten User xRcrmyYBVX 2011-03-14 17:34:13 UTC
So far, it has not happened again. I've been using 11.4 since Milestone 6, and in all that time this message has appeared only 3-4 times.

On another machine with identical hardware, I got this on Saturday:
Mar 12 07:13:22 st-brennenstuhl-114 kernel: [45861.971582] Uhhuh. NMI received for unknown reason 31 on CPU 0.
Mar 12 07:13:22 st-brennenstuhl-114 kernel: [45861.971590] Do you have a strange power saving mode enabled?
Mar 12 07:13:22 st-brennenstuhl-114 kernel: [45861.971594] Dazed and confused, but trying to continue

Judging from the date and time, I would say the machine was completely idle without any users logged in. Looks like it's kinda random.
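
To check over a longer period whether the messages really are random, the log can be scanned for them automatically. A minimal sketch, assuming the message format shown in this report; the regular expression and the helper name `find_unknown_nmis` are hypothetical, and where the log lines come from (a syslog file, saved dmesg output) is left to the caller:

```python
import re

# Matches the first line of the report, with or without a syslog prefix.
NMI_LINE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Uhhuh\. NMI received for unknown reason "
    r"(?P<reason>[0-9a-fA-F]{2}) on CPU (?P<cpu>\d+)\."
)


def find_unknown_nmis(lines):
    """Return (uptime_seconds, reason_hex, cpu) for every matching log line."""
    events = []
    for line in lines:
        match = NMI_LINE.search(line)
        if match:
            events.append((
                float(match.group("uptime")),
                match.group("reason").lower(),
                int(match.group("cpu")),
            ))
    return events


if __name__ == "__main__":
    # Sample lines copied from this report.
    sample = [
        "kernel:[ 4811.322957] Uhhuh. NMI received for unknown reason 30 on CPU 0.",
        "kernel:[ 4811.322971] Do you have a strange power saving mode enabled?",
        "Mar 12 07:13:22 st-brennenstuhl-114 kernel: [45861.971582] "
        "Uhhuh. NMI received for unknown reason 31 on CPU 0.",
    ]
    for uptime, reason, cpu in find_unknown_nmis(sample):
        print(f"uptime {uptime:.0f}s: reason {reason} on CPU {cpu}")
```

Comparing the collected uptimes against what the machine was doing at the time (suspend/resume, network load, idle) would show whether any pattern exists.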
Comment 7 Forgotten User xRcrmyYBVX 2011-03-15 10:37:39 UTC
The same/similar problem was reported here:
https://bugzilla.novell.com/show_bug.cgi?GoAheadAndLogIn=1&id=679567

Please find my hwinfo attached, maybe there is a similarity here...
Comment 8 Forgotten User xRcrmyYBVX 2011-03-15 10:38:35 UTC
Created attachment 419369 [details]
hwinfo
Comment 9 Forgotten User xRcrmyYBVX 2011-03-27 21:04:29 UTC
So I compared my dmesg output on openSUSE 11.4 with one from 11.1 on IDENTICAL hardware. The only difference I could see was in the following lines:

11.4:
-----
[    7.532019] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.06
[    7.532101] iTCO_wdt: Found a ICH8 or ICH8R TCO device (Version=2, TCOBASE=0x1060)
[    7.532118] iTCO_wdt: cannot register miscdev on minor=130 (err=-16)
[    7.532137] iTCO_wdt: probe of iTCO_wdt failed with error -16

11.1:
-----
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.03 (30-Apr-2008)
iTCO_wdt: Found a ICH8 or ICH8R TCO device (Version=2, TCOBASE=0x1060)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

May this have something to do with it? You mentioned that the NMI reasons are "timer" related...?
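
A note on reading the 11.4 output: `err=-16` is a negated errno value, and on Linux 16 is EBUSY. Minor 130 is the misc device minor reserved for the watchdog, so the message most likely means that some other watchdog driver had already registered that device; which driver that would be is not visible in the lines above. A small sketch for translating such codes, where the helper name `describe_kernel_error` is hypothetical:

```python
import errno
import os


def describe_kernel_error(err):
    """Translate a (negative) kernel error code into its errno name and text."""
    code = abs(err)
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, text = describe_kernel_error(-16)
    print(f"err=-16 means {name}: {text}")
```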
Comment 10 Jiri Slaby 2011-04-26 18:43:26 UTC
(In reply to comment #9)
> May this have something to do with it? You mentioned that the NMI reasons are 
> "timer" related...?

Hmm, I wouldn't say so. Another theory is NMI IPIs which are not expected. I'll build a test kernel to dump the last per-cpu generate-NMI timestamp.

Don't forget to remove the needinfo flag.
Comment 11 Jiri Slaby 2011-04-26 20:54:33 UTC
(In reply to comment #10)
> Hmm, I wouldn't say so. Another theory is NMI IPIs which are not expected. I'll
> build a test kernel to dump the last per-cpu generate-NMI timestamp.

Or not, maybe later. NMIs are generated by perf events and also by the NMI watchdog. You don't have perf events enabled. Still, could you boot with the "nowatchdog nmi_watchdog=0" kernel parameters?

Also, if this is reasonably easy to reproduce, could you try Kernel:HEAD?

http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.4/
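
After rebooting, it is worth confirming that the parameters actually reached the kernel before waiting for the message to recur. A minimal sketch that checks the kernel command line exposed in `/proc/cmdline`; the helper name `missing_boot_params` is hypothetical:

```python
def missing_boot_params(cmdline, wanted=("nowatchdog", "nmi_watchdog=0")):
    """Return the wanted parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            current = handle.read()
    except OSError:
        current = ""  # not running on Linux, or /proc is unavailable
    missing = missing_boot_params(current)
    if missing:
        print("missing kernel parameters:", " ".join(missing))
    else:
        print("both parameters are active")
```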
Comment 12 Jiri Slaby 2011-04-29 12:51:43 UTC
*** Bug 679567 has been marked as a duplicate of this bug. ***
Comment 13 Jiri Slaby 2011-06-07 15:27:51 UTC
Closed due to lack of response.
Comment 14 Jiri Slaby 2011-06-12 19:16:52 UTC
I just got one myself with 2.6.39.1-1-desktop:
[197345.582891] Uhhuh. NMI received for unknown reason 3c on CPU 0.
[197345.582895] Do you have a strange power saving mode enabled?
[197345.582897] Dazed and confused, but trying to continue

No specific load, nothing unusual.
Comment 15 Jiri Slaby 2011-06-14 13:19:54 UTC
And one from SLE11SP1 appeared at bnc#690397 comment 13 with 2.6.32.27-0.2.
Comment 16 Jiri Slaby 2011-07-28 13:43:49 UTC
After resume in 3.0.0-1-desktop:
Restarting tasks ... done.
PM: Basic memory bitmaps freed
video LNXVIDEO:00: Restoring backlight state
Uhhuh. NMI received for unknown reason 2c on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue