Bug 647029

Summary: nehalem: memory corruption 2.6.34.7-0.3 - may be related to ACPI button.ko driver
Product: [openSUSE] openSUSE 11.3 Reporter: Rich Coe <rcoe>
Component: KernelAssignee: Jiri Slaby <jslaby>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P2 - High CC: bjorn.helgaas, coolo, forgotten_a525umNONh, jslaby, kent.liu, lchiquitto, len.brown, Mike.Gerdts, trenn, youquan.song
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 11.3   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: z400 failure #1
z400 failure #2
z400 failure #3
z400 failure #4
z600 failure #1
z600 failure #2
z600 failure #3
z600 failure #4
z600 failure #5
z600 failure #6
init script to auto reboot the system

Description Rich Coe 2010-10-15 15:28:55 UTC
Created attachment 395206 [details]
z400 failure #1

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.10) Gecko/20100914 SUSE/3.6.10-0.3.1 Firefox/3.6.10 GTB7.1

I have two failing systems running opensuse 11.3 with kernel
update 2.6.34.7-0.3-desktop.

I would appreciate any tips or suggestions for finding and elminating
this failure.

About 1 out every 10 or so reboots, I get an OOPS or BUG 12 seconds into 
the boot.

Both systems have completed running stress testing at over 100 hours.
I ran memtest86+ 4.00 and 4.10 and found no errors.

Initially, the text mode console didn't allow to view the complete bug
traceback.  Enabling fbcon just made the lower half of the screen unreadable
when the oops occurred.

After enabling the tty console, we noticed the message just prior to the failure
    [    8.987388] PM: Starting manual resume from disk
I disabled the resume= parameter, changing it to noresume, to no effect on
the frequency of the issue.

Here is a summary of the BUG/OOPS for each system.
I will attach the full boot messages for each failure.
z400/inst.1:[   12.452040] general protection fault: 0000 [#1] PREEMPT SMP

z400/inst.2:[   16.595201] BUG: Bad page map in process udevd  pte:894c0000039cc581 pmd:1bbe79067

z400/inst.3:[   11.972733] Oops: 0002 [#1] PREEMPT SMP
z400/inst.3:[   11.972822] BUG: scheduling while atomic: udevd/456/0x00000002
z400/inst.3:[   11.972974] BUG: Bad page state in process udevd  pfn:1b8de6

z400/inst.4:[   11.912988] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
z400/inst.4:[   11.912999] Oops: 0000 [#1] PREEMPT SMP

z600/inst.1:[   10.225249] kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.34.7/linux-2.6.34/kernel/timer.c:643!
z600/inst.1:[   10.254149] invalid opcode: 0000 [#1] PREEMPT SMP

z600/inst.2:[   11.318113] general protection fault: 0000 [#1] PREEMPT SMP

z600/inst.3:[   10.131683] kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.34.7/linux-2.6.34/kernel/timer.c:643!
z600/inst.3:[   10.160581] invalid opcode: 0000 [#1] PREEMPT SMP

z600/inst.4:[   11.332447] general protection fault: 0000 [#1] PREEMPT SMP

z600/inst.5:[   11.262224] general protection fault: 0000 [#1] PREEMPT SMP



Reproducible: Sometimes

Steps to Reproduce:
1. reboot.
Comment 1 Rich Coe 2010-10-15 15:31:03 UTC
Created attachment 395207 [details]
z400 failure #2

[   16.595201] BUG: Bad page map in process udevd  pte:894c0000039cc581 pmd:1bbe79067
Comment 2 Rich Coe 2010-10-15 15:32:23 UTC
Created attachment 395208 [details]
z400 failure #3

[   11.972723] BUG: unable to handle kernel NULL pointer dereference at 0000000000000482
[   11.972726] IP: [<ffffffff814a2024>] _raw_spin_unlock_irqrestore+0x4/0x40
[   11.972731] PGD 1be905067 PUD 1bb5e8067 PMD 0
[   11.972733] Oops: 0002 [#1] PREEMPT SMP
[   11.972735] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0d/device:0e/uevent
[   11.972736] CPU 1
[   11.972737] Modules linked in: soundcore snd_page_alloc button(+) preloadtrace ext4 jbd2 crc16 sd_mod fan proc
essor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
[   11.972744]
Comment 3 Rich Coe 2010-10-15 15:33:21 UTC
Created attachment 395209 [details]
z400 failure #4

[   11.912988] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[   11.912991] ACPI: Power Button [PBTN]
[   11.912992] IP: [<ffffffff8118e330>] sysfs_refresh_inode+0x20/0xd0
[   11.912998] PGD bc077067 PUD bc334067 PMD 0
[   11.912999] Oops: 0000 [#1] PREEMPT SMP
[   11.913001] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C01:00/uevent
[   11.913002] CPU 1
[   11.913003] Modules linked in: button(+) e1000e(+) preloadtrace ext4 jbd2 crc16 sd_mod fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
[   11.913009]
Comment 4 Rich Coe 2010-10-15 15:34:28 UTC
Created attachment 395210 [details]
z600 failure #1

[   10.225249] kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.34.7/linux-2.6.34/kernel/timer.c:643!
[   10.254149] invalid opcode: 0000 [#1] PREEMPT SMP
[   10.268637] last sysfs file: /sys/module/thermal_sys/sections/__ksymtab_strings
[   10.290541] CPU 2
[   10.296029] Modules linked in: preloadtrace ext4 jbd2 crc16 sd_mod fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
Comment 5 Rich Coe 2010-10-15 15:35:41 UTC
Created attachment 395211 [details]
z600 failure #2

[   11.318113] general protection fault: 0000 [#1] PREEMPT SMP
[   11.318115] last sysfs file: /sys/devices/pci0000:3f/0000:3f:06.3/uevent[   11.318116] CPU 1
[   11.318117] Modules linked in: snd soundcore snd_page_alloc wmi(+) button(+) preloadtrace ext4 jbd2 crc16 sd_m
od fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal the
rmal_sys
[   11.318123]
[   11.318125] Pid: 622, comm: udevd Not tainted 2.6.34.7-0.3-desktop #1 0AE8h/HP Z600 Workstation
Comment 6 Rich Coe 2010-10-15 15:36:58 UTC
Created attachment 395212 [details]
z600 failure #3

[   10.131683] kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.34.7/linux-2.6.34/kernel/timer.c:643!
[   10.160581] invalid opcode: 0000 [#1] PREEMPT SMP
[   10.175066] last sysfs file: /sys/module/thermal_sys/sections/__ksymtab_strings
[   10.196969] CPU 2
[   10.202457] Modules linked in: preloadtrace ext4 jbd2 crc16 sd_mod fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
[   10.225141]
[   10.225143] Pid: 451, comm: stapio Not tainted 2.6.34.7-0.3-desktop #1 0AE8h/HP Z600 Workstation
Comment 7 Rich Coe 2010-10-15 15:37:46 UTC
Created attachment 395213 [details]
z600 failure #4

[   11.332447] general protection fault: 0000 [#1] PREEMPT SMP
[   11.332449] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2/name
[   11.332450] CPU 1
[   11.332451] Modules linked in: serio_raw snd_page_alloc sg pcspkr sr_mod cdrom iTCO_wdt(+) iTCO_vendor_support tg3(+) button(+) preloadtrace ext4 jbd2 crc16 sd_mod fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
[   11.332460]
[   11.332461] Pid: 653, comm: usb_id Not tainted 2.6.34.7-0.3-desktop #1 0AE8h/HP Z600 Workstation
Comment 8 Rich Coe 2010-10-15 15:38:30 UTC
Created attachment 395214 [details]
z600 failure #5

[   11.262224] general protection fault: 0000 [#1] PREEMPT SMP
[   11.262226] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.2/usb8/8-2/bDeviceClass
[   11.262227] CPU 1
[   11.262228] Modules linked in: floppy(+) snd_page_alloc wmi(+) button(+) preloadtrace ext4 jbd2 crc16 sd_mod fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
[   11.262235]
[   11.262237] Pid: 477, comm: udevd Not tainted 2.6.34.7-0.3-desktop #1 0AE8h/HP Z600 Workstation
Comment 9 Rich Coe 2010-10-15 15:40:27 UTC
Created attachment 395215 [details]
z600 failure #6

[   10.222117] kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.34.7/linux-2.6.34/kernel/timer.c:643!
[   10.251015] invalid opcode: 0000 [#1] PREEMPT SMP
[   10.265502] last sysfs file: /sys/module/thermal_sys/sections/__ksymtab_strings
[   10.287405] CPU 2
[   10.292895] Modules linked in: preloadtrace ext4 jbd2 crc16 sd_mod fan processor mptsas mptscsih mptbase scsi_transport_sas ata_generic ata_piix ahci libata scsi_mod thermal thermal_sys
[   10.315465]
[   10.315466] Pid: 440, comm: stapio Not tainted 2.6.34.7-0.3-desktop #1 0AE8h/HP Z600 Workstation
Comment 10 Rich Coe 2010-10-15 23:13:51 UTC
Some experiments were performed today.
It was noticed that one of the recent messages prior to the failure is
input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2

As a simple test, the acpi button module was moved out of the way, and 
both systems rebooted about 40 times without error.

As another test, the acpi button module was rebuilt with the patch below.
Both systems rebooted over 100 times without error. 
Inspection of the built code has not shown any obvious differences.
We suspect that this do to some timing problem rather than a code issue.

diff -r f6462a021cfb BUILD/kernel-desktop-2.6.34.7/linux-2.6.34/drivers/acpi/button.c
--- a/drivers/acpi/button.c     Fri Oct 15 11:46:31 2010 -0500
+++ b/drivers/acpi/button.c     Fri Oct 15 12:11:13 2010 -0500
@@ -353,9 +353,13 @@
                goto err_free_button;
        }
 
+       printk(KERN_INFO PREFIX "button loading\n");
        hid = acpi_device_hid(device);
+       printk(KERN_INFO PREFIX "hid: <%s>\n", hid);
        name = acpi_device_name(device);
+       printk(KERN_INFO PREFIX "name: <%s>\n", name);
        class = acpi_device_class(device);
+       printk(KERN_INFO PREFIX "class: <%s>\n", class);
 
        if (!strcmp(hid, ACPI_BUTTON_HID_POWER) ||
            !strcmp(hid, ACPI_BUTTON_HID_POWERF)) {
Comment 11 Jiri Slaby 2010-10-16 09:22:28 UTC
(In reply to comment #10)
> We suspect that this do to some timing problem rather than a code issue.

Hmm, it looks like a nice example of heisenbug. Somebody heavily overwrites some memory. I don't see anything suspicious in the diff.

Is it reproducible with kernel-vanilla-2.6.34.7? If yes, could you bisect it?
Comment 12 Jiri Slaby 2010-10-16 12:50:39 UTC
BTW which kernel worked for you? Is this a regression against 2.6.34 or 2.6.34.4?
Comment 13 Rich Coe 2010-10-16 15:31:12 UTC
Not sure if how far this regression goes.  Last time we ran this test was 
against opensuse 11.1 kernel.

I will test with earlier versions to see if I can find something to 
track down in a kernel change.
Comment 14 Rich Coe 2010-10-18 17:08:53 UTC
I have started a reboot cycle with 2.6.34.4.

Some weekend results.
z400 - with button.c printk patch 151 reboots til panic
z600 - with button.c printk patch 102 reboots til panic

So the frequency is reduced, but still occurs.

We also tested on the following systems and are seeing failures:
sun ultra 27 - nehalem B3, same as z400
intel westmere C2
z800 with 8 virtual box vm's, all vm's are hung, two are stopped at
a panic message.
Comment 15 Rich Coe 2010-10-18 21:10:16 UTC
I installed and ran 2.6.34.4 and had over 200 reboots without fail.
There was one error message during boot:
[ from memory ] boot.startpreload: unable to load preload.ko

While I start to check diffs for a bisect of 2.6.34.4 vs 2.6.34.7,
I started a reboot cycle of 2.6.34.7 with boot parameter 'nopreload'
Comment 16 Rich Coe 2010-10-18 21:21:10 UTC
One other observation is that this issue seems to be isolated to 
some nahelem hardware.   I'm still working out a list of systems where 
it fails and passes.
Comment 17 Rich Coe 2010-10-19 20:41:03 UTC
> recorded panic count intervals
> z400  = 63, 70, 55, 151, 219, 291, 42	(C2 W3520 2.66Ghz, 3GB/6GB, SATA, BIOS 3.07)
> z600  = 41, 36, 15, 48, 2, 91, 6	(B3 1xX5550 2.66Ghz, 3GB/6GB, SATA, BIOS 3.10) 
> z400-2  = 6018 reboots AOK		(B3 W3520 2.66Ghz, 4GB, SATA, BIOS 3.07)
> z800  = 785 reboots AOK		(B3, 2xE5530 2.40Ghz, 12GB, SAS, BIOS 3.07)
> 
> Chris has also generated boot panics or hangs on Sun Ultra 27 (Nehalem
> B3), Intel Westmere server (C2), and 8 Virtual Box VM's on Z800.
Comment 18 Rich Coe 2010-10-20 14:50:21 UTC
z600 B3 with kernel-vanilla-2.6.34.7-0.3 panic'd at 218 reboots
amd with kernel-desktop-2.6.34.7-0.3 has not failed.
Comment 19 Rich Coe 2010-10-22 01:07:56 UTC
In an effort to track this issue down further, we enabled the kernel debugger. 

Here is the panic:
[    8.198872] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 G B/465 GiB)
[    8.198878] sd 1:0:0:0: [sdb] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
[    8.198905] sd 1:0:0:0: [sdb] Write Protect is off
[    8.198918] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled , doesn't support DPO or FUA
[    8.198990]  sdb: sdb1
[    8.292815] sd 0:0:0:0: [sda] Write Protect is off
[    8.292874] sd 1:0:0:0: [sdb] Attached SCSI disk
[    8.321028] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled , doesn't support DPO or FUA
[    8.348192]  sda: sda1 sda2 sda3
[    8.376524] sd 0:0:0:0: [sda] Attached SCSI disk
[    8.780992] EXT4-fs (sda2): mounted filesystem with ordered data mode
[   10.625894] udev: starting version 157
[   10.866462] ACPI: WMI: Mapper loaded
[   11.000934] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP 0C0C:00/input/input2
[   11.026147] ACPI: Power Button [PBTN]
[   11.026150] general protection fault: 0000 [#1] PREEMPT SMP
[   11.026152] last sysfs file: /sys/module/wmi/initstate

Failing insn:
0xffffffff8118e3b2 sysfs_refresh_inode+0xa2:    cmpb   $0x1,0x30(%rax)
      ax = 0xe930000000000000   

%rax is pointing to an invalid page.
Analysis of the source of the contents of %rax does not show any error.  
We're not sure where the 0xe930000000000000 value came from ...


kdb> bt
Stack traceback for pid 451
0xffff8800bbd724c0      451      443  1    1   R  0xffff8800bbd72968 *udev d
sp                ip                Function (args)
0xffff8800bc133cd8 0xffffffff8118e3b2 sysfs_refresh_inode+0xa2 (0xffff8800 be15e930, 0xffff8800bec40490)
kdb_bb: address 0x0000000000000018 not recognised
Using old style backtrace, unreliable with no arguments
sp                ip                Function (args)
0xffff8800bc133cd8 0xffffffff8118e3b2 sysfs_refresh_inode+0xa2
0xffff8800bc133d20 0xffffffff8118e415 sysfs_permission+0x35
0xffff8800bc133d40 0xffffffff8112e3c8 exec_permission+0x28
0xffff8800bc133d60 0xffffffff811300b0 link_path_walk+0x80
0xffff8800bc133d70 0xffffffff813af268 move_addr_to_user+0x88
0xffff8800bc133da0 0xffffffff813af3d4 __sys_recvmsg+0x154
0xffff8800bc133e00 0xffffffff81130c7a path_walk+0x5a
0xffff8800bc133e40 0xffffffff81130d3b do_path_lookup+0x4b
0xffff8800bc133e50 0xffffffff8112ec33 getname+0x43
0xffff8800bc133e70 0xffffffff81130eca user_path_at+0x5a
0xffff8800bc133ef0 0xffffffff81056ab3 do_setitimer+0x1b3
0xffff8800bc133f40 0xffffffff81129397 sys_readlinkat+0x37

kdb> rd
     r15 = 0xffff8800bc133da8      r14 = 0xffff8800bc133db8
     r13 = 0xffff8800bbd724c0      r12 = 0xffff8800be15e930
      bp = 0xffff8800be15e930       bx = 0xffff8800bec40490
     r11 = 0x0000000000000000      r10 = 0x0000000000000000
      r9 = 0x6e692f7475706e69       r8 = 0x0000000000000000
      ax = 0xe930000000000000       cx = 0x0000000000000000
      dx = 0x0000000000000000       si = 0xffff8800bec40490
      di = 0xffff8800be15e930  orig_ax = 0xffffffffffffffff
      ip = 0xffffffff8118e3b2       cs = 0x0000000000000010
   flags = 0x0000000000010246       sp = 0xffff8800bc133cf0
      ss = 0x0000000000000018 &regs = 0xffff8800bc133c58

objdump -d fs/sysfs/inode.o
0000000000000000 <sysfs_refresh_inode>:
   0:   55                      push   %rbp
   1:   48 89 fd                mov    %rdi,%rbp
   4:   53                      push   %rbx
   5:   48 89 f3                mov    %rsi,%rbx
   8:   48 83 ec 08             sub    $0x8,%rsp
   c:   48 8b 47 40             mov    0x40(%rdi),%rax
  10:   0f b7 57 34             movzwl 0x34(%rdi),%edx
  14:   48 85 c0                test   %rax,%rax
  17:   66 89 96 aa 00 00 00    mov    %dx,0xaa(%rsi)
  1e:   74 57                   je     77 <sysfs_refresh_inode+0x77>
  20:   8b 50 08                mov    0x8(%rax),%edx
  23:   48 89 df                mov    %rbx,%rdi
  26:   89 56 50                mov    %edx,0x50(%rsi)
  29:   8b 50 0c                mov    0xc(%rax),%edx
  2c:   89 56 54                mov    %edx,0x54(%rsi)
  2f:   48 8b 50 18             mov    0x18(%rax),%rdx
  33:   48 89 56 70             mov    %rdx,0x70(%rsi)
  37:   48 8b 50 20             mov    0x20(%rax),%rdx
  3b:   48 89 56 78             mov    %rdx,0x78(%rsi)
  3f:   48 8b 50 28             mov    0x28(%rax),%rdx
  43:   48 89 96 80 00 00 00    mov    %rdx,0x80(%rsi)
  4a:   48 8b 50 30             mov    0x30(%rax),%rdx
  4e:   48 89 96 88 00 00 00    mov    %rdx,0x88(%rsi)
  55:   48 8b 50 38             mov    0x38(%rax),%rdx
  59:   48 89 96 90 00 00 00    mov    %rdx,0x90(%rsi)
  60:   48 8b 50 40             mov    0x40(%rax),%rdx
  64:   48 89 96 98 00 00 00    mov    %rdx,0x98(%rsi)
  6b:   8b 50 58                mov    0x58(%rax),%edx
  6e:   48 8b 70 50             mov    0x50(%rax),%rsi
  72:   e8 00 00 00 00          callq  77 <sysfs_refresh_inode+0x77>
  77:   80 7d 30 01             cmpb   $0x1,0x30(%rbp)
  7b:   74 0b                   je     88 <sysfs_refresh_inode+0x88>
  7d:   48 83 c4 08             add    $0x8,%rsp
  81:   5b                      pop    %rbx
  82:   5d                      pop    %rbp
  83:   c3                      retq   
  84:   0f 1f 40 00             nopl   0x0(%rax)
  88:   48 8b 45 28             mov    0x28(%rbp),%rax
  8c:   ba 02 00 00 00          mov    $0x2,%edx
  91:   48 85 c0                test   %rax,%rax
  94:   74 21                   je     b7 <sysfs_refresh_inode+0xb7>
  96:   30 d2                   xor    %dl,%dl
  98:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  9f:   00 
  a0:   31 c9                   xor    %ecx,%ecx
  a2:   80 78 30 01             cmpb   $0x1,0x30(%rax)
  a6:   48 8b 40 10             mov    0x10(%rax),%rax
  aa:   0f 94 c1                sete   %cl
  ad:   01 ca                   add    %ecx,%edx
  af:   48 85 c0                test   %rax,%rax
  b2:   75 ec                   jne    a0 <sysfs_refresh_inode+0xa0>
  b4:   83 c2 02                add    $0x2,%edx
  b7:   89 53 4c                mov    %edx,0x4c(%rbx)
  ba:   48 83 c4 08             add    $0x8,%rsp
  be:   5b                      pop    %rbx
  bf:   5d                      pop    %rbp
  c0:   c3                      retq   
  c1:   66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
  c8:   0f 1f 84 00 00 00 00 
  cf:   00 


  88:   48 8b 45 28             mov    0x28(%rbp),%rax
kdb> %bp
%bp = 0xffff8800be15e930

kdb> md %bp
0xffff8800be15e930 0000000000000010 ffff8800be15e978   ........x.......
0xffff8800be15e940 0000000000000000 ffff8800bc75fb20   ........ .u.....
0xffff8800be15e950 ffff8800bbe36838 ffff8800be15e8e8   8h..............
0xffff8800be15e960 000041ed00000001 0000000000001d66   .....A..f.......
0xffff8800be15e970 0000000000000000 0000000000000004   ................
0xffff8800be15e980 ffff8800378d9420 0000000000000000    ..7............
0xffff8800be15e990 ffff8800be3667e0 ffff8800bbe2d140   .g6.....@.......
0xffff8800be15e9a0 ffff8800be15e930 000041ed00000001   0............A..

%rax =? ffff8800be15e8e8
kdb> md 0xffff8800be15e8e8
0xffff8800be15e8e8 0000000000000003 ffff8800be15e930   ........0.......
0xffff8800be15e8f8 ffff8800be15e8a0 ffffffff817a2f44   ........D/z.....
0xffff8800be15e908 ffffffff81a45e80 0000000000000000   .^..............
0xffff8800be15e918 000081a400000002 0000000000001d67   ........g.......

  8c:   ba 02 00 00 00          mov    $0x2,%edx
  91:   48 85 c0                test   %rax,%rax
%rax != 0

  94:   74 21                   je     b7 <sysfs_refresh_inode+0xb7>
  96:   30 d2                   xor    %dl,%dl
clear %dl

  98:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  9f:   00 
  a0:   31 c9                   xor    %ecx,%ecx
clear %ecx

  a2:   80 78 30 01             cmpb   $0x1,0x30(%rax)
*(0xffff8800be15e918) 0x81a4 & 1 != 1

  a6:   48 8b 40 10             mov    0x10(%rax),%rax
%rax = ffff8800be15e8a0

  af:   48 85 c0                test   %rax,%rax
  b2:   75 ec                   jne    a0 <sysfs_refresh_inode+0xa0>
%rax != 0

  a0:   31 c9                   xor    %ecx,%ecx
  a2:   80 78 30 01             cmpb   $0x1,0x30(%rax)
  a6:   48 8b 40 10             mov    0x10(%rax),%rax

kdb> md 0xffff8800bd15e8a0
0xffff8800bd15e8a0 0000000000000000 0000000000000000   ................
0xffff8800bd15e8b0-0xffff8800bd15e90f zero suppressed
0xffff8800bd15e910 0000000000000000 0000000000000000   ................

%rax should equal 0
      ax = 0xe930000000000000
Comment 20 Rich Coe 2010-10-22 04:21:52 UTC
Some additional info.  Today we started a shutdown with power-off and 
wake-on-lan test.
We found the same panic on boot issue, with a high frequency of occurrence.
Comment 21 Rich Coe 2010-10-25 16:55:40 UTC
I went searching for a kernel that passes without failure, so that I could
start bisecting for a failure.

v2.6.36 failed after 647 reboots  BUG: unable to handle kernel paging request
v2.6.33 failed at boot
v2.6.32 failed at boot
v2.6.34.1 boots, but after 4 - 10 boots, cannot find sda2 because sda is not found
v2.6.34.2 boots, fails similar to 2.6.34.1 but got 58 reboots til gpf panic
v2.6.34 boots, but cannot find sda2 every 4 - 5 reboots

went back to testing with v2.6.34.7 to make sure hard disk was working correctly
and got 125 reboots until gpf in show_uevent.

Another engineer has had some success with FC13 2.6.34.7-56 in limited testing.
I'm starting to look at bisecting with that source difference against 
opensuse 2.6.34.7-0.3.  There are 435 differences in the .config, and I 
have not yet done a diff against the added patches.
Comment 22 Rich Coe 2010-10-26 16:20:29 UTC
FC13 2.6.34.7-56 with the suse .config failed at 487 reboots.
I'm starting to bisect the kernel config with 50 different options.
Comment 23 Rich Coe 2010-10-27 16:01:52 UTC
Created attachment 397346 [details]
init script to auto reboot the system

To reproduce this issue on a nahelem/westmere with opensuse kernel 2.6.34.7-0.3
If the system reboots over 1000 times, then you might not have an issue,
although it's been recommended to let it reboot to 2000.  

To install:
    copy the rebooter file to /etc/init.d/rebooter
    chkconfig rebooter on
    reboot

To stop:
    Use up/down arrow in grub to select default kernel
    press 'e' to edit selection
    scroll down to kernel boot line
    press 'e' to edit selection
    add 'single' to end of line
    hit enter to commit
    hit b to boot
    Type root password at login prompt.
    chkconfig rebooter off
    init 3 or init 5
Comment 24 Jiri Slaby 2010-10-27 21:02:12 UTC
I've extracted the causes of the bugs from the reports:

pte:894c0000039cc581
0000000000000482 <- possibly NULL with nonzero dereference
000000000000000a <-             -- " --
CR2: 000000000061f278
CR2: 00007fff5bc5cbdc
CR2: 000000000067e088
ax = 0xe930000000000000
BUG_ON(!timer->function);

I cannot see there anything ringing a bell here.
Comment 25 Rich Coe 2010-10-28 14:41:25 UTC
I've bisected the kernel params down to these few:
Left is original suse, Right is the working version.

462,464c470,472
< CONFIG_ACPI_AC=m
< CONFIG_ACPI_BATTERY=m
< CONFIG_ACPI_BUTTON=m
< CONFIG_ACPI_FAN=m
< CONFIG_ACPI_PROCESSOR=m
< CONFIG_ACPI_THERMAL=m
---
> CONFIG_ACPI_AC=y
> CONFIG_ACPI_BATTERY=y
> CONFIG_ACPI_BUTTON=y
> CONFIG_ACPI_FAN=y
> CONFIG_ACPI_PROCESSOR=y
> CONFIG_ACPI_THERMAL=y

473d480
< CONFIG_ACPI_CUSTOM_DSDT_FILE=""
< CONFIG_ACPI_CUSTOM_OVERRIDE_INITRAMFS=y

477,479c483,484
< CONFIG_ACPI_DEBUG=y
< # CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
< CONFIG_ACPI_PCI_SLOT=m
---
> # CONFIG_ACPI_DEBUG is not set
> CONFIG_ACPI_PCI_SLOT=y

481c486
< CONFIG_ACPI_CONTAINER=m
---
> CONFIG_ACPI_CONTAINER=y

496,497c501,502
< CONFIG_CPU_FREQ_TABLE=y
< # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
< CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
< CONFIG_CPU_FREQ_GOV_USERSPACE=m
< CONFIG_CPU_FREQ_GOV_ONDEMAND=y
---
> CONFIG_CPU_FREQ_TABLE=m
> CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_ONDEMAND=m

508c513
< # CONFIG_X86_PCC_CPUFREQ is not set
---
> CONFIG_X86_PCC_CPUFREQ=m

512c517
< # CONFIG_X86_P4_CLOCKMOD is not set
---
> CONFIG_X86_P4_CLOCKMOD=m
Comment 26 Jiri Slaby 2010-10-28 15:05:49 UTC
(In reply to comment #25)
> I've bisected the kernel params down to these few:
> Left is original suse, Right is the working version.

Thanks for doing that. Could you also attach lsmod output from both kernels to kill off those which are not loaded at all?
Comment 27 Rich Coe 2010-10-28 17:50:14 UTC
Module                  Size  Used by
af_packet              23229  0 
cpufreq_conservative    12628  0 
cpufreq_userspace       3264  0 
cpufreq_powersave       1258  0 
acpi_cpufreq            8399  0 
mperf                   1523  1 acpi_cpufreq
fuse                   75897  1 
loop                   18524  0 
dm_mod                 86873  0 
snd_hda_codec_realtek   324511  1 
snd_hda_intel          28557  0 
snd_hda_codec         113249  2 snd_hda_codec_realtek,snd_hda_intel
snd_hwdep               7922  1 snd_hda_codec
firewire_ohci          26938  0 
firewire_core          60890  1 firewire_ohci
crc_itu_t               1747  1 firewire_core
snd_pcm               105589  2 snd_hda_intel,snd_hda_codec
snd_timer              26828  1 snd_pcm
hp_wmi                  7258  0 
sr_mod                 16684  0 
cdrom                  43440  1 sr_mod
sg                     33348  0 
iTCO_wdt               12170  0 
snd                    84412  6 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer
ohci1394               33702  0 
soundcore               9003  1 snd
iTCO_vendor_support     3150  1 iTCO_wdt
ieee1394              104868  1 ohci1394
tg3                   146290  0 
snd_page_alloc          9569  2 snd_hda_intel,snd_pcm
e1000e                152493  0 
rfkill                 21863  1 hp_wmi
pcspkr                  2222  0 
wmi                     8750  1 hp_wmi
floppy                 68349  0 
serio_raw               5318  0 
edd                    10208  0 
ext4                  401916  2 
jbd2                  100410  1 ext4
crc16                   1715  1 ext4
nouveau               553824  1 
ttm                    65938  1 nouveau
drm_kms_helper         33008  1 nouveau
sd_mod                 41436  4 
drm                   223234  3 nouveau,ttm,drm_kms_helper
i2c_algo_bit            6728  1 nouveau
button                  6989  1 nouveau
battery                12302  0 
ac                      4055  0 
fan                     4559  0 
processor              45747  1 acpi_cpufreq
mptsas                 45696  0 
mptscsih               28810  1 mptsas
mptbase                73708  2 mptsas,mptscsih
scsi_transport_sas     38936  1 mptsas
ata_generic             3739  0 
ata_piix               23945  3 
ahci                   42680  0 
libata                211449  3 ata_generic,ata_piix,ahci
scsi_mod              191748  7 sr_mod,sg,sd_mod,mptsas,mptscsih,scsi_transport_sas,libata
thermal                20625  0 
thermal_sys            18230  3 fan,processor,thermal
Comment 28 Jiri Slaby 2010-10-28 18:35:35 UTC
Hmm, I don't see anything suspicious in the diff of configs. The only real difference is ACPI_DEBUG.

There is one more possibility. The order. If some driver is built-in instead of being built as a module, it may use kernel structures earlier, before they are potentially inited. I'm afraid we cannot work out which one it is unless you narrow the config difference to a single module.
Comment 29 Rich Coe 2010-10-29 22:03:19 UTC
We found an issue with preloadtrace modlule on westmere, in addition 
to the ACPI issue. 

We're still investigating.
Comment 30 Rich Coe 2010-11-04 20:24:14 UTC
I am able to create a gpf with this simple script:
--- script start ---
let i=0
while : ; do
	rmmod tg3
	rmmod button

	sleep 3 
	modprobe tg3 & modprobe button &
	let i=$i+1
	echo "completed $i"
	sleep 5
done
--- script end ---
Comment 31 Jiri Slaby 2010-11-04 22:00:54 UTC
(In reply to comment #30)
> I am able to create a gpf with this simple script:

The question is whether it exposes the overwritten memory or they overwrite the memory themselves. If you blacklist/remove those modules and run the reboot test, does it appear?

Or what happens if you try the loop only with a single module? And with totally different modules?
Comment 32 Rich Coe 2010-11-05 15:40:41 UTC
The purpose of the script was to eliminate the reboot for purpose
of analysis. 

We already know that either moving ACPI modules to initrd or compiled
into the kernel either masks the issue or prevents the issue.

In trying to determine the root cause without a host of other modprobes
operating simultaneously during boot, I noticed that in at least 3
instances were when button and tg3 were loading.  That's why I picked tg3 to 
interact with button.  

There seems to be some interaction of the loading and initialization of 
ACPI button which causes the issue.
Comment 33 Thomas Renninger 2011-01-11 11:44:40 UTC
I also run into this when I tried to install 11.4 (2.6.37) on a Nehalem-EX system.
With a -desktop kernel I got a similar segfault (invalid opcode). Serial console output pasted in the end.

I installed a -default kernel on a SLE11 SP1 system and the machine booted fine (I didn't do longterm runtime or module loading tests).

I installed a -default kernel on the 11.4 system and the kexec crash kernel got booted after Yast finalized some scripts and tried to load quite some modules (e.g. all kind of fs drivers). Unfortunately the crash itself did not get printed to serial console.

As you pointed out that this may be restricted to specific HW (Nehelam/whatsoever) and ACPI (possible mem corruption), I wonder whether you have a container device included in your ACPI tables. My Nehalem machine has some and afaik these are not that widespread. You can double check by running the commands shown here:
ftp://ftp.suse.com/pub/people/trenn/How_to_check_for_ACPI_BIOS_syntax_bugs.txt
and do a grep on the disassembled ACPI tables:
grep "ACPI0004\|PNP0A05\|PNP0A06" *.dsl

If you get a hit, it would be interesting whether blacklisting the container driver in /etc/modprobe.conf (or /etc/modprobe.d/99-local.conf on newer systems) by adding:
blacklist container
helps.

Adding some Intel and ACPI people.


Segfault with -desktop kernel:
=======================================
fsck from util-linux-ng 2.18

[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda7 

STABLE: recovering journal

STABLE: clean, 202062/612000 files, 1236705/2445880 blocks

fsck succeeded. Mounting root device read-write.

Mounting root /dev/sda7

mount -o rw,acl,user_xattr -t ext4 /dev/sda7 /root

EXT4-fs (sda7): mounted filesystem with ordered data mode. Opts: acl,user_xattr

EXT4-fs (sda7): re-mounted. Opts: acl,user_xattr


INIT: version 2.88 booting


System Boot Control: Running /etc/init.d/boot

Mounting sysfs at /sys7[?25l
[80C[10D[1;32mdone[m8[?25h

Mounting debugfs at /sys/kernel/debug7[?25l
[80C[10D[1;32mdone[m8[?25h

Copying static /dev content7[?25l
[80C[10D[1;32mdone[m8[?25h

 ------------[ cut here ]------------

 kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.37/linux-2.6.36/kernel/timer.c:681!

 invalid opcode: 0000 [#1] PREEMPT SMP 

 last sysfs file: /sys/module/thermal_sys/sections/__ksymtab_strings

 CPU 2 

 Modules linked in: preloadtrace ext4 jbd2 crc16 fan thermal processor thermal_sys megaraid_sas

 

 Pid: 988, comm: stapio Not tainted 2.6.37-rc5-12-desktop #1 QSSC-S4R/QSSC-S4R

 RIP: 0010:[<ffffffff81068fce>]  [<ffffffff81068fce>] mod_timer+0x2fe/0x320

 RSP: 0018:ffff88047fc03d90  EFLAGS: 00010246

 RAX: 0000000000000000 RBX: ffff88047fc1af90 RCX: 000000004d2b255e

 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88047fc1af90

 RBP: 00000000fffbd23d R08: 000000004d2b255e R09: 0000000000383d81

 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffbd23d

 R13: ffff88047fc1af70 R14: 11f79a75476e5ff1 R15: ffff88047fc1af74

 FS:  00007f24cd642700(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000

 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b

 CR2: 00007f24cc6b9ff8 CR3: 000000046ef46000 CR4: 00000000000006e0

 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

 Process stapio (pid: 988, threadinfo ffff88046e6f0000, task ffff88046e774200)

 Stack:

  ffff88026df74e10 0000000000000000 0000000000000282 0000000043491064

  000000d943491064 ffff88047fc1af70 11f79a75476e5ff1 ffffffffa00e3d4c

  000000004d2b255e 00000000176633f1 ffff88047fc159d0 0000000000000103

 Call Trace:

  [<ffffffffa00e3d4c>] __stp_time_timer_callback+0xac/0xd0 [preloadtrace]

  [<ffffffff81067ffe>] call_timer_fn+0x3e/0x190

  [<ffffffff81068394>] run_timer_softirq+0x144/0x2a0

  [<ffffffff8105f6da>] __do_softirq+0xda/0x2a0

  [<ffffffff81003eec>] call_softirq+0x1c/0x30

  [<ffffffff8100599d>] do_softirq+0x4d/0x80

  [<ffffffff8105f995>] irq_exit+0x95/0xa0

  [<ffffffff810200ba>] smp_apic_timer_interrupt+0x6a/0xa0

  [<ffffffff810039b3>] apic_timer_interrupt+0x13/0x20

 DWARF2 unwinder stuck at apic_timer_interrupt+0x13/0x20

 

 Leftover inexact backtrace:

 

  <IRQ> 

  <EOI> 

  [<ffffffff810254cb>] ? x2apic_send_IPI_allbutself+0x9b/0xb0

  [<ffffffff8101f312>] ? native_send_call_func_ipi+0xc2/0xd0

  [<ffffffff81090479>] ? smp_call_function_many+0x189/0x220

  [<ffffffffa00e3c00>] ? __stp_init_time+0x0/0xa0 [preloadtrace]

  [<ffffffff81090543>] ? smp_call_function+0x33/0x60

  [<ffffffff8105e82b>] ? on_each_cpu+0x3b/0x80

  [<ffffffffa00ebd95>] ? _stp_ctl_write_cmd+0x365/0x900 [preloadtrace]

  [<ffffffff81056f6a>] ? do_fork+0xea/0x410

  [<ffffffff81151a06>] ? vfs_write+0xc6/0x180

  [<ffffffff81151d1e>] ? sys_write+0x4e/0x90

  [<ffffffff810032f3>] ? stub_clone+0x13/0x20

  [<ffffffff8100300b>] ? system_call_fastpath+0x16/0x1b

  [<ffffffff8100300b>] ? system_call_fastpath+0x16/0x1b

 Code: 01 ec e9 5d fd ff ff 66 0f 1f 44 00 00 48 83 c4 10 41 bd 01 00 00 00 5b 5d 41 5c 44 89 e8 41 5d 41 5e c3 4d 89 e6 e9 19 fe ff ff <0f> 0b 48 8b 74 24 38 48 89 df e8 53 f6 ff ff e9 47 fd ff ff e8 

 RIP  [<ffffffff81068fce>] mod_timer+0x2fe/0x320

  RSP <ffff88047fc03d90>

 ---[ end trace be8a737ace40b96e ]---
Comment 34 Thomas Renninger 2011-01-11 12:35:29 UTC
> If you get a hit, it would be interesting whether blacklisting the container
> driver...
it's not that.
Comment 35 Bjorn Helgaas 2011-01-11 18:06:34 UTC
Every failure correlates very closely with loading the ACPI button driver.  Can you try to reproduce it without that driver at all (e.g., blacklist it or move the module out of the way)?

It sure feels like stack corruption in acpi_button_add(), but I've looked through that and don't see anything obvious.  If removing the button driver makes the problem go away, we could add it back in and comment out chunks of acpi_button_add() to identify the issue.
Comment 36 Thomas Renninger 2011-01-12 22:16:07 UTC
> Every failure correlates very closely with loading the ACPI button driver
Not every failure.
It looks as if we have two issues.
I do not have the button.ko issue. In fact button.ko is not even yet loaded on my system.
Same for segfaults shown in comment #6 and #9 (search for button in there you won't get a hit).
Unfortunately Rich's backtraces are somewhat cut off, but they are very equal to the one I posted in comment #33:
  - bad process: "comm: stapio"
  - "Bug on" triggered in kernel/timer.c: (line 681 for me, 2.6.37 and
    line 643 for Rich, 2.6.34)

I expect my issue is related to preloadtrace.ko. I'll open another bug and will point to it.
Comment 37 Thomas Renninger 2011-01-13 09:31:32 UTC
I created bug #664105 for the preloadtrace.ko issue.
Comment 38 Youquan Song 2011-01-24 11:50:29 UTC
How about disable KPROBE_EVENT config in openSUES11.3/4? then rebuild the kernel?
Comment 39 Bjorn Helgaas 2011-01-24 18:17:50 UTC
I reproduced the issue on a z400 with openSUSE 11.3.  It hit the issue after about a dozen reboots.

I tried again, this time with "brokemodules=button".  That rebooted 180 times, then hung with a blank screen so I have no more information.  This box has no serial ports, so I can't use a serial console.

I tried again, this time with "nopreload".  That rebooted 870 times without a problem, then hung here:

[    1.374091] radeon 0000:0f:00.0: VRAM: 256M 0x00000000 - 0x0FFFFFFF (256M use
d)
[    1.374191] radeon 0000:0f:00.0: GTT: 512M 0x10000000 - 0x2FFFFFFF
<hang>

So my guess is that bug 664105 is the main problem, but there seems to be something else going wrong too.
Comment 40 Thomas Renninger 2011-01-24 21:28:14 UTC
First: Thanks a lot for your efforts Bjorn!

> So my guess is that bug 664105 is the main problem, but there seems to be
> something else going wrong too.
I fear it's not bug 664105. This one took care about a NULL pointer dereference because of a race caused by SystemTap produced kernel code (at least that is what the changelog of the patch I added said which perfectly hit my and some of the backtraces here).

I think the main problem here is that we use a debug tool, namely SystemTap for production. Preloadtrace.ko is not compiled from C, but it's produced by a SystemTap compiler which is in first place made for debugging and not for production.

preloadtrace.ko should get rewritten in C or disabled by default.
I wonder whether it's needed to cache the files the preload package is talking about. It sounds (don't know) as if preloadtrace is simply used to track/trace booting for later analysis (and that is a valid use case for SystemTap as long as it's not enabled by default).

I better keep this bug open, if Rich still hits this with a 11.4 rc1 kernel, we should better disable preloadtrace.ko by default.