Bugzilla – Bug 570443
CRASH @ attempt to manually remove vcpus from dom0 using vcpu-set
Last modified: 2010-01-14 08:19:52 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.0) Gecko/20100105 SUSE/3.6rc1-1.2 Firefox/3.6

On an attempt to manually remove vcpus from dom0 using vcpu-set:

xm vcpu-list Domain-0
Name        ID  VCPU  CPU  State  Time(s)  CPU Affinity
Domain-0     0     0    0  r--       27.7  0
Domain-0     0     1    1  -b-        9.2  1
Domain-0     0     2    2  -b-       14.1  2
Domain-0     0     3    3  -b-       14.3  3

xm vcpu-set --help
Usage: xm vcpu-set <Domain> <vCPUs>
Set the number of active VCPUs for allowed for the domain.

xm vcpu-set Domain-0 1

==> xen/xend.log <==
[2010-01-13 10:48:16 4953] INFO (XendDomainInfo:1818) Set VCPU count on domain Domain-0 to 1

xm vcpu-list Domain-0

This hangs the current session.

Checking in Dom0 with top:

 PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
6112 root 15 -5    0   0   0 R  100  0.0 0:29.92 xenwatch_cb

"xenwatch_cb" is hogging 100% cpu.

then, kill -9 6112 recovers.

Checking syslog:

==> messages <==
Jan 13 10:49:09 test kernel: [ 1568.737072] BUG: soft lockup - CPU#3 stuck for 61s! [xenwatch_cb:6112]
...
Jan 13 10:49:09 test kernel: [ 1569.275176] CPU 3:
...
Jan 13 10:49:09 test kernel: [ 1569.899143] Pid: 6112, comm: xenwatch_cb Not tainted 2.6.31.8-0.1-xen #1 System Product Name
Jan 13 10:49:09 test kernel: [ 1569.991141] RIP: e030:[<ffffffff8005ef0f>] [<ffffffff8005ef0f>] lock_timer_base+0x7f/0x90
Jan 13 10:49:09 test kernel: [ 1570.087131] RSP: e02b:ffff88003f38dc10 EFLAGS: 00000246
Jan 13 10:49:09 test kernel: [ 1570.179128] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8077a3b0
Jan 13 10:49:09 test kernel: [ 1570.275121] RDX: 0000000000000001 RSI: ffff88003f38dc50 RDI: ffffc90000015280
Jan 13 10:49:09 test kernel: [ 1570.367115] RBP: ffff88003f38dc40 R08: ffffffff807833f0 R09: 0000000000000000
Jan 13 10:49:09 test kernel: [ 1570.459110] R10: ffff88003f38dcf0 R11: 000000008141ce5c R12: ffffc90000015280
Jan 13 10:49:09 test kernel: [ 1570.551106] R13: ffff88003f38dc50 R14: 0000000000000000 R15: ffffffff8077a640
Jan 13 10:49:09 test kernel: [ 1570.639105] FS: 00007f8fefd696f0(0000) GS:ffffc90000030000(0000) knlGS:0000000000000000
Jan 13 10:49:09 test kernel: [ 1570.727098] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 13 10:49:09 test kernel: [ 1570.815091] CR2: 0000000000b781e8 CR3: 0000000000003000 CR4: 0000000000000660
Jan 13 10:49:09 test kernel: [ 1570.903088] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 13 10:49:09 test kernel: [ 1570.991081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 13 10:49:09 test kernel: [ 1571.079077] Call Trace:
Jan 13 10:49:09 test kernel: [ 1571.167076] [<ffffffff8005ef4c>] try_to_del_timer_sync+0x2c/0x90
Jan 13 10:49:09 test kernel: [ 1571.255067] [<ffffffff8005efda>] del_timer_sync+0x2a/0x50
Jan 13 10:49:09 test kernel: [ 1571.339062] [<ffffffff80467adf>] mce_cpu_callback+0x122/0x1aa
Jan 13 10:49:09 test kernel: [ 1571.423058] [<ffffffff80472337>] notifier_call_chain+0x57/0xb0
Jan 13 10:49:09 test kernel: [ 1571.507055] [<ffffffff8007585c>] __raw_notifier_call_chain+0x1c/0x40
Jan 13 10:49:09 test kernel: [ 1571.591050] [<ffffffff8045be5f>] _cpu_down+0xaf/0x310
Jan 13 10:49:09 test kernel: [ 1571.671045] [<ffffffff8045c147>] cpu_down+0x87/0xb0
Jan 13 10:49:09 test kernel: [ 1571.751040] [<ffffffff8046a97c>] vcpu_hotplug+0xce/0x102
Jan 13 10:49:09 test kernel: [ 1571.831036] [<ffffffff8046a9fb>] handle_vcpu_hotplug_event+0x4b/0x61
Jan 13 10:49:09 test kernel: [ 1571.907026] [<ffffffff803070fc>] xenwatch_handle_callback+0x2c/0x80
Jan 13 10:49:09 test kernel: [ 1571.979030] [<ffffffff8006f9d6>] kthread+0xb6/0xc0
Jan 13 10:49:09 test kernel: [ 1572.051024] [<ffffffff8000d38a>] child_rip+0xa/0x20

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Correction: once hung, even "kill -9" is ignored.

ps ax | grep 6112
 6112 ?      R<   47:48 [xenwatch_cb]
 6319 pts/0  R<+   0:00 grep 6112

kill -9 6112

ps ax | grep 6112
 6112 ?      R<   47:53 [xenwatch_cb]
 6321 pts/0  S<+   0:00 grep 6112

ps ax | grep xenwatch
   22 ?      S<    0:00 [xenwatch]
 6112 ?      R<   51:01 [xenwatch_cb]
 6113 ?      D<    0:00 [xenwatch_cb]
 6114 ?      D<    0:00 [xenwatch_cb]
 6331 pts/0  S<+   0:00 grep xenwatch

kill -9 22 6112 6113 6114

ps ax | grep xenwatch
   22 ?      S<    0:00 [xenwatch]
 6112 ?      R<   51:18 [xenwatch_cb]
 6113 ?      D<    0:00 [xenwatch_cb]
 6114 ?      D<    0:00 [xenwatch_cb]
 6334 pts/0  R<+   0:00 grep xenwatch

A reboot is required :-(
Until a kernel update becomes available, the workaround is to specify mce=0 on the kernel command line.

*** This bug has been marked as a duplicate of bug 558663 ***
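
Before retrying vcpu-set, it may help to confirm that the mce=0 workaround is actually in effect on the running dom0 kernel. The following is a minimal sketch, not part of the original report: the helper names has_kernel_param and read_cmdline are hypothetical, and it assumes that /proc/cmdline in dom0 reflects the options passed to the dom0 kernel.

```python
def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as a whole token in a kernel command line.

    Hypothetical helper for illustration; tokens are split on whitespace,
    so "mce=01" does not match a query for "mce=0".
    """
    return f"{name}={value}" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (assumes a Linux /proc filesystem)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    if has_kernel_param(read_cmdline(), "mce", "0"):
        print("mce=0 is set; the workaround is active")
    else:
        print("mce=0 is NOT set; vcpu-set on Domain-0 may hang xenwatch_cb")
```

Run it in dom0 after rebooting with the changed boot entry. Matching on whole tokens rather than substrings avoids a false positive from a differently valued option.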