Bugzilla – Bug 551695
Xen dom0 crashes when domU uses phy: lvm volume as disk - Xen unusable!
Last modified: 2009-12-15 14:55:13 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.14) Gecko/2009090900 SUSE/3.0.14-0.1.2 Firefox/3.0.14

Since yesterday I have been getting these messages:

Bad page state in process 'syslog-ng'
Nov 1 08:37:13 willie kernel: page:ffff88000a8f0f00 flags:0x8000000000000800 mapping:0000000000000000 mapcount:0 count:0
Nov 1 08:37:13 willie kernel: Trying to fix it up, but a reboot is needed
Nov 1 08:37:13 willie kernel: Backtrace:
Nov 1 08:37:13 willie kernel: Pid: 3292, comm: syslog-ng Tainted: G B 2.6.27.37-0.1-xen #1
Nov 1 08:37:13 willie kernel:
Nov 1 08:37:13 willie kernel: Call Trace:
Nov 1 08:37:13 willie kernel: [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Nov 1 08:37:13 willie kernel: [<ffffffff80464df3>] dump_stack+0x69/0x6f
Nov 1 08:37:13 willie kernel: [<ffffffff8027835e>] bad_page+0x90/0xbd
Nov 1 08:37:13 willie kernel: [<ffffffff802787db>] free_hot_cold_page+0xa0/0x232
Nov 1 08:37:13 willie kernel: [<ffffffff803f0495>] skb_release_data+0x6d/0xe9
Nov 1 08:37:13 willie kernel: [<ffffffff803f0381>] __kfree_skb+0x9/0x6f
Nov 1 08:37:13 willie kernel: [<ffffffff804220d1>] tcp_recvmsg+0x780/0xafc
Nov 1 08:37:13 willie kernel: [<ffffffff803eba97>] sock_common_recvmsg+0x30/0x45
Nov 1 08:37:13 willie kernel: [<ffffffff803e9adc>] sock_aio_read+0x12c/0x149
Nov 1 08:37:13 willie kernel: [<ffffffff8029e974>] do_sync_read+0xce/0x113
Nov 1 08:37:13 willie kernel: [<ffffffff8029f381>] vfs_read+0xbd/0x153
Nov 1 08:37:13 willie kernel: [<ffffffff8029f4d3>] sys_read+0x45/0x6e
Nov 1 08:37:13 willie kernel: [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Nov 1 08:37:13 willie kernel: [<00007f9b536c5ef0>] 0x7f9b536c5ef0

Kernel is: Linux willie 2.6.27.37-0.1-xen #1 SMP 2009-10-15 14:56:58 +0200 x86_64 x86_64 x86_64 GNU/Linux

The machine is a Xen dom0 running 4 domUs that all send their syslog to dom0's syslog-ng. It ran for months without such a bug. The machine crashed tonight. Recent changes: updated the kernel and added a budget DVB-S card.
Is this a hardware bug or a software bug?

Reproducible: Didn't try
Re-tried: At the moment I'm only getting this during boot, but then I get anywhere from a few to many errors (2-200). From the logs I can say that the page is always different and the process is always syslog-ng.
Does no one else hit this bug? Today I also get the messages during normal operation.
OK, I upgraded to openSUSE 11.2. The system crashed 15 minutes after the reboot that followed the upgrade, but there were no "Bad page state" messages in the log. So I reset the machine and got ~150 "Bad page state" messages.

Kernel is now: 2.6.31.5-0.1-xen x86_64

The error messages, all the same, read:

Dec 5 00:28:25 willie kernel: [ 452.797839] BUG: Bad page state in process syslog-ng pfn:1dc51d
Dec 5 00:28:25 willie kernel: [ 452.797842] page:ffff88000a177ed8 flags:8000000000000800 count:0 mapcount:0 mapping:(null) index:0
Dec 5 00:28:25 willie kernel: [ 452.797844] Pid: 2434, comm: syslog-ng Tainted: G B 2.6.31.5-0.1-xen #1
Dec 5 00:28:25 willie kernel: [ 452.797846] Call Trace:
Dec 5 00:28:25 willie kernel: [ 452.797849] [<ffffffff800119b9>] try_stack_unwind+0x189/0x1b0
Dec 5 00:28:25 willie kernel: [ 452.797854] [<ffffffff8000f466>] dump_trace+0xa6/0x1e0
Dec 5 00:28:25 willie kernel: [ 452.797857] [<ffffffff800114c4>] show_trace_log_lvl+0x64/0x90
Dec 5 00:28:25 willie kernel: [ 452.797861] [<ffffffff80011513>] show_trace+0x23/0x40
Dec 5 00:28:25 willie kernel: [ 452.797865] [<ffffffff8046af06>] dump_stack+0x81/0x9e
Dec 5 00:28:25 willie kernel: [ 452.797868] [<ffffffff800dacb5>] bad_page+0xf5/0x160
Dec 5 00:28:25 willie kernel: [ 452.797872] [<ffffffff800dcf94>] free_hot_cold_page+0xa4/0x2b0
Dec 5 00:28:25 willie kernel: [ 452.797876] [<ffffffff800dd26e>] free_hot_page+0x1e/0x40
Dec 5 00:28:25 willie kernel: [ 452.797880] [<ffffffff800e10d7>] put_page+0x57/0x150
Dec 5 00:28:25 willie kernel: [ 452.797884] [<ffffffff802ff7cb>] gnttab_page_free+0x3b/0x60
Dec 5 00:28:26 willie kernel: [ 452.797888] [<ffffffff800dcf47>] free_hot_cold_page+0x57/0x2b0
Dec 5 00:28:26 willie kernel: [ 452.797892] [<ffffffff800dd26e>] free_hot_page+0x1e/0x40
Dec 5 00:28:26 willie kernel: [ 452.797896] [<ffffffff800e10d7>] put_page+0x57/0x150
Dec 5 00:28:26 willie kernel: [ 452.797900] [<ffffffff803b5f1c>] skb_release_data+0x8c/0x100
Dec 5 00:28:26 willie kernel: [ 452.797904] [<ffffffff803b5848>] __kfree_skb+0x28/0xd0
Dec 5 00:28:26 willie kernel: [ 452.797908] [<ffffffff80400ea8>] sk_eat_skb+0x78/0x90
Dec 5 00:28:26 willie kernel: [ 452.797911] [<ffffffff804040a6>] tcp_recvmsg+0x8e6/0xda0
Dec 5 00:28:26 willie kernel: [ 452.797915] [<ffffffff803b0183>] sock_common_recvmsg+0x43/0x70
Dec 5 00:28:26 willie kernel: [ 452.797919] [<ffffffff803ace69>] sock_aio_read+0x169/0x180
Dec 5 00:28:26 willie kernel: [ 452.797923] [<ffffffff80118c52>] do_sync_read+0x102/0x160
Dec 5 00:28:26 willie kernel: [ 452.797927] [<ffffffff80119251>] vfs_read+0x1a1/0x1c0
Dec 5 00:28:26 willie kernel: [ 452.797931] [<ffffffff801198ab>] sys_read+0x5b/0xa0
Dec 5 00:28:26 willie kernel: [ 452.797935] [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
Dec 5 00:28:26 willie kernel: [ 452.797940] [<00007f5a1525cdc0>] 0x7f5a1525cdc0

The machine crashed again while I was writing these lines. I decided to plug in a keyboard and screen to have a chance to see why it dies ...
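To check claims such as "it is always syslog-ng" and "the page is always different" across a whole log, something like the following could be used. This is only a sketch: the function name `summarize` and both regexes are my own, written against the two message variants quoted in this report (2.6.27 and 2.6.31), and other kernels may word the message differently.

```python
import re
from collections import Counter

# Matches both message variants quoted in this report:
#   2.6.27: Bad page state in process 'syslog-ng'
#   2.6.31: BUG: Bad page state in process syslog-ng pfn:1dc51d
BAD_PAGE = re.compile(r"Bad page state in process '?([^'\s]+)'?")

# The struct page address is printed on the "page:..." line of each report.
PAGE_ADDR = re.compile(r"\bpage:([0-9a-f]{16})\b")


def summarize(lines):
    """Return (Counter of offending process names, set of page addresses)."""
    procs = Counter()
    pages = set()
    for line in lines:
        m = BAD_PAGE.search(line)
        if m:
            procs[m.group(1)] += 1
        p = PAGE_ADDR.search(line)
        if p:
            pages.add(p.group(1))
    return procs, pages
```

It could be fed an open log file, for example `summarize(open(path))` with `path` pointing at wherever syslog-ng writes kernel messages on the machine in question. If every report hits a different page, the size of the returned set should match the total count.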
The machine still crashes 5-6 times a day. It ran without a crash for months with openSUSE 11.1 and the old kernel, for weeks with 11.1 and the new kernel, and only for hours with 11.2. I added a screenshot of the CTRL-F10 console after a crash. I just installed the kdump packages and am trying to get kdump running. Any hints on how to get a kernel crash dump?
Created attachment 331213 [details] screen shot of crash dump
News, if anyone is interested: the crash can be reproduced! It happens when I try to copy a big file from a domU machine via NFS (v4) to the dom0 machine. The crash dumps look different every time. Is this a bug in Xen or in the bridging code? Or in the AHCI drivers? What can I do?
I have now seen a crash message 3 times that contained: "MP-BIOS bug: 8254 timer not connected to IO-APIC". Booting with noapic makes the kernel unbootable. The board is a Biostar TA780G M2+, if this has anything to do with the crash ...
Update: Boosting the virtual bridge has no effect. But when I read a file in a domU I get the crash again.

Setup:
dom0: openSUSE 11.2 64-bit
domUs: openSUSE 11.1 64-bit pvm

The disks are mirrored: md0 is /boot and md1 is the PV of an LVM group. Root and swap of dom0 are LVM volumes. The disks of the domUs are also LVM volumes. No LVM inside the domUs :-)

Is this setup OK? It worked as long as I had 11.1 on dom0.
I set up another machine with totally different hardware and openSUSE 11.2. I created an LVM volume and tried to use it as the physical disk for a new domU installation. The installation starts, but dom0 freezes shortly after. dom0 has all patches applied. I suspect the same bug. The domU showed some kernel messages on its boot screen that looked like the ones I have been seeing all along. This makes Xen totally unusable in 11.2!
Installing the domU on a file instead of an LVM volume works ...
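For anyone comparing the two cases, the difference comes down to the disk line of the domU configuration. A minimal sketch in xm config syntax (which is Python) follows. The domU name, memory size, volume group, image path and device name are all hypothetical and not taken from this report.

```python
# Hypothetical domU config fragment (xm config files use Python syntax).
name = "testdomu"   # hypothetical domU name
memory = 512        # MiB, illustrative value only

# Variant that triggered the dom0 crash in this report:
# an LVM volume handed to the guest via phy:
# disk = ["phy:/dev/vg0/testdomu-root,xvda,w"]

# Variant that worked in this report: a file-backed image via file:
disk = ["file:/var/lib/xen/images/testdomu/disk0,xvda,w"]
```

Only the backend prefix and path change between the two variants. The guest sees the same virtual device in both cases.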
Very likely a duplicate of 553690 and 559047. Please report whether mem=4G also allows you to work around the issue (apart from your finding of using file:/).
I managed to install a PV guest using mem=4G, but the guest still spits out many kernel messages. I could also start this dom0 without "mem=4G", but then dom0 crashed later. Kernel messages from the domU during boot: there are 122 kernel errors, 118 from swapper followed by 4 from modprobe. For the swapper messages, see the attachment.
Created attachment 331326 [details] 2x boot of domU
Please see bug 559047 in case you want to try out a potential fix for this.
*** This bug has been marked as a duplicate of bug 553690 ***