Bugzilla – Full Text Bug Listing
| Summary: | Xen kernel crash after about 16 hours; network stopped later, sometimes the disk controller crashed too | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.3 | Reporter: | Paul Pinault <disk_91> |
| Component: | Kernel | Assignee: | Jan Beulich <jbeulich> |
| Status: | VERIFIED NORESPONSE | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P4 - Low | CC: | ihno, tonyj |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 11.3 | ||
| Whiteboard: | |||
| Found By: | Community User | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
Attachments:
full kernel log since opensuse 11.3 install ; see after sept 23 for last kernel update
boot.msg normal kernel (no xen)
boot.msg xen kernel
First log booting Dom0 and crashing dom0
Second Log booting dom0 with acpi=on
Third Log booting Dom0 with acpi=off
Description
Paul Pinault
2010-09-25 15:48:01 UTC
It seems that this problem is more related to the bridge: I changed my setup to stop using eth3 and to use eth0 and eth1 instead. Now my kernel is not crashing, but the network stops on some interfaces ...
Additional information to help:
saturn:/home/disk # ifconfig
br0 Link encap:Ethernet HWaddr 00:48:54:67:E3:F9
inet addr:10.0.0.20 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:668 errors:0 dropped:0 overruns:0 frame:0
TX packets:555 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:316246 (308.8 Kb) TX bytes:95004 (92.7 Kb)
br1 Link encap:Ethernet HWaddr 00:48:54:6F:78:AB
inet addr:10.0.1.20 Bcast:10.0.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:72 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:7516 (7.3 Kb) TX bytes:3769 (3.6 Kb)
eth0 Link encap:Ethernet HWaddr 00:48:54:67:E3:F9
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:744 errors:0 dropped:0 overruns:0 frame:0
TX packets:646 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:335746 (327.8 Kb) TX bytes:108131 (105.5 Kb)
Interrupt:10 Base address:0xc000
eth1 Link encap:Ethernet HWaddr 00:48:54:6F:78:AB
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:45 errors:0 dropped:0 overruns:0 frame:0
TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2885 (2.8 Kb) TX bytes:11800 (11.5 Kb)
Interrupt:11 Base address:0x2000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:87 errors:0 dropped:0 overruns:0 frame:0
TX packets:87 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9508 (9.2 Kb) TX bytes:9508 (9.2 Kb)
vif1.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:264 errors:0 dropped:0 overruns:0 frame:0
TX packets:295 errors:0 dropped:1 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:27623 (26.9 Kb) TX bytes:32612 (31.8 Kb)
vif1.1 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:65 errors:0 dropped:0 overruns:0 frame:0
TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:7307 (7.1 Kb) TX bytes:1241 (1.2 Kb)
saturn:/home/disk # brctl show
bridge name bridge id STP enabled interfaces
br0 8000.00485467e3f9 no eth0
vif1.0
br1 8000.0048546f78ab no eth1
vif1.1
The configuration does not seem to be the cause, since after a reboot everything works well ... for a while :(
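The `brctl show` output above lists the ports attached to each bridge. Because the failure reportedly shows up more often once a second VM joins br0, it may help to count ports per bridge before and after starting a guest. The following is a minimal Python 3.9+ sketch, not part of any Xen tooling: `parse_brctl_show` is a hypothetical helper, and it assumes the classic `brctl show` layout (one row per bridge with name, bridge id, STP flag and optionally the first port, then one continuation row per extra port).

```python
def parse_brctl_show(text: str) -> dict[str, list[str]]:
    """Map each bridge to its ports, from `brctl show` output (hypothetical helper).

    Assumes the classic layout: a header row, then one row per bridge with
    name, bridge id, STP flag and optionally the first port, followed by
    one continuation row per additional port.
    """
    bridges: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the column header row
        fields = line.split()
        if len(fields) >= 3:
            # Bridge row: remember the bridge and its first port, if any.
            current = fields[0]
            bridges[current] = fields[3:4]
        elif len(fields) == 1 and current is not None:
            # Continuation row: one more port on the bridge above.
            bridges[current].append(fields[0])
    return bridges
```

The text can come from `subprocess.run(["brctl", "show"], capture_output=True, text=True).stdout`; taking `len()` of each list gives the port count. For the output above that is two ports each on br0 and br1, and br0 gains a third port (such as vif2.0, which appears in the Sep 30 log) when a second VM starts.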
Jan, do you want to take a look at this since it's Xen related? Feel free to reassign back if not appropriate.
Without seeing the full kernel log we can't really judge whether the netdev watchdog kicking in was just a secondary effect. Please attach the full /var/log/messages fragment(s) of the session(s) in question.
(In reply to comment #3)
> Without seeing the full kernel log we can't really judge whether the netdev
> watchdog kicking in was just a secondary effect. Please attach the full
> /var/log/messages fragment(s) of the session(s) in question.
I put everything that was interesting in the log. Most of the time, when it crashes, the log is empty (no more messages than when the system works correctly). Do we have a way to get more log messages that could help?
(In reply to comment #4)
> Do we have a way to get more log messages that could help?
Without knowing what we're looking for - no.
> Without knowing what we're looking for - no.
I'm quite sure it is related to the bridge device, as network traffic is the crash trigger.
Since I changed my config to use my two RTL Ethernet cards instead of the MPC51 one, there is nothing in /var/log/messages, but the network still crashes: in most cases the internal network (between the VMs and Dom0) works correctly, but external communication (Dom0 or a VM communicating with an external machine) does not. At this point I can run any command you want for analysis.
In some other cases the whole server simply crashes and I get no access to anything (I need to reboot); /var/log/messages contains no messages related to this.
I can add this point (it may help): it seems to happen each time on the eth with the higher number, initially eth3, now eth1. Even when I swap the eth0 and eth1 networks (by reallocating br0 on eth0 and br1 on eth1), it is always eth1 that crashes.
I can also add that it appears more frequently when I start a second VM; in this case br0 (eth0) is shared by 3 systems (Dom0, VM1, VM2) instead of 2 (Dom0 + VM1).
Hope this can help ...
Let me know what I can do to help fix this.
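The symptom described here (traffic between Dom0 and the VMs keeps flowing while external traffic stops) can be checked from Dom0 by sampling the physical NIC's sysfs statistics counters twice while, for example, a ping to an external host is running. The sketch below is a heuristic, not a diagnosis: `snapshot`, `deltas`, `looks_tx_stalled` and `watch` are hypothetical helper names, and "RX still counting while TX stays flat" is an assumed signature of a stuck transmit queue, the condition a netdev watchdog timeout reports.

```python
import time
from pathlib import Path

# Counters exposed by the kernel under /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_packets", "tx_packets", "tx_errors", "tx_dropped")


def snapshot(iface: str, root: str = "/sys/class/net") -> dict[str, int]:
    """Read the selected statistics counters for one interface."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change of each counter between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def looks_tx_stalled(delta: dict[str, int]) -> bool:
    """Heuristic (an assumption, not a kernel rule): frames keep arriving,
    yet not a single frame was transmitted during the interval."""
    return delta["rx_packets"] > 0 and delta["tx_packets"] == 0


def watch(iface: str, interval: float = 5.0) -> None:
    """Sample the counters twice, `interval` seconds apart, and report."""
    before = snapshot(iface)
    time.sleep(interval)
    delta = deltas(before, snapshot(iface))
    verdict = "TX looks stalled" if looks_tx_stalled(delta) else "no stall seen"
    print(f"{iface}: {delta} -> {verdict}")
```

For example, `watch("eth1")` prints the per-counter changes over five seconds. A completely idle link is not flagged, since both counters stay flat; that is why some external traffic should be running during the sample.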
(In reply to comment #6)
> I'm quite sure it is related to the bridge device, as network traffic is the
> crash trigger.
Your newer setup is using bridging just like the older one (just on different NICs), so I can't see how you would want to distinguish the two.
> Since I changed my config to use my two RTL Ethernet cards instead of the MPC51
> one, there is nothing in /var/log/messages, but the network still crashes: in
> most cases the internal network (between the VMs and Dom0) works correctly,
> but external communication (Dom0 or a VM communicating with an external
> machine) does not. At this point I can run any command you want for analysis.
> In some other cases the whole server simply crashes and I get no access to
> anything (I need to reboot); /var/log/messages contains no messages related to
> this.
Again, we'll need a full log (up to and including any messages generated during a possible full machine crash - those typically don't make it to persistent storage, so you'll have to set up a serial console, which also lets you collect kernel and hypervisor messages at the same time).
Created attachment 392185 [details]
full kernel log since opensuse 11.3 install ; see after sept 23 for last kernel update
full kernel log as requested
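Jan's question was whether the netdev watchdog was only a secondary effect, which means looking at what the kernel logged before each watchdog warning. Below is a minimal Python 3.9+ sketch for pulling those fragments out of a large /var/log/messages. The function name and the 50-line default window are arbitrary choices, and the regular expression assumes the `NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out` wording seen in the Sep 30 log.

```python
import re
from collections import deque
from collections.abc import Iterable

# Wording of the warning as it appears in the Sep 30 log, e.g.
# "NETDEV WATCHDOG: eth1 (8139too): transmit queue 0 timed out".
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)


def watchdog_fragments(
    lines: Iterable[str], context: int = 50
) -> list[tuple[str, str, int, list[str]]]:
    """Return (interface, driver, queue, preceding lines) per watchdog warning.

    `context` caps how many earlier log lines are kept for each hit
    (an arbitrary default; widen it for sparse logs).
    """
    hits: list[tuple[str, str, int, list[str]]] = []
    recent: deque[str] = deque(maxlen=context)  # sliding window of earlier lines
    for raw in lines:
        line = raw.rstrip("\n")
        m = WATCHDOG.search(line)
        if m:
            hits.append((m.group(1), m.group(2), int(m.group(3)), list(recent)))
        recent.append(line)
    return hits
```

Opening the file with `open("/var/log/messages", errors="replace")` and passing the file object gives one tuple per warning; printing the preceding lines shows, for instance, whether ata or other interrupt-related errors were logged first.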
Looking for a serial cable to activate console tracing ... it should be in place tonight.
Unfortunately, no crossover ("X") serial cable :( ... I will have to wait longer for this; I hope the kernel trace will help.
The log doesn't tell much, but at least it clarifies it's not the problem I was suspecting. Instead, especially the instance on Sep 16 suggests a more general interrupt handling problem, as a SATA device also suffered. Later instances with the 8139 don't, however - did you reconfigure the system in some way (e.g. was the interrupt shared originally, and now it isn't)? We'll need /var/log/boot.msg for both a native and a Xen kernel boot, and we'll need access to Xen's console (if the system is still usable once this state is reached, the "xm debug-key" and "xm dmesg" commands will do, but if it isn't, a serial console is going to be unavoidable).
One other thing to try would be passing "cpuidle=0" to Xen. And of course I assume you already installed the recently released Xen update, and know the issue is not solved by this.
Finally, it would also be useful to know whether the latest kernel-of-the-day (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc based, but specifically with some rework of the interrupt handling) would help.
(In reply to comment #11)
> The log doesn't tell much, but at least it clarifies it's not the problem I was
> suspecting. Instead, especially the instance on Sep 16 suggests a more general
> interrupt handling problem, as a SATA device also suffered. Later instances
> with the 8139 don't, however - did you reconfigure the system in some way (e.g.
> was the interrupt shared originally, and now it isn't)?
I did not change anything like this; I just changed my network config to keep my system stable for a longer time. SATA was a secondary side effect: when it crashed, eth3 crashed first, then I stopped and restarted it; it worked for some time, then SATA crashed ... but as you say, these do not seem to be the root cause; they are side effects of something else.
> We'll need /var/log/boot.msg for both a native and a Xen kernel boot,
OK, I'll provide this.
> and we'll
> need access to Xen's console (if the system is still usable once this state is
> reached, the "xm debug-key" and "xm dmesg" commands will do, but if it isn't, a
> serial console is going to be unavoidable).
When only the network has crashed, the VMs continue to work well but without the external network (the internal network with Dom0 keeps working) ... until Dom0 crashes.
> One other thing to try would be passing "cpuidle=0" to Xen. And of course I
> assume you already installed the recently released Xen update, and know the
> issue is not solved by this.
All systems (Dom0 and VMs) are patched with the latest updates. I have openSUSE 11.3 as Dom0 and openSUSE 11.1 and openSUSE 11.2 as VMs.
cpuidle=0: OK, I will change this.
> Finally, it would also be useful to know whether the latest kernel-of-the-day
> (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc
> based, but specifically with some rework of the interrupt handling) would help.
That is possible to do after the other tests ... no problem. I hope to find a serial cable this weekend to be able to reproduce it with all the log info ...
Created attachment 392393 [details]
boot.msg normal kernel (no xen)
Normal boot.msg log
Created attachment 392394 [details]
boot.msg xen kernel
boot.msg xen kernel log file
The serial cable is in place ... starting to capture logs ...
Created attachment 392409 [details]
First log booting Dom0 and crashing dom0
This log was captured from the serial console. It boots the Xen kernel and starts a VM, then I start a second VM manually and crash the system by generating an NFS transfer on br0/eth0 (it takes less than 5 min to crash). At that point I was not able to use the system anymore (no keyboard, no mouse ... the screen was up but frozen) and had to reset.
Created attachment 392410 [details]
Second Log booting dom0 with acpi=on
As I usually boot with acpi=off, I changed this (removing the option): the Xen kernel started booting but crashed before the end of the boot ... the log contains more information on the crash.
Created attachment 392411 [details]
Third Log booting Dom0 with acpi=off
Third test: back to acpi=off, so the context is the same as in the first log, but this time the crash happened before the boot could finish.
The fourth log is in progress right now: I just rebooted and the system finished booting correctly (as in Log 1). Compared to logs 2 and 3, I power-cycled the machine instead of just using the reset button.
I will try to crash it differently so that I keep keyboard access to type the xm dmesg command.
Other testing done tonight ...
- I'm able to make it crash easily just by generating traffic on any interface.
- I'm currently not able to access the console once it has crashed to type xm dmesg or simply dmesg ... maybe later.
- During each "home made" crash I did not see any interesting logs on the console.
- After a crash I usually get an Input/Output error on any command (including dmesg); sometimes I still have the keyboard, sometimes I don't.
- The normal kernel is stable (I transferred about 12G without any issue, whereas on the Xen kernel I never got past 2G (interesting limit ...), and generally less is enough to crash it).
- Currently I boot my system with acpi=on and it works as badly as with acpi=off.
- I'll try the latest kernel version ...
I have no more test ideas, as I see nothing interesting in the logs ... I hope you can make sense of the one I attached today.
> Finally, it would also be useful to know whether the latest kernel-of-the-day
> (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc
> based, but specifically with some rework of the interrupt handling) would help.
I only found ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/kernel-xen-debuginfo-2.6.34.7-0.3.99.8.0873825.x86_64.rpm
But I'll try that one ... expecting more trace!
So, to finish testing tonight: I chose the kernel of the day and it crashes exactly the same way :(
saturn:/home/disk # uname -a
Linux saturn 2.6.34.7-0.3.99.8.0873825-xen #1 SMP 2010-09-27 20:56:41 +0200 x86_64 x86_64 x86_64 GNU/Linux
When it crashed I got the following:
Sep 30 22:51:44 saturn kernel: [ 201.068023] alloc kstat_irqs on node 0
Sep 30 22:52:37 saturn kernel: [ 254.200733] br0: port 3(vif2.0) entering disabled state
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/2/0
Sep 30 22:52:37 saturn kernel: [ 254.220087] br0: port 3(vif2.0) entering disabled state
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: brctl delif br0 vif2.0 failed
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: ifconfig vif2.0 down failed
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif2.0, bridge br0.
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vkbd/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/console/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vfb/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51712
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vif/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51728
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51760
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/2/51760
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/2/51728
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/2/51712
Sep 30 22:55:01 saturn /usr/sbin/cron[6050]: (root) CMD (/opt/stats/execstat.sh > /dev/null)
Sep 30 22:55:28 saturn kernel: [ 425.812012] ------------[ cut here ]------------
Sep 30 22:55:28 saturn kernel: [ 425.812023] WARNING: at /usr/src/packages/BUILD/kernel-xen-2.6.34.7/linux-2.6.34/net/sched/sch_generic.c:256 dev_watchdog+0x25b/0x270()
Sep 30 22:55:28 saturn kernel: [ 425.812025] Hardware name: System Product Name
Sep 30 22:55:28 saturn kernel: [ 425.812027] NETDEV WATCHDOG: eth1 (8139too): transmit queue 0 timed out
Sep 30 22:55:28 saturn kernel: [ 425.812029] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit usbbk gntdev netbk blkbk blkback_pagemap blktap domctl hwmon_vid xenbus_be snd_pcm_oss evtchn snd_mixer_oss coretemp snd_seq snd_seq_device edd nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bridge stp llc ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop snd_hda_codec_realtek firewire_ohci firewire_core crc_itu_t snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer ohci1394 snd usbhid 8139too soundcore ppdev 8250_pnp hid usblp ieee1394 8139cp forcedeth pcspkr shpchp i2c_nforce2 snd_page_alloc parport_pc sg 8250 sr_mod pci_hotplug parport serial_core floppy asus_atk0110 ext4 jbd2 crc16 dm_mirror dm_region_hash dm_log nouveau ttm drm_kms_helper ohci_hcd drm agpgart i2c_algo_bit i2c_core ehci_hcd sd_m
Sep 30 22:55:28 saturn kernel: od usbcore button dm_snapshot dm_mod xenblk cdrom xennet fan processor ata_generic pata_amd sata_nv libata scsi_mod thermal thermal_sys hwmon
Sep 30 22:55:28 saturn kernel: [ 425.812110] Pid: 0, comm: swapper Not tainted 2.6.34.7-0.3.99.8.0873825-xen #1
Sep 30 22:55:28 saturn kernel: [ 425.812128] [<ffffffff8040a79b>] dump_stack+0x69/0x6f
Sep 30 22:55:28 saturn kernel: [ 425.812134] [<ffffffff80043943>] warn_slowpath_common+0x73/0xb0
Sep 30 22:55:28 saturn kernel: [ 425.812138] [<ffffffff800439e0>] warn_slowpath_fmt+0x40/0x50
Sep 30 22:55:28 saturn kernel: [ 425.812142] [<ffffffff8034d04b>] dev_watchdog+0x25b/0x270
Sep 30 22:55:28 saturn kernel: [ 425.812149] [<ffffffff80053d34>] run_timer_softirq+0x1d4/0x3d0
Sep 30 22:55:28 saturn kernel: [ 425.812154] [<ffffffff8004b8c8>] __do_softirq+0xe8/0x220
Sep 30 22:55:28 saturn kernel: [ 425.812159] [<ffffffff80007efc>] call_softirq+0x1c/0x30
Sep 30 22:55:28 saturn kernel: [ 425.812163] [<ffffffff80009595>] do_softirq+0xa5/0xe0
Sep 30 22:55:28 saturn kernel: [ 425.812168] [<ffffffff8004bafd>] irq_exit+0x8d/0xa0
Sep 30 22:55:28 saturn kernel: [ 425.812174] [<ffffffff802d27d2>] evtchn_do_upcall+0x222/0x270
Sep 30 22:55:28 saturn kernel: [ 425.812179] [<ffffffff80007a4e>] do_hypervisor_callback+0x1e/0x30
Sep 30 22:55:28 saturn kernel: [ 425.812190] [<ffffffff800033aa>] 0xffffffff800033aa
Sep 30 22:55:28 saturn kernel: [ 425.812199] [<ffffffff80009c0c>] xen_safe_halt+0xc/0x10
Sep 30 22:55:28 saturn kernel: [ 425.812202] [<ffffffff8000e763>] xen_idle+0x43/0xc0
Sep 30 22:55:28 saturn kernel: [ 425.812207] [<ffffffff80005255>] cpu_idle+0x55/0xa0
Sep 30 22:55:28 saturn kernel: [ 425.812213] [<ffffffff80761b0a>] start_kernel+0x3d2/0x3dd
Sep 30 22:55:28 saturn kernel: [ 425.812216] ---[ end trace b6b372b1b3719054 ]---
Sep 30 22:55:31 saturn kernel: [ 428.812028] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
Sep 30 22:59:13 saturn shutdown[6123]: shutting down for system halt
Sep 30 22:59:13 saturn init: Switching to runlevel: 0
Sep 30 22:59:19 saturn sshd[3412]: Received signal 15; terminating.
Sep 30 22:59:19 saturn avahi-daemon[3592]: Leaving mDNS multicast group on interface br1.IPv4 with address 10.0.1.20.
Sep 30 22:59:19 saturn avahi-daemon[3592]: Leaving mDNS multicast group on interface br0.IPv4 with address 10.0.0.20.
Sep 30 22:59:19 saturn auditd[3340]: The audit daemon is exiting.
Sep 30 22:59:19 saturn smartd[4254]: smartd received signal 15: Terminated
Sep 30 22:59:19 saturn smartd[4254]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.ST3250620AS-5QF15S8C.ata.state
Sep 30 22:59:19 saturn smartd[4254]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.ST3250620AS-9QE06V9D.ata.state
Sep 30 22:59:19 saturn smartd[4254]: smartd is exiting (exit status 0)
Sep 30 22:59:19 saturn gnome-keyring-daemon[4985]: dbus failure unregistering from session: Connection is closed
Sep 30 22:59:19 saturn gnome-keyring-daemon[4985]: dbus failure unregistering from session: Connection is closed
Sep 30 22:59:19 saturn polkitd(authority=local): Unregistered Authentication Agent for session /org/freedesktop/ConsoleKit/Session2 (system bus name :1.56, object path /org/gnome/PolicyKit1/AuthenticationAgent, locale fr_FR.utf8) (disconnected from bus)
Sep 30 22:59:19 saturn kernel: [ 656.464725] [drm] nouveau 0000:03:00.0: nouveau_channel_free: freeing fifo 2
On the VM side, I got:
mm.c 799:d2 non-privileged(2) attempt to map I/O space 0000...f0
Hope it will help ...
(In reply to comment #16)
> Created an attachment (id=392409) [details]
> First log booting Dom0 and crashing dom0
Did you see "(XEN) APIC error on CPU3: 00(40)"? Are you having problems with your hardware?
(In reply to comment #17)
> Created an attachment (id=392410) [details]
> Second Log booting dom0 with acpi=on
>
> As I usually boot with acpi=off, I changed this (removing the option): the Xen
> kernel started booting but crashed before the end of the boot ... the log
> contains more information on the crash.
The log here is completely meaningless. You pressed arbitrary keys on the serial console (or the remote end sent them without you asking for them) - one can't even tell whether the box was hung, or how far the boot progressed.
BUT: if you think you need to disable ACPI, that may be part of your problem. I have yet to understand why you need to...
(In reply to comment #18)
> Third test: back to acpi=off, so the context is the same as in the first log,
> but this time the crash happened before the boot could finish.
Just like for the previous one - there's no evidence that the box crashed, you just had it print huge piles of information. If you didn't ask for it yourself, you'll need to tweak your "other end" of the serial cable (also indicated by the extra blank lines inserted, which make the logs quite hard to read).
> I will try to crash it differently so that I keep keyboard access to type
> the xm dmesg command.
No need for "xm dmesg" once you have a serial cable. You get all messages there, and you issue debug keys from the serial console (after switching input to Xen).
(In reply to comment #20)
> I only found
> ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/kernel-xen-debuginfo-2.6.34.7-0.3.99.8.0873825.x86_64.rpm
> But I'll try that one ... expecting more trace!
Sorry, I really intended to direct you to ftp://ftp.suse.com/pub/projects/kernel/kotd/master/x86_64/.
Turning off ACPI only for Xen makes things even more suspicious. What's the deal here?
Also, can you reproduce your problems on other, very different hardware?
Finally, one thing you definitely want to try is disabling the use of the nouveau driver in the Xen case.
(In reply to comment #22)
> (In reply to comment #16)
> > Created an attachment (id=392409) [details] [details]
> > First log booting Dom0 and crashing dom0
>
> Did you see "(XEN) APIC error on CPU3: 00(40)"? Are you having problems with
> your hardware?
I don't think so; the system does not crash when I choose a non-Xen kernel. The CPU is a new one, never overclocked or anything like that. I had an issue with a previous motherboard, but I already had this problem then and I still have it now ...
(In reply to comment #23)
> (In reply to comment #17)
> > Created an attachment (id=392410) [details] [details]
> > Second Log booting dom0 with acpi=on
> >
> > As I usually boot with acpi=off, I changed this (removing the option): the Xen
> > kernel started booting but crashed before the end of the boot ... the log
> > contains more information on the crash.
>
> The log here is completely meaningless. You pressed arbitrary keys on the
> serial console (or the remote end sent them without you asking for them) - one
> can't even tell whether the box was hung, or how far the boot progressed.
>
> BUT: if you think you need to disable ACPI, that may be part of your problem. I
> have yet to understand why you need to...
In fact, with or without ACPI it does not change anything. What I noticed is that ACPI combined with a slow serial console crashes the boot: for some reason my remote UART is set to 9600 bps and can't be set to a higher baud rate, and at that baud rate I can't boot the Xen kernel with ACPI on ... I don't think this log is really relevant to the network problem; it was just in case ...
> Turning off ACPI only for Xen makes things even more suspicious. What's the
> deal here?
The point was to be able to detect my sensors, but right now ACPI is on and my sensors work well, so I removed acpi=off. This does not affect the crash (that was the purpose of the different tests: to rule out any impact of this setting).
> Also, can you reproduce your problems on other, very different hardware?
I do not have other hardware available at the moment.
> Finally, one thing you definitely want to try is disabling the use of the
> nouveau driver in the Xen case.
I do not understand what you mean by this. What is the "nouveau driver"?
> > Finally, one thing you definitely want to try is disabling the use of the
> > nouveau driver in the Xen case.
> I do not understand what you mean by this. What is the "nouveau driver"?
Sorry ... got it! I'll try ASAP.
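When retesting without nouveau, it is worth confirming after each reboot whether the module was actually loaded, rather than assuming a blacklist entry took effect. Below is a minimal Python 3.9+ sketch that reads /proc/modules (the same source `lsmod` reads); `loaded_modules` and `is_loaded` are hypothetical helper names.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Module names from /proc/modules content (first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def is_loaded(name: str, path: str = "/proc/modules") -> bool:
    """True if module `name` is currently loaded (hypothetical helper).

    /proc/modules lists names with underscores, so dashes are normalised.
    """
    return name.replace("-", "_") in loaded_modules(Path(path).read_text())
```

If `is_loaded("nouveau")` still returns True after a reboot, the blacklist entry did not take effect for that boot.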
Tonight's tests:
- To blacklist nouveau, I tried adding "blacklist nouveau" to /etc/modprobe.d/50-blacklist.conf and 00-system.conf ... after a reboot, the nouveau module is still loaded ... so, any idea how to really blacklist it, other than moving nouveau.ko out of the filesystem?
- Kernel updated to 2.6.36: the system crashed ... this time ata1 and then ata2 crashed ... I'll try to attach the log file tomorrow.
For your information, I finished migrating my VMs from Xen to qemu-kvm ... now the system looks stable: all VMs are running in parallel and currently work well. I'm still able to reproduce the crash if you need my assistance to fix it.
Still missing the log promised in #31.
Also please try disabling IRQ balancing in Xen ("noirqbalance" on the Xen command line) and/or in Linux (disabling the irq balance daemon in case it is enabled).
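To see whether an interrupt line is shared (Jan's earlier question) and which CPUs have been servicing it before and after disabling IRQ balancing, /proc/interrupts can be inspected from Dom0. Below is a minimal Python 3.9+ sketch. It assumes the 2.6-era layout of one count column per CPU followed by a single-token chip name and a comma-separated device list; Xen Dom0 kernels may label the chip column differently. `IrqLine` and `parse_interrupts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class IrqLine:
    irq: int
    per_cpu: list[int]   # interrupts handled by each CPU since boot
    chip: str            # interrupt controller / handler type column
    devices: list[str]   # more than one entry means the line is shared

    @property
    def shared(self) -> bool:
        return len(self.devices) > 1

    @property
    def cpus_seen(self) -> int:
        """Number of CPUs that have handled this IRQ at least once."""
        return sum(1 for count in self.per_cpu if count)


def parse_interrupts(text: str) -> list[IrqLine]:
    """Parse numbered rows of /proc/interrupts (assumed 2.6-era layout)."""
    lines = text.splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows: list[IrqLine] = []
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].rstrip(":").isdigit():
            continue  # skip NMI:, LOC:, ERR: and other summary rows
        counts = [int(x) for x in fields[1 : 1 + ncpu]]
        chip = fields[1 + ncpu] if len(fields) > 1 + ncpu else ""
        names = " ".join(fields[2 + ncpu :])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        rows.append(IrqLine(int(fields[0].rstrip(":")), counts, chip, devices))
    return rows
```

Filtering `parse_interrupts(open("/proc/interrupts").read())` on `r.shared` lists the shared lines; comparing `per_cpu` between two reads shows which CPUs are currently taking a given interrupt, since the counts only ever grow.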
(In reply to comment #33)
> Still missing the log promised in #31.
Nothing attached, as the log is the same as the previous one. Nothing to see in it.
> Also please try disabling IRQ balancing in Xen ("noirqbalance" on the Xen
> command line) and/or in Linux (disabling the irq balance daemon in case it is
> enabled).
The next time I have to reboot my qemu-kvm setup I will do the test ... It has been running for at least one month now, so I'm not sure I can answer quickly.
(In reply to comment #34)
> Nothing attached, as the log is the same as the previous one. Nothing to see in
> it.
How so, if you now don't load the nouveau driver, while previously you did?
Ping?
No response in over half a year. Feel free to re-open if you're ready to continue providing necessary information.