Bugzilla – Bug 438610
Call Trace with XEN-Kernel on Dell PowerEdge 2950 => onboard network "bnx2" is not working
Last modified: 2008-12-22 09:30:05 UTC
Hi,

I get a kernel "Call Trace" on my Dell PowerEdge 2950 when I use the Xen kernel. Sometimes my onboard network goes down. The two are probably related!

# less /var/log/messages
Oct 22 13:25:34 xensrv2 kernel: ------------[ cut here ]------------
Oct 22 13:25:34 xensrv2 kernel: WARNING: at arch/x86/mm/pageattr-xen.c:622 __change_page_attr+0x67/0x25b()
Oct 22 13:25:34 xensrv2 kernel: CPA: called for zero pte. vaddr = ffff8800f007b000 cpa->vaddr = ffff8800f007b000
Oct 22 13:25:34 xensrv2 kernel: Modules linked in: bridge stp netbk blkbk blktap xenbus_be ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode fuse loop dm_mod usbhid hid rtc_cmos rtc_core pcspkr ff_memless ide_cd_mod serio_raw rtc_lib 8250_pnp dcdbas(X) bnx2 ses iTCO_wdt button 8250 iTCO_vendor_support serial_core igb enclosure shpchp pci_hotplug i5000_edac edac_core sg ehci_hcd uhci_hcd usbcore sd_mod crc_t10dif xenblk cdrom xennet edd reiserfs fan ide_pci_generic ata_generic ata_piix pata_acpi libata dock piix ide_core lpfc scsi_transport_fc scsi_tgt megaraid_sas scsi_mod thermal processor thermal_sys hwmon
Oct 22 13:25:34 xensrv2 kernel: Supported: Yes, External
Oct 22 13:25:34 xensrv2 kernel: Pid: 4191, comm: X Tainted: G 2.6.27.1-2-xen #1
Oct 22 13:25:34 xensrv2 kernel:
Oct 22 13:25:34 xensrv2 kernel: Call Trace:
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff8020ba57>] show_trace_log_lvl+0x41/0x58
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff8045cc08>] dump_stack+0x69/0x6f
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff802312f1>] warn_slowpath+0xa9/0xd1
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff80218d4a>] __change_page_attr+0x67/0x25b
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff80218f5b>] __change_page_attr_set_clr+0x1d/0x53
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff802191c4>] change_page_attr_set_clr+0xd0/0x200
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff803df1e2>] pci_mmap_page_range+0xe5/0x149
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff802eb0c6>] mmap+0x5d/0x99
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff80288872>] mmap_region+0x2a1/0x4e8
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff80288da0>] do_mmap_pgoff+0x2e7/0x34b
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff8020e833>] sys_mmap+0x8c/0xc5
Oct 22 13:25:34 xensrv2 kernel: [<ffffffff8020a878>] system_call_fastpath+0x16/0x1b
Oct 22 13:25:34 xensrv2 kernel: [<00007fc4623eb88a>] 0x7fc4623eb88a
Oct 22 13:25:34 xensrv2 kernel:
Oct 22 13:25:34 xensrv2 kernel: ---[ end trace 67bf12ace8cdfb26 ]---

# uname -a
Linux xendmz2 2.6.27.1-2-xen #1 SMP 2008-10-16 20:35:15 +0200 x86_64 x86_64 x86_64 GNU/Linux

# lsmod | grep bnx2
bnx2 182280 0

# lspci | grep -i Ethernet
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
0c:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
0c:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
0d:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
0d:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)

Thanks
Oliver Mössinger
Hi,

next kernel trace on the same host:

Oct 24 12:55:17 xendmz2 kernel: Bridge firewalling registered
Oct 24 12:55:17 xendmz2 kernel: tmpbridge: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
Oct 24 12:55:17 xendmz2 kernel: eth0 renamed to peth0
Oct 24 12:55:17 xendmz2 kernel: tmpbridge renamed to eth0
Oct 24 12:55:17 xendmz2 kernel: igb 0000:0c:00.0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Oct 24 12:55:17 xendmz2 kernel: device peth0 entered promiscuous mode
Oct 24 12:55:17 xendmz2 kernel: ------------[ cut here ]------------
Oct 24 12:55:17 xendmz2 kernel: WARNING: at net/core/dev.c:1176 br_add_if+0xf3/0x1cd [bridge]()
Oct 24 12:55:17 xendmz2 kernel: Modules linked in: bridge stp fuse loop dm_mod dcdbas(X) rtc_cmos rtc_core iTCO_wdt rtc_lib serio_raw pcspkr ide_cd_mod iTCO_vendor_support joydev i5000_edac edac_core igb ses enclosure 8250_pnp 8250 serial_core shpchp pci_hotplug button sg usbhid hid ff_memless uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif xenblk cdrom xennet edd reiserfs fan ide_pci_generic ata_generic ata_piix pata_acpi libata dock piix ide_core lpfc scsi_transport_fc scsi_tgt megaraid_sas scsi_mod thermal processor thermal_sys hwmon
Oct 24 12:55:17 xendmz2 kernel: Supported: Yes, External
Oct 24 12:55:17 xendmz2 kernel: Pid: 9811, comm: brctl Tainted: G 2.6.27.1-2-xen #1
Oct 24 12:55:17 xendmz2 kernel:
Oct 24 12:55:17 xendmz2 kernel: Call Trace:
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff8020ba57>] show_trace_log_lvl+0x41/0x58
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff8045cc08>] dump_stack+0x69/0x6f
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff8023136a>] warn_on_slowpath+0x51/0x77
Oct 24 12:55:17 xendmz2 kernel: [<ffffffffa02fb4d5>] br_add_if+0xf3/0x1cd [bridge]
Oct 24 12:55:17 xendmz2 kernel: [<ffffffffa02fbc07>] add_del_if+0x48/0x65 [bridge]
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff803f078c>] dev_ioctl+0x400/0x4ab
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff803e1b09>] sock_ioctl+0x1ec/0x1f6
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff802a6609>] vfs_ioctl+0x21/0x6c
Oct 24 12:55:17 xendmz2 kernel: [<ffffffff802a6893>] do_vfs_ioctl+0x23f/0x255
Oct 24 12:55:18 xendmz2 kernel: [<ffffffff802a68fa>] sys_ioctl+0x51/0x73
Oct 24 12:55:18 xendmz2 kernel: [<ffffffff8020a878>] system_call_fastpath+0x16/0x1b
Oct 24 12:55:18 xendmz2 kernel: [<00007f8474e1b4e7>] 0x7f8474e1b4e7
Oct 24 12:55:18 xendmz2 kernel:
Oct 24 12:55:18 xendmz2 kernel: ---[ end trace 7a6c66e9ad895a74 ]---
Created attachment 247766 [details] more kernel trace
Created attachment 253302 [details] Patch for this bug. This patch may fix this bug. How would you like to test it? Should I build a new kernel for you, or can you do it yourself? --thanks
Hi, thank you! Yes, I will test it on Saturday. Please build the kernel for me! Oliver Mössinger
Thanks. This is the patch for your comment #2: http://www.brsbox.com/filebox/filegroup/fgid/ef61bf4b743957d4007e08cdfc987e08 (kernel-xen-base, 6M; kernel-xen, 13M) --thanks a lot Kong
Btw, the above link is valid for 72 hours.
Hi, it was difficult to read the site, but I have the files ;-) Thanks Oliver Mössinger
Re original comment: bnx2 intermittently ceasing to work is a duplicate of bug 429739. The call trace, however, is X related - this is what we really may need to look at. Re #1: This is a duplicate of bug 435551. Re #3: Which of the call traces do you believe this patch addresses? I don't see it related to either.
Hi Jan, the traces I reported were all generated on the same host! These traces are only generated with the Xen kernel in Dom0; DomU was not checked. The default kernel is stable! So I believe it must be a Xen kernel bug, not an X bug! Oliver
I didn't say it's an X bug, I said it's an issue with X (rather than with one of the network cards). In any case, we'll need full hypervisor and kernel messages from that system, after making sure you run the latest bits.
Re comment #2: the messages contain only one WARNING call trace, shown below: "Oct 24 13:03:50 xendmz2 kernel: WARNING: at net/core/dev.c:1516 skb_gso_segment+0x82/0x1a6()" This warning happened because the bridge doesn't handle the relation between NETIF_F_TSO/GSO/SG and NETIF_F_GEN_CSUM. It seems Herbert Xu fixed it in this patch. So I need Oliver to test for this WARNING first.
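For anyone comparing the traces in this report: a minimal sketch for listing the distinct WARNING sites in a syslog file, so that each one can be matched against a bug or a patch. The function name warning_sites is made up for illustration, and the pattern assumes the "kernel: WARNING: at <file>:<line> ..." format seen in the logs quoted in this report.

```shell
# Hypothetical helper (the name is illustrative, not from any package).
# Reads syslog text on stdin and prints each distinct "file:line"
# WARNING site once, sorted.
warning_sites() {
    sed -n 's/.*kernel: WARNING: at \([^ ]*\) .*/\1/p' | sort -u
}

# Example call (assumes the default syslog location):
#   warning_sites < /var/log/messages
```

Lines that are not WARNING headers (module lists, stack frames) are simply skipped, so the whole log can be fed in unfiltered.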
Created attachment 256043 [details] process list at reboot
Created attachment 256044 [details] dmesg at reboot
Sorry, I could not run the test on Saturday. It is done now! Here is the information I can give you:

First I updated the host to the current Factory. It now runs "openSUSE 11.1 Beta 5.2 (x86_64)" with this kernel:

xendmz2:~ # rpm -qa | grep kernel-xen
kernel-xen-extra-2.6.27.7-3.1
kernel-xen-base-2.6.27.7-3.1
kernel-xen-2.6.27.7-3.1

Most of the kernel call traces are gone, super :-) Only one remains:

Nov 27 08:10:01 xendmz2 BLKTAPCTRL[18655]: blktapctrl.c:797: Found driver: [ioemu disk]
Nov 27 08:10:01 xendmz2 BLKTAPCTRL[18655]: blktapctrl.c:797: Found driver: [raw image (cdrom)]
Nov 27 08:10:01 xendmz2 BLKTAPCTRL[18655]: blktapctrl_linux.c:23: /dev/xen/blktap0 device already exists
Nov 27 08:10:02 xendmz2 kernel: vendor=8086 device=244e
Nov 27 08:10:02 xendmz2 kernel: pci 0000:14:0d.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
Nov 27 08:10:02 xendmz2 kernel: ------------[ cut here ]------------
Nov 27 08:10:02 xendmz2 kernel: WARNING: at arch/x86/mm/pageattr-xen.c:622 __change_page_attr_set_clr+0xa4/0xb7c()
Nov 27 08:10:02 xendmz2 kernel: CPA: called for zero pte. vaddr = ffff8800f0e39000 cpa->vaddr = ffff8800f0e39000
Nov 27 08:10:02 xendmz2 kernel: Modules linked in: netbk(N) blkbk(N) blktap(N) xenbus_be(N) dm_round_robin(N) dm_multipath(N) scsi_dh(N) ip6t_REJECT(N) nf_conntrack_ipv6(N) ip6table_raw(N) xt_NOTRACK(N) ipt_REJECT(N) xt_physdev(N) xt_state(N) iptable_raw(N) iptable_filter(N) ip6table_mangle(N) nf_conntrack_netbios_ns(N) nf_conntrack_ipv4(N) nf_conntrack(N) ip_tables(N) ip6table_filter(N) ip6_tables(N) x_tables(N) ipv6(N) microcode(N) bridge(N) stp(N) fuse(N) loop(N) dm_mod(N) bnx2(N) 8250_pnp(N) 8250(N) rtc_cmos(N) iTCO_wdt(N) rtc_core(N) ide_cd_mod(N) serial_core(N) joydev(N) serio_raw(N) iTCO_vendor_support(N) e1000e(N) button(N) shpchp(N) rtc_lib(N) pcspkr(N) i5000_edac(N) dcdbas(N) pci_hotplug(N) ses(N) edac_core(N) igb(N) enclosure(N) sg(N) usbhid(N) hid(N) ff_memless(N) uhci_hcd(N) ehci_hcd(N) usbcore(N) sd_mod(N) crc_t10dif(N) xenblk(N) cdrom(N) xennet(N) edd(N) reiserfs(N) fan(N) ide_pci_generic(N) ata_generic(N) ata_piix(N) pata_acpi(N) libata(N) dock(N) piix(N) ide_core(N) lpfc(N) scsi_transport_fc(N) s
Nov 27 08:10:02 xendmz2 kernel: csi_tgt(N) megaraid_sas(N) scsi_mod(N) thermal(N) processor(N) thermal_sys(N) hwmon(N)
Nov 27 08:10:02 xendmz2 kernel: Supported: No
Nov 27 08:10:02 xendmz2 kernel: Pid: 18651, comm: X Tainted: G 2.6.27.7-3-xen #1
Nov 27 08:10:02 xendmz2 kernel:
Nov 27 08:10:02 xendmz2 kernel: Call Trace:
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8020c547>] show_trace_log_lvl+0x41/0x58
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff80461408>] dump_stack+0x69/0x6f
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff80232bf5>] warn_slowpath+0xa9/0xd1
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8021a186>] __change_page_attr_set_clr+0xa4/0xb7c
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8021ad2e>] change_page_attr_set_clr+0xd0/0x200
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff803e2656>] pci_mmap_page_range+0xe5/0x149
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff802eea9a>] mmap+0x5d/0x99
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8028b9b2>] mmap_region+0x2a1/0x4e8
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8028bee0>] do_mmap_pgoff+0x2e7/0x34b
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8020f3a0>] sys_mmap+0x8c/0xc4
Nov 27 08:10:02 xendmz2 kernel: [<ffffffff8020b368>] system_call_fastpath+0x16/0x1b
Nov 27 08:10:02 xendmz2 kernel: [<00007fb3d34e5eea>] 0x7fb3d34e5eea
Nov 27 08:10:02 xendmz2 kernel:
Nov 27 08:10:02 xendmz2 kernel: ---[ end trace 344320c5fffdbd52 ]---
Nov 27 08:10:03 xendmz2 kernel: Not cloning cgroup for unused subsystem ns
Nov 27 08:10:03 xendmz2 SuSEfirewall2: Setting up rules from /etc/sysconfig/SuSEfirewall2 ...

All other call traces are gone, with and without your patch!

BUT the connection is LOST! More information about this:

The Dell host has two "bnx2" interfaces, eth0 and eth1. eth0 no longer passed any packets; eth1 is still working! See the attachments "dmesg.txt" and "psauxf.txt". At that point I tried to reboot. The process "brctl delbr eth0" hung with the following message in dmesg:

"unregister_netdevice: waiting for eth0 to become free. Usage count = 3"

There were no firewall log entries for eth0 between "Nov 26 17:42:06" and "Nov 27 08:02:40" (08:02 is the reboot time, see "psauxf.txt"):

Nov 26 17:18:09 xendmz2 kernel: SFW2-INint-DROP-DEFLT IN=eth0 OUT= PHYSIN=peth0 MAC=01:00:5e:00:00:fb:00:50:56:87:7c:6e:08:00 SRC=172.16.2.27 DST=224.0.0.251 LEN=64 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=44
Nov 26 17:42:06 xendmz2 kernel: SFW2-INext-DROP-DEFLT IN=eth1 OUT= PHYSIN=peth1 MAC=01:00:5e:00:00:fb:00:16:3e:7e:14:6b:08:00 SRC=192.168.254.27 DST=224.0.0.251 LEN=64 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=44
...
Nov 27 08:02:40 xendmz2 kernel: SFW2-INext-DROP-DEFLT IN=eth1 OUT= MAC= SRC=192.168.255.245 DST=224.0.0.251 LEN=551 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=531
Nov 27 08:02:40 xendmz2 kernel: SFW2-INint-DROP-DEFLT IN=eth0 OUT= MAC= SRC=172.16.4.245 DST=224.0.0.251 LEN=534 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=514

Note the missing PHYSIN in the last firewall log entries! I hope this information helps.
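A rough sketch for pulling such entries out of a larger log: it prints the firewall log lines for a given bridge interface that carry no PHYSIN= field. The function name missing_physin is made up for illustration, and the matching relies only on the " IN=<iface> " and "PHYSIN=" fields visible in the entries quoted in this comment; other log formats are not covered.

```shell
# Hypothetical helper (the name is illustrative).
# $1: bridge interface name as it appears after "IN=" (e.g. eth0).
# Reads log text on stdin and prints the lines for that interface
# which have no PHYSIN= field.
missing_physin() {
    grep -F " IN=$1 " | grep -v -F "PHYSIN="
}

# Example call (assumes the default syslog location):
#   missing_physin eth0 < /var/log/messages
```

The leading space in the first pattern keeps "PHYSIN=peth0" from being mistaken for "IN=eth0".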
For this last remaining call trace I added a fix just half an hour ago, scheduled to go into whatever comes after RC1. As to the bnx2 problem - I'm not certain the Xen you've got has the necessary fix; you could in any case try disabling MSI either just for that driver or globally.
Thank you Jan, yes, there are some problems listed with bnx2 and MSI. I disabled MSI for bnx2. I will report what happens!
Now the host and the network are stable! I tested with and without the patch; the network works in both configurations. For the moment I need to keep MSI disabled for bnx2. Thank you
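For reference, a sketch of how the workaround can be made persistent. It assumes that the bnx2 driver on this kernel exposes the disable_msi module parameter (worth confirming with "modinfo bnx2") and that module options live in /etc/modprobe.conf.local, as on openSUSE of this era; the helper name disable_bnx2_msi is made up for illustration.

```shell
# Hypothetical helper (the name is illustrative).
# Appends the bnx2 MSI option to a modprobe config file unless the exact
# line is already present, so running it twice does not duplicate it.
# $1: config file to edit; defaults to /etc/modprobe.conf.local (assumption).
disable_bnx2_msi() {
    conf="${1:-/etc/modprobe.conf.local}"
    line="options bnx2 disable_msi=1"
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Example call (as root):
#   disable_bnx2_msi
```

The option takes effect the next time the bnx2 module is loaded, so reload the driver or reboot afterwards. To disable MSI globally instead, boot the kernel with pci=nomsi.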
FIXED?
not really FIXED, a workaround is available (disable MSI for bnx2)!
It *is* fixed, the fix may just not be externally available, yet. Charles?
Is this the same issue as bug 429739? That fix is in RC1.
Yes, thanks. So Stephan/Oliver - the bnx2 issue *is* fixed. The GUI issue, however, will only be fixed after the next Xen patch commit to the kernel CVS.
Kernel patches to fix the remaining issues here have been committed and will be available with a future kernel maintenance update.