|
Bugzilla – Full Text Bug Listing |
| Summary: | Major issues with xen VMs since upgrade from 11.2 to 11.3 (network lost, tools unreliable) | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.3 | Reporter: | Forgotten User CxVz4LpaB5 <forgotten_CxVz4LpaB5> |
| Component: | GNOME | Assignee: | E-mail List <gnome-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P5 - None | CC: | carnold, forgotten_ToNzP6iHCC, jdouglas, jfehlig, pjdick, rupert.kolb |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
Xen various logs and configs
virt-manager segfault |
||
Preston, could you set up an 11.2 machine, create a few VMs, and then upgrade the host to 11.3?

Hi,

Note that I was able to get my VMs up and running. I have not yet determined whether the issue is in the bridging setup or in the use of the 'hypervisor' default nic. With the realtek nic everything seems to work. I will test further to help diagnose the issue.

Still, the command line tools used to create and start the VMs have issues:
- They never come back to the prompt (I have to do a ctrl-c)
- I always have to unpause the vm to get it started
- virt-manager seems to have issues too (it crashes at some point when starting a vm, etc)

Thanks

ok, the virt-manager issue is a segmentation fault (see attachment)

For information:

    edvac:~ # rpm -aq | grep virt
    libvirt-0.8.1-3.7.x86_64
    libvirt-python-0.8.1-3.7.x86_64
    virt-manager-0.8.4-4.3.x86_64
    virt-utils-1.1.2-1.7.x86_64
    virt-viewer-0.2.1-1.8.x86_64
    libvirt-client-0.8.1-3.7.x86_64

Created attachment 377013 [details]
virt-manager segfault
Found this, in case it helps:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=511023
https://bugzilla.redhat.com/show_bug.cgi?id=471072

Romain

And this also: https://bugzilla.redhat.com/show_bug.cgi?id=459927

    edvac:/var/log/YaST2 # rpm -aq | grep gtk
    gtk2-2.20.1-2.13.x86_64
    libgtk-vnc-1_0-0-0.3.10-3.1.x86_64
    gtk2-branding-openSUSE-11.3-3.1.noarch
    python-gtk-2.17.0-4.1.x86_64
    gtk2-lang-2.20.1-2.13.noarch
    gtk2-metatheme-gilouche-11.1.2-5.1.noarch
    gtk2-engines-lang-2.20.1-1.6.noarch
    gtk2-32bit-2.20.1-2.13.x86_64
    gtk2-engines-32bit-2.20.1-1.6.x86_64
    gtk2-engines-2.20.1-1.6.x86_64
    gtk2-metatheme-sonar-11.3.0-2.3.noarch
    gtk2-engine-murrine-32bit-0.90.3-8.1.x86_64
    gtk2-engine-murrine-0.90.3-8.1.x86_64
    python-gtk-vnc-0.3.10-3.1.x86_64

You will likely need updated kernel packages from 'Factory' to fix a netback crash which affects networking in PV guests and HVM guests using PV drivers. Please make sure you are not running one of the RCs or milestones (cat /etc/issue). There was a GTK2 bug that caused a crash in most of the tools (virt-manager and vm-install).

I did an upgrade from 11.2 to 11.3 using the online procedure: http://en.opensuse.org/SDB:System_upgrade
So I should be up to date with the factory repo. I have checked whether online updates are available, but so far I have found nothing. I guess something went wrong during the upgrade...

I tried reproducing the issues you are seeing in our lab. From what I can tell the ubuntu vms that I created seemed to lose a NIC after upgrading, but they had 2 NICs by default after creating them using virt-manager in 11.2. I did not see any vms starting paused, but on some it would appear that virt-manager is crashing after I start a vm. I also used some OpenSuse 11.2 guests and they seemed to have retained their NIC configuration through the upgrade. What type of Ubuntu guests were you running?

(In reply to comment #9)
> I tried reproducing the issues you are seeing in our lab. From what I can tell the ubuntu vms that I created seemed to lose a NIC after upgrading, but they had 2 NICs by default after creating them using virt-manager in 11.2. I did not see any vms starting paused, but on some it would appear that virt-manager is crashing after I start a vm. I also used some OpenSuse 11.2 guests and they seemed to have retained their NIC configuration through the upgrade. What type of Ubuntu guests were you running?

Hi,

In fact the issue is now less on the VM side, since I was able to get them running even if I had to reconfigure their network.

The major issues are:
- virt-manager crashes, so I can't really configure the VMs this way
- the xen tools do not seem to work as expected

Still:
- Possible issue in the way the bridged network is managed
- Possible issue with the hypervisor default nic

Do you want me to open different tickets for those troubles?

Thanks

I have opened another ticket for an issue with yast and network config, and I think it is related to my issues. I found that the system seems to have trouble with xen networking: the '/sbin/udevadm info -e' command seems to always hang at this point:

    ...
    P: /devices/xen-backend/vif-1-0

see bug#623285

(In reply to comment #11)
> the '/sbin/udevadm info -e' command seems to always hang at this point:
>
> ...
> P: /devices/xen-backend/vif-1-0

Seems like more fallout from the netback deadlock issue in 11.3 final. Can you try the 11.3 KOTD, which contains a fix for the deadlock?

Forgot to include a link to 11.3 KOTD ...
ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/

I downloaded the KOTD and it looks like it fixes the instabilities that I was seeing in virt-manager on our test machine in our lab.

(In reply to comment #14)
> I downloaded the KOTD and it looks like it fixes the instabilities that I was seeing in virt-manager on our test machine in our lab.

Can you give me the procedure you have followed to download and use the KOTD stuff please?
Thanks

(In reply to comment #15)
> Can you give me the procedure you have followed to download and use the KOTD stuff please?

Replace your kernel-xen* packages with the ones from the 11.3 KOTD repo.
- rpm -qa | grep kernel-xen
- download replacement kernel-xen* packages from ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/
- rpm -Uvh kernel-xen*.rpm
- reboot

    Kernel-xen-2.6.34-12.3.x86_64
    Linux edvac 2.6.34-12-xen #1 SMP 2010-06-29 02:39:08 +0200 x86_64 x86_64 x86_64 GNU/Linux

    edvac:~ # rpm -Uvh kernel-xen-2.6.34.1-0.0.14.85ddc1a.x86_64.rpm
    Préparation... ########################################### [100%]
    1:kernel-xen ########################################### [100%]
    Kernel image: /boot/vmlinuz-2.6.34.1-0.0.14.85ddc1a-xen
    Initrd image: /boot/initrd-2.6.34.1-0.0.14.85ddc1a-xen
    KMS drivers: intel-agp i915
    Root device: /dev/VG0/LV0_ROOT (mounted on / as ext4)
    Resume device: /dev/VG0/LV0_SWAP
    modprobe: Module piix not found.
    WARNING: no dependencies for kernel module 'piix' found.
    Kernel Modules: hwmon thermal_sys thermal scsi_mod libata sata_sil pata_acpi processor fan xennet cdrom xenblk dm-mod dm-snapshot crc16 jbd2 ext4 agpgart intel-agp output video i2c-core button i2c-algo-bit drm drm_kms_helper i915 sata_uli pata_serverworks pata_jmicron pata_sil680 pata_sis ata_piix pata_artop pata_triflex sata_vsc pdc_adma pcmcia_core pcmcia pata_pcmcia pata_cs5530 pata_cs5520 pata_opti pata_hpt3x2n sata_sil24 pata_optidma pata_pdc2027x pata_netcell sata_via pata_cypress pata_ns87410 pata_it8213 pata_cmd64x pata_atp867x pata_radisys pata_oldpiix pata_sl82c105 sata_mv sata_sis pata_via pata_ns87415 sata_svw pata_hpt366 pata_efar pata_sc1200 pata_ali pata_hpt37x pata_rdc ahci pata_cmd640 pata_sch pata_pdc202xx_old pata_hpt3x3 pata_atiixp ata_generic sata_promise sata_sx4 pata_mpiix pata_amd pata_ninja32 pata_piccolo sata_qstor sata_nv pata_rz1000 sata_inic162x pata_marvell pata_it821x sd_mod usbcore ohci-hcd ehci-hcd uhci-hcd hid usbhid linear
    Features: dm kms block usb lvm2 resume.userspace resume.kernel
    Bootsplash: openSUSE (1024x768)
    45817 blocs

    Linux edvac 2.6.34.1-0.0.14.85ddc1a-xen #1 SMP 2010-07-21 12:56:48 +0200 x86_64 x86_64 x86_64 GNU/Linux

After the reboot yast/network works, so one issue is solved.

Virt-manager worked only one time. After closing and reopening the application, I get a segfault each time I open the console of a VM (it seems related to vnc in some way).

    edvac:/var/crash # gdb python core.5222
    ...
    Loaded symbols for /lib64/libbz2.so.1
    Core was generated by `python /usr/share/virt-manager/virt-manager.py'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x0000000000000000 in ?? ()
    (gdb) bt
    #0 0x0000000000000000 in ?? ()
    #1 0x00007f8d515bc33e in gvnc_server_message () from /usr/lib64/libgtk-vnc-1.0.so.0
    #2 0x00007f8d515c1914 in ?? () from /usr/lib64/libgtk-vnc-1.0.so.0
    #3 0x00007f8d515c512b in ?? () from /usr/lib64/libgtk-vnc-1.0.so.0
    #4 0x00007f8d5fa6bef0 in ?? () from /lib64/libc.so.6
    #5 0x0000000001c6c980 in ?? ()
    #6 0x0000000000000000 in ?? ()
    (gdb) where
    #0 0x0000000000000000 in ?? ()
    #1 0x00007f8d515bc33e in gvnc_server_message () from /usr/lib64/libgtk-vnc-1.0.so.0
    #2 0x00007f8d515c1914 in ?? () from /usr/lib64/libgtk-vnc-1.0.so.0
    #3 0x00007f8d515c512b in ?? () from /usr/lib64/libgtk-vnc-1.0.so.0
    #4 0x00007f8d5fa6bef0 in ?? () from /lib64/libc.so.6
    #5 0x0000000001c6c980 in ?? ()
    #6 0x0000000000000000 in ?? ()

I suspect GNOME is the appropriate component for libgtk-vnc ...

The summary of this bug and the majority of comments are quite misleading given the problem Romain is now seeing in #18. I should have closed it and requested Romain to enter a new bug before reassigning, but I'll now leave this decision with the new owners :).

Sounds good to me. My only worry about this idea is that I can't really tell whether the network issues with xen are solved, since I can't play with virt-manager... If it's possible, I could rename this bug report to 'Network issue in xen with opensuse 11.3' and leave it pending until the bug 'virt-manager segfault in opensuse 11.3' is solved. As soon as the virt-manager vnc issue is solved, I could play with it and test whether the original settings I had with opensuse 11.2 work with 11.3. What do you think?

What I can confirm since I upgraded my opensuse 11.3 with the kotd kernel:
- xm create <vm> starts the vm in pause mode : fixed
- xm <command> never returns to the shell : fixed
- Issue with setting network interface card on vm (e1000) : fixed

So for the most part, the issues are fixed. I am just wondering whether an update to the xen kernel that comes with a normal update will overwrite my kotd kernel.

So let me know if the bug will be closed and another one should be opened for the virt-manager/segfault issue.

thanks

Any news on your side? Please give me some news so I can know if I have to create another ticket or not.
Thanks

(In reply to comment #20)
> The summary of this bug and the majority of comments are quite misleading given the problem Romain is now seeing in #18. I should have closed it and requested Romain to enter a new bug before reassigning, but I'll now leave this decision with the new owners :).

Why would this ticket be closed? The bug has not been fixed through a patch. Someone found an arcane workaround, but that hardly seems to resolve the issue for all users of 11.3.

I am pretty new to opensuse, having started with 11.1. I do have to say that I find it strange that 11.1 was the last release where xen actually worked. 11.2 and 11.3 were both released with xen not working properly. I wonder if Novell doesn't want people to get this feature for free?

(In reply to comment #24)
> I am pretty new to opensuse, having started with 11.1. I do have to say that I find it strange that 11.1 was the last release where xen actually worked.

11.1 is the same code base as SLES11 - and is well tested since it is the enterprise product.

> 11.2 and 11.3 were both released with xen not working properly. I wonder if Novell doesn't want people to get this feature for free?

But fixes have been made available. There are a lot of clever folks using openSUSE Xen in all sorts of interesting deployments.

openSUSE is a community project and as such the community is expected to help harden it. Testing and bug reporting *prior* to releases is much appreciated :-). Thanks!

(In reply to comment #25)
> (In reply to comment #24)
> > I am pretty new to opensuse, having started with 11.1. I do have to say that I find it strange that 11.1 was the last release where xen actually worked.
>
> 11.1 is the same code base as SLES11 - and is well tested since it is the enterprise product.
>
> > 11.2 and 11.3 were both released with xen not working properly. I wonder if Novell doesn't want people to get this feature for free?
>
> But fixes have been made available. There are a lot of clever folks using openSUSE Xen in all sorts of interesting deployments.
>
> openSUSE is a community project and as such the community is expected to help harden it. Testing and bug reporting *prior* to releases is much appreciated :-). Thanks!

Well, I stand corrected. Go ahead and close the ticket even though it has not been resolved. I will move on to a different flavor that provides release candidates that work, or at least provides patches before closing an open bug. My bad.

(In reply to comment #26)
> Well, I stand corrected. Go ahead and close the ticket even though it has not been resolved.

I don't see that anyone internal is suggesting to close this bug. It has been reassigned to the appropriate component for fixing.

(In reply to comment #27)
> (In reply to comment #26)
> > Well, I stand corrected. Go ahead and close the ticket even though it has not been resolved.
>
> I don't see that anyone internal is suggesting to close this bug. It has been reassigned to the appropriate component for fixing.

So, I understand the point about this being a community project, and that you rely on the community to fix some of the bugs. I also realize that I'm not a huge contributor in terms of code (and you would probably thank me for that if you knew my low level of programming skills :-). That said, this last comment was over two weeks ago, openSuSE 11.3 has been out for five or six weeks, and the fix for this issue has been known/available in the KOTD and OBS Kernel repositories for three to four weeks. So, the question is: why hasn't this shown up in the form of a kernel update in the openSuSE 11.3 repositories?? I'm not trying to be impatient - I'm happy to use the OBS Kernel repo or KOTD, and I very much appreciate the folks who can and do donate their time and energy to developing openSuSE, but this is a pretty major issue in a release of a Linux distribution that is considered "stable."
I understand your point about the community development of this project, but if you expect folks to use openSuSE as a distribution (and, consequently, move into SLES for Enterprise-level functionality), these issues need to be responded to quickly, and four weeks for a problem of this magnitude is anything but quick. Issues like this will reflect on the entire "SUSE" product line, and that includes the Enterprise products.

(In reply to comment #28)
> why hasn't this shown up in the form of a kernel update in the openSuSE 11.3 repositories??

I see that a kernel update for 11.3 has been started but it has not yet made it to QA.

> these issues need to be responded to quickly, and four weeks for a problem of this magnitude is anything but quick.

But it was responded to quickly. The kernel issue was fixed as soon as it was identified - and the fix made available to users. Granted, it wasn't through the official update channel, but that's the type of support you get with the enterprise products.

> Issues like this will reflect on the entire "SUSE" product line, and that includes the Enterprise products.

While I understand your frustration, it's unfortunate that the support offered for a free product reflects on the support of a paid product.

(In reply to comment #29)
> I see that a kernel update for 11.3 has been started but it has not yet made it to QA.
>
> > these issues need to be responded to quickly, and four weeks for a problem of this magnitude is anything but quick.
>
> But it was responded to quickly. The kernel issue was fixed as soon as it was identified - and the fix made available to users. Granted, it wasn't through the official update channel, but that's the type of support you get with the enterprise products.

Yes, but it took me two days of banging my head against a wall (that my company paid for, no less) before I decided that maybe I wasn't actually doing something wrong and it may be a bug.
Then I posted to the Xen mailing list and someone was kind enough to respond and point me in the direction of this bug. Had the fix that was already out there actually made it to the official update channel, I wouldn't have encountered it at all. I'm not trying to be difficult - I do understand that these things take time, this one just seemed to take a lot longer than usual. That's why it's frustrating and surprising, because it doesn't usually take this long, so, when I run into issues, I'm not in the habit of assuming it's a bug - I usually blame myself! :-)

> > Issues like this will reflect on the entire "SUSE" product line, and that includes the Enterprise products.
>
> While I understand your frustration, it's unfortunate that the support offered for a free product reflects on the support of a paid product.

Well, unfortunately, that's reality. It isn't so much for folks like me - I do understand that there are differences, that Novell is going to concentrate more effort on the product that brings in revenue, etc. But, if you're trying to build a strong customer and user base for either free or paid products (it's the gateway, right - start with the free, move to the enterprise), it's going to be hard to do when folks stumble across issues like this and don't see fixes appear very quickly. I'll bet that there were some number of folks out there - maybe not a ton of them, but certainly a few - that were trying openSuSE for the first time. They downloaded and installed openSuSE 11.3, started playing with it, and then ran into one of the issues caused by this bug. Perhaps they played with it for a couple of weeks, tried updating, etc., then dropped it and went and found another distribution that worked, or went back to one they had been using. That's not just losing a user of the free product, that's losing a potential customer of the enterprise product. Like it or not, that's how it works.
I'm still a SuSE user (both free and paid versions) and have no intention of switching. I think it's a great distribution, and I appreciate all the hard work that goes into it! I'm just trying to point out the negative impact something like this can have on the product perception, and how frustrating it is for those of us who are not accustomed to these sorts of delays :-).

Anyway, I think I've hijacked this bug thread enough. Getting back to the bug - it's still listed under the GNOME component - is this really correct, or perhaps it should say kernel, instead??

-Nick

(In reply to comment #30)
> Getting back to the bug - it's still listed under the GNOME component - is this really correct, or perhaps it should say kernel, instead??

As I mentioned in #20, this bug has become very confusing. There were all sorts of symptoms caused by the netback deadlock, which have been fixed. The remaining issue is described in #18. It's a problem in libgtk-vnc, hence the reassignment to the GNOME component. In hindsight, I should have asked the reporter to open a new bug for the libgtk-vnc issue and closed this bug as a duplicate of bnc#618678 - which is where we initially discovered the netback deadlock.

Romain, are you still seeing the issue in #18? If so, could you open a new bug against GNOME? This bug is officially too overloaded :-). Thanks!

(In reply to comment #31)
> (In reply to comment #30)
> > Getting back to the bug - it's still listed under the GNOME component - is this really correct, or perhaps it should say kernel, instead??
>
> As I mentioned in #20, this bug has become very confusing. There were all sorts of symptoms caused by the netback deadlock, which have been fixed. The remaining issue is described in #18. It's a problem in libgtk-vnc, hence the reassignment to the GNOME component. In hindsight, I should have asked the reporter to open a new bug for the libgtk-vnc issue and closed this bug as a duplicate of bnc#618678 - which is where we initially discovered the netback deadlock.
>
> Romain, are you still seeing the issue in #18? If so, could you open a new bug against GNOME? This bug is officially too overloaded :-). Thanks!

Yes, the issue is still there, but I think that this particular bug could be closed and another one opened for the GNOME team with this issue. Just let me post some additional info about this tomorrow and create the new one before closing this bug report.

Thanks

As I understand it, there is no way out of the box to use a more recent openSUSE as a xen dom0 in combination with paravirtualized domUs. The last working version is 11.1. Am I right?
This means we have to stay stuck at 11.1 when using xen, or use another distribution: RH, Ubuntu, ... ??? 11.2 or 11.3 are useless?

Rupert

(In reply to comment #33)
> As I understand it, there is no way out of the box to use a more recent openSUSE as a xen dom0 in combination with paravirtualized domUs. The last working version is 11.1.

11.3 works fine. It shipped with a bug in netback but a kernel update fixing the issue has been released. We are also in the process of QA'ing updated xen packages for 11.3. You can test them in your environment as well:
http://download.opensuse.org/repositories/Virtualization:/openSUSE11.3/openSUSE_11.3/x86_64/

> Am I right?
> This means we have to stay stuck at 11.1 when using xen, or use another distribution: RH, Ubuntu, ... ??? 11.2 or 11.3 are useless?

No, that's not right. 11.3 works fine with updated kernel-xen in dom0. Our planned xen update for 11.3 will fix additional bugs found since 11.3 shipped (and SLES11 SP1 since they have the same xen versions), further improving its stability.

BTW, I am going to close this bug now since it has become so convoluted.
Please, if anyone on this bug is having further issues with an *updated* 11.3 system, open a new bug report describing the issue and we will gladly take a look. This bug has been abused enough with all sorts of unrelated issues. |
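Before opening a new report against an *updated* system, it may help to confirm that dom0 is no longer running the kernel-xen that shipped with 11.3 final. The sketch below is only an illustration: the function names are hypothetical, and the release string `2.6.34-12-xen` is an assumption taken from the reporter's pre-update `uname` output in this thread, not an official list of affected kernels.

```shell
#!/bin/sh
# Sketch: flag the kernel-xen release the reporter was running before the KOTD update.
# ASSUMPTION: "2.6.34-12-xen" (from the reporter's uname output in this thread)
# identifies the 11.3 final kernel affected by the netback deadlock.
AFFECTED_RELEASE="2.6.34-12-xen"

# is_affected_kernel RELEASE: succeeds if RELEASE equals the affected release string.
is_affected_kernel() {
    [ "$1" = "$AFFECTED_RELEASE" ]
}

# report_kernel RELEASE: prints a one-line verdict for RELEASE.
report_kernel() {
    if is_affected_kernel "$1"; then
        echo "$1: 11.3 final kernel-xen, update before reporting"
    else
        echo "$1: not the 11.3 final kernel-xen"
    fi
}
```

On a dom0 this could be called as `report_kernel "$(uname -r)"`. A plain string comparison is used on purpose, since the thread gives only one known-bad release and no version range.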
Created attachment 376631 [details]
Xen various logs and configs

User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6

I have upgraded from 11.2 to 11.3 using zypper dup. I have noticed that after rebooting, all my VMs have lost their network.

My network config is: eth0 <- vlan2 <- br2

I host some ubuntu vms and an opensuse 11.2 vm.

- The first thing I noticed is that all the vms have lost their network card.
- I start my vm using xm create <vm> and the prompt never comes back. I have to do a ctrl+c to get back to the prompt and do a xm unpause <vm> to get it started.
- Every vm starts paused, I don't know why.
- To restore the network card of a vm I have played with:
  * virsh domxml-to-native xen-xm <vm>.xml > <vm>
  * xm delete <vm>
  * xm new <vm>

  After that I was able to configure the network on the ubuntu vms.
- Strangely, if I apply the same recipe to the opensuse vms, they crash at some point and I really don't know why.
- I have created a brand new vm (opensuse 11.3); it works better, but virt-manager crashes every time I try to open the console.
- xm console never brings up the console.

All opensuse 11.2 rpms for xen seem to have been updated to opensuse 11.3.

All vms configured to use the hypervisor default nic seem problematic; things seem to go better with the realtek nic.

I really need some help.

Reproducible: Always

Steps to Reproduce:
1. update from 11.2 to 11.3 with xen installed and some vm created
2. try to use the tools (xm create, etc)
3. try configuring vms network
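
The re-registration steps the reporter lists in the description (virsh domxml-to-native, xm delete, xm new) can be sketched as a small shell helper. This is a minimal sketch, not a recommended procedure: the function name `reregister_vm` and the `DRY_RUN` variable are hypothetical, and the three commands are simply the ones quoted in the report. Because `xm delete` removes the existing domain definition, the helper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: re-register a Xen guest from its libvirt XML, following the
# three commands quoted in the bug description.
# DRY_RUN (hypothetical knob): 1 = only print the commands (default), 0 = run them.
DRY_RUN="${DRY_RUN:-1}"

# reregister_vm NAME: expects NAME.xml in the current directory.
reregister_vm() {
    vm="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "virsh domxml-to-native xen-xm $vm.xml > $vm"
        echo "xm delete $vm"
        echo "xm new $vm"
    else
        # Convert the libvirt domain XML to a native xm config file,
        # then drop the old definition and register the new one.
        virsh domxml-to-native xen-xm "$vm.xml" > "$vm" &&
        xm delete "$vm" &&
        xm new "$vm"
    fi
}
```

Running `reregister_vm guest1` with the default settings prints the three commands for a guest named `guest1` (an example name) so they can be reviewed before anything is changed.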