Bugzilla – Full Text Bug Listing
| Summary: | Xen hypervisor memory utilization does not reflect sum of domain allocations (leaks over time) | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.2 | Reporter: | Frank Fejes <ffejes> |
| Component: | Xen | Assignee: | Jan Beulich <jbeulich> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | | |
| Priority: | P3 - Medium | CC: | carnold, jdouglas, jfehlig, laurent.henry, vadim |
| Version: | Final | | |
| Target Milestone: | unspecified | | |
| Hardware: | x86-64 | | |
| OS: | openSUSE 11.2 | | |
| Whiteboard: | maint:released:11.2:35929 | | |
| Found By: | Community User | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Xen Info output for a server exhibiting the behavior; console output after xm debug-key q; console output after xm debug-key H; xm info after the two debug commands; console output after xm debug-key m; sample domain config for memory leak test; tentative fix | | |
First of all, without a reproduction scenario we are unlikely to be able to do anything about this. We will have to rely on you to narrow down the conditions under which these leaks(?) occur. Second, please provide the hypervisor's response to debug keys 'q' and 'H' (sent either from the serial console or through "xm debug-key <key>"). Third, please attach full hypervisor logs (ideally, per the above, annotated with when you start observing the bad behavior), obtained with "loglvl=all guest_loglvl=all", as well as "xm info" output. Finally, please clarify whether you are observing this behavior on just a single system or on multiple (distinct) ones. Thank you.

To clarify, this is occurring on multiple distinct systems. The affected systems are our "development" group of servers, where VMs are started/stopped/created/deleted more regularly. The 'xm debug-key' command exits with a 0 status and no text output for either q or H. Regarding logs, it is tricky to annotate them with bad-behavior information, since I have not yet narrowed down any particular time or activity that causes the memory discrepancies to occur. For example, the server whose logs I will attach now has held pretty solid at around 61 GB used for a couple of weeks, despite the fact that we have only had around 25-30 GB worth of VM allocations running at any one time. Is there any sort of periodic command I could run that would generate information useful in tracking this down? We have not been running loglvl=all, but I will make those changes to our grub config now. Thanks again.

Created attachment 349791 [details]
Xen Info output for a server exhibiting the behavior
(In reply to comment #2)
> The 'xm debug-key' command exits with a 0 status and no text output for
> either q or H.

The output goes to the Xen log (i.e. you have to run "xm dmesg" afterwards).

> Regarding logs, it's tricky to annotate them with bad behavior information
> since I have not yet narrowed down any particular time or activity that
> causes the memory discrepancies to occur. For example, on the server for
> the logs I'll attach now, it has held pretty solid at around 61gb used for
> a couple weeks now, despite the fact that we've only had around 25-30gb
> worth of VM allocations running at any one time. Is there any sort of
> periodic command I could run that would generate information that would be
> useful in tracking this down?

Precisely the actions described above (i.e. the two debug keys plus "xm info").

(In reply to comment #3)
> Created an attachment (id=349791) [details]
> Xen Info output for a server exhibiting the behavior

This isn't useful alone - it ought to match up with log and debug key output.

Attached are the q and H outputs along with a current xm info. Thanks again.

Created attachment 349854 [details]
console output after xm debug-key q
Created attachment 349855 [details]
console output after xm debug-key H
Created attachment 349856 [details]
xm info after the two debug commands
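The periodic collection suggested above (the two debug keys plus "xm info") produces plain `key : value` text; a small parser makes those captures easy to log as a time series. A minimal sketch — `parse_xm_info` is a hypothetical helper name, and the sample field values are taken from `xm info` output quoted later in this report:

```python
import re

def parse_xm_info(text):
    """Parse `xm info` output into a dict, converting purely numeric values to int."""
    fields = {}
    for line in text.splitlines():
        # Each line looks like "free_memory            : 9043"
        m = re.match(r"\s*(\S+)\s*:\s*(.*\S)\s*$", line)
        if m:
            key, value = m.groups()
            fields[key] = int(value) if value.isdigit() else value
    return fields

sample = """\
free_memory            : 9043
free_cpus              : 0
max_free_memory        : 9043
xen_version            : 3.4.1_19718_04-2.1
"""
info = parse_xm_info(sample)
print(info["free_memory"])   # → 9043
```

Captured periodically (e.g. from cron alongside the debug-key output), the `free_memory` values make the point at which memory goes missing much easier to pin down.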
Re #6: So there are (many) domains that failed to die completely. We are currently observing a similar problem on SLE11 SP1 (Xen 4.0), but that issue is still under investigation. With that (and according to the provided output), you should be seeing this with more than half of your guests. What is possibly interesting is that while up to domain 100 not a single domain got fully destroyed, there are no zombie domains after that point (all domains with IDs greater than 100 are still alive). Unless you have an explanation for this (e.g. all properly destroyed VMs having been of another type, e.g. PV), this might be a hint toward the root of the problem. What I also note is that shadow-memory guests usually leak many more pages than HAP ones (and - statistically meaningless - there is one PV zombie with yet fewer pages left). In Dom0, do you have more qemu-dm processes than there are guests?

Re #7: Actually, 'm' would probably be better than 'H' in case we need it again. Would you be able to rebuild the hypervisor if we provided you with a debugging patch?

That is very interesting. There should be no differences between the first 100 domains and those created afterward. On this particular server, each domU is an HVM guest built from the same libvirt XML base template file. Right now I see that the dom0 has 17 qemu-dm processes, which corresponds to the number of running domUs. I will attach the 'm' output now. Regarding rebuilding and running a debugging hypervisor, I would certainly be able to do that if it would help. Finding the time for the eventual reboot could be tricky, so I may not be able to give timely feedback. Thanks again.

Created attachment 350294 [details]
console output after xm debug-key m
What block device protocol are you using for your guests? In particular, do you observe a difference in behavior between using tap:... and file: ones?

(In reply to comment #12)
> What block device protocol are you using for your guests? In particular, do
> you observe a difference in behavior between using tap:... and file: ones?

The vast majority of these are tap:aio, though we have a few file: guests. Right now we've begun creating all our guests with phy: devices in another environment (CentOS/Xen 3.4.2), and it would appear that we have not triggered this problem there. Is there a thought that perhaps this would be a workaround? Thanks again.

(In reply to comment #13)
> The vast majority of these are tap:aio, though we have a few file: guests.
> Right now we've begun creating all our guests with phy: devices in another
> environment (CentOS/Xen 3.4.2) and it would appear that we have not
> triggered this problem. Is there a thought that perhaps this would be a
> workaround? Thanks again.

Depends on what part of your reply you mean by "this". Using file: instead of any of the tap: variants is likely a workaround, which is why we asked the question in #12 (i.e. we had hoped you could confirm this theory). Per further analysis done on SLE11 SP1, using tap:tapdisk:aio: might also be a workaround (but requires one or more tools-side fixes to deal with the tapdisk: part of the protocol specification).

I apologize for the confusion. I was wondering (which you have now confirmed) whether using file: or phy: could be a possible workaround for our original memory allocation problem. Thanks.

My case with similar memory leaks:

0. openSUSE 11.2, Xen 3.4 (distribution) or 4.0 (Build Service)
1. Get the Windows Server 2008 R2 "10-day trial" DVD ISO from http://download.microsoft.com/download/7/5/E/75EC4E54-5B02-42D6-8879-D8D3A25FBEF7/7600.16385.090713-1255_x64fre_server_eval_en-us-GRMSXEVAL_EN_DVD.iso
2. Install Windows using vm-install or a manual config (sample config attached) with disk=[ 'tap:qcow2:/path/to/disk/image ....' ]
3. xm info > xm-info-before
4. xm start windows-domu-name
5. Shut down Windows normally
6. xm info > xm-info-after
7. diff -u xm-info-before xm-info-after:

```
...
-free_memory            : 9043
+free_memory            : 8434
 free_cpus              : 0
-max_free_memory        : 9043
-max_para_memory        : 9039
-max_hvm_memory         : 9009
+max_free_memory        : 8434
+max_para_memory        : 8430
+max_hvm_memory         : 8402
...
```

The leak does not appear with file: instead of tap:qcow2.

Created attachment 358189 [details]
sample domain config for memory leak test
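The before/after comparison in the reproduction steps above can be scripted instead of eyeballed with diff. A minimal sketch — `memory_delta` is a hypothetical helper, the sample values are the ones from the diff above, and both captures are assumed to be plain `xm info` output:

```python
def memory_delta(before_text, after_text, key="free_memory"):
    """Return after - before for one numeric `xm info` field (negative = loss)."""
    def field(text, name):
        for line in text.splitlines():
            left, _, right = line.partition(":")
            if left.strip() == name:
                return int(right.strip())
        raise KeyError(name)
    return field(after_text, key) - field(before_text, key)

before = "free_memory : 9043\nmax_free_memory : 9043\n"
after  = "free_memory : 8434\nmax_free_memory : 8434\n"

# One domU start/stop cycle with tap:qcow2 lost memory in this report:
print(memory_delta(before, after))   # → -609
```

Run around each domU start/stop cycle, a nonzero result for an otherwise idle host is exactly the leak signature described here (609 MB per cycle in the reproduction above).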
(In reply to comment #9)
> Would you be able to rebuild the hypervisor if we provided you with a
> debugging patch?

A patch is welcome.

We actually appear to have a (kernel-side) fix for this meanwhile, so please test this instead - I'll attach the patch in a second.

Created attachment 358245 [details]
tentative fix
(In reply to comment #21)
> We actually appear to have a (kernel side) fix for this meanwhile, so if you
> would test this instead - I'll attach the patch in a second.

The patch helped: there is no longer any difference in free_memory before domU start and after domU shutdown, for either tap:qcow2 or tap:aio.

Shouldn't "severity" be changed to "critical"? Because IMHO this is definitely a severe memory leak (~600 MB per run in my case) and could be critical when starts/stops are regular (as in the reporter's situation). Sorry if I'm missing something.

It looks like the fix has a side effect (dom0 hangs when swap is used after a domU stops):

0. openSUSE 11.2, Xen 4.0 (Build Service)
1. kernel /boot/xen.gz vtd=1 iommu=pv dom0_mem=2048M dom0_max_vcpus=2 dom0_vcpus_pin
2. dom0 with a 32 GB swap partition and a kernel with the tentative fix
3. PV domU with some phy:/dev/.. disks (no blktap disks were used)
4. perl -n /dev/zero, wait until ~8 GB of swap is used, then press Ctrl-C
5. xm shutdown <domU>
6. perl -n /dev/zero - dom0 hangs in a few seconds

No problems with the 2.6.31.12-0.2-xen kernel without the tentative fix; I got the expected "out of memory". Ideas/instructions/patches are welcome.

How can a fix to code that does not get used in your test case be responsible for the bad outcome of the test? Please let's keep separate problems separate (i.e. open new bugs for unrelated problems).

Oops - it seems this problem is really not related to #589788, sorry (it reproduces with an unmodified 2.6.31.12-0.2-xen dom0 kernel).

The fix was just committed and will be available in a future kernel maintenance update.

*** Bug 613490 has been marked as a duplicate of this bug. ***

Did the situation get better (even just for a while) after a system reboot?
Update released for: kernel-debug, kernel-debug-base, kernel-debug-base-debuginfo, kernel-debug-debuginfo, kernel-debug-debugsource, kernel-debug-devel, kernel-debug-devel-debuginfo, kernel-default, kernel-default-base, kernel-default-base-debuginfo, kernel-default-debuginfo, kernel-default-debugsource, kernel-default-devel, kernel-default-devel-debuginfo, kernel-desktop, kernel-desktop-base, kernel-desktop-base-debuginfo, kernel-desktop-debuginfo, kernel-desktop-debugsource, kernel-desktop-devel, kernel-desktop-devel-debuginfo, kernel-pae, kernel-pae-base, kernel-pae-base-debuginfo, kernel-pae-debuginfo, kernel-pae-debugsource, kernel-pae-devel, kernel-pae-devel-debuginfo, kernel-source, kernel-source-vanilla, kernel-syms, kernel-trace, kernel-trace-base, kernel-trace-base-debuginfo, kernel-trace-debuginfo, kernel-trace-debugsource, kernel-trace-devel, kernel-trace-devel-debuginfo, kernel-vanilla, kernel-vanilla-base, kernel-vanilla-base-debuginfo, kernel-vanilla-debuginfo, kernel-vanilla-debugsource, kernel-vanilla-devel, kernel-vanilla-devel-debuginfo, kernel-xen, kernel-xen-base, kernel-xen-base-debuginfo, kernel-xen-debuginfo, kernel-xen-debugsource, kernel-xen-devel, kernel-xen-devel-debuginfo, preload-kmp-default, preload-kmp-desktop Products: openSUSE 11.2 (debug, i586, x86_64) |
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2) Gecko/20100218 Fedora/3.6.1-2.fc14 Firefox/3.6

Over the past couple of months I've noticed that a few of our Xen servers are reporting (via xentop) discrepancies between the sum of VM memory allocations and the actual hypervisor memory used. In time, free memory drops to a point where no new VMs can be started. Below is recent sample xentop output with extra columns chopped. The hypervisor reports that it is using 32 GB of RAM; however, the sum of the running VMs is 16 GB. Ballooning is disabled and dom0_mem is set to 4 GB.

```
xentop - 15:38:20   Xen 3.4.1_19718_04-2.1
12 domains: 2 running, 10 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 33549248k total, 33241552k used, 307696k free    CPUs: 4 @ 2992MHz
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%)
xxxxxxxxxx  --b---      1584    0.2     547840    1.6     547840       1.6
  Domain-0  -----r    165894   34.3    4192768   12.5   no limit       n/a
xxxxxxxxxx  --b---     15166   22.3    3071916    9.2    3076096       9.2
xxxxxxxxxx  -----r      1328   77.8    1028012    3.1    1028096       3.1
xxxxxxxxxx  --b---     55433    4.7    1027880    3.1    1028096       3.1
xxxxxxxxxx  --b---     67527    4.8    1027880    3.1    1028096       3.1
xxxxxxxxxx  --b---     73019    6.8    1027880    3.1    1028096       3.1
xxxxxxxxxx  --b---     61266    6.8    1027880    3.1    1028096       3.1
xxxxxxxxxx  --b---     78880    6.2    1028012    3.1    1028096       3.1
xxxxxxxxxx  --b---       969    7.7     516012    1.5     516096       1.5
xxxxxxxxxx  --b---     10544    1.5    1027880    3.1    1028096       3.1
xxxxxxxxxx  --b---     21001    5.8    1027880    3.1    1028096       3.1
```

This behavior is only seen on our "development" Xen servers, where we routinely create/clone/destroy a large number of (mostly HVM) VMs; however, when performing any of those operations I cannot reproduce the "leak". That is to say, when I create a VM, the hypervisor's free-memory statistic drops by an amount corresponding to the size of the VM. When the VM is destroyed, the free statistic increases back to where it was previously.
I have not been able to find any way to get the hypervisor to reclaim some of the lost memory and, as a result, I have been forced to reboot the server. On systems with 20+ VMs where we do not have the option of live migration, this can be rather traumatic. Thank you!

Reproducible: Couldn't Reproduce

Steps to Reproduce:
1.
2.
3.
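The discrepancy described in this report can be quantified mechanically from a xentop capture: sum the per-domain MEM(k) column and compare it with the "used" figure on the Mem: line. A minimal sketch — `xentop_discrepancy` is a hypothetical helper, the column positions are assumed from the sample output above, and the embedded capture is abridged to four domain rows:

```python
def xentop_discrepancy(text):
    """Return (hypervisor_used_k, domain_sum_k) from a plain-text xentop capture."""
    used_k = 0
    domain_sum_k = 0
    for line in text.splitlines():
        parts = line.split()
        if line.lstrip().startswith("Mem:"):
            # "Mem: 33549248k total, 33241552k used, 307696k free ..."
            used_k = int(parts[3].rstrip("k"))
        elif len(parts) >= 5 and set(parts[1]) <= set("rbpcds-") and parts[2].isdigit():
            # Domain rows: NAME STATE CPU(sec) CPU(%) MEM(k) ...
            domain_sum_k += int(parts[4])
    return used_k, domain_sum_k

sample = """\
Mem: 33549248k total, 33241552k used, 307696k free    CPUs: 4 @ 2992MHz
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
xxxxxxxxxx  --b---      1584    0.2     547840    1.6
  Domain-0  -----r    165894   34.3    4192768   12.5
xxxxxxxxxx  --b---     15166   22.3    3071916    9.2
xxxxxxxxxx  -----r      1328   77.8    1028012    3.1
"""
used, doms = xentop_discrepancy(sample)
print(used - doms)   # KiB not accounted for by listed domains (plus Xen's own overhead)
```

Against the full 12-domain capture in this report, the gap works out to roughly 16 GB, matching the reporter's observation; a healthy host should show only the hypervisor's own fixed overhead.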