Bugzilla – Bug 559047
XEN system hangs with high I/O load
Last modified: 2009-12-16 08:08:38 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.0.15) Gecko/2009102100 SUSE/3.0.15-0.1.2 Firefox/3.0.15

On two different systems I experienced system hangs when high I/O load occurs with at least one domU involved. Here I describe the situation on an HP Proliant with 16GB RAM: openSUSE 11.2 x86_64 dom0 with one (PV) openSUSE 11.2 x86_64 domU. The network configuration uses a bridge br0 with the dom0 and domU network interfaces added to the bridge. When, for example, copying a large file on the dom0 with scp from the domU to the dom0, the system dies. A log captured via a serial port is attached. I will repeat the experiment on the other machine (Super Micro based PC) and provide that log file as well.

Reproducible: Always

Steps to Reproduce:
1. Boot system and login as root
2. xm start domu
3. scp domu:bigfile .

Actual Results: System is not responding anymore

Expected Results: A stable system
Created attachment 329906 [details] system log captured via serial port
Created attachment 329913 [details] system log captured via serial port

A second experiment on the same machine (now including the dom0 log). The file which is copied exists on the file system of the domU and is around 1GB in size. This time the system survived a first scp, but died after the second try.
Created attachment 329916 [details] system log captured via serial port

This is on a different system with 4GB RAM. Same kind of Linux configuration, same experiment, and finally the same result: the system hangs.
Created attachment 330154 [details] system log captured via serial port

This is from another try on the Supermicro system. Apparently a different type of problem is detected. This one says:

[ 2038.844651] BUG: unable to handle kernel paging request at ffff880300000004
[ 2038.844663] IP: [<ffffffff80244fab>] memcpy_c+0xb/0x20
[ 2038.844674] PGD 1878067 PUD 0
[ 2038.844677] Thread overran stack, or stack corrupted
[ 2038.844680] Oops: 0002 [#1] SMP

On the console of dom0 I was executing "scp domu:/bigfile ." and in the domU it was performing a local copy. This log includes a Xen dump at the end.
The last two logs clearly match the report in bug 553690. Thus it would be useful to know whether the issue also happens with "mem=4G" on the Xen command line. If you feel comfortable with re-building the kernel, trying the debugging patch in that bug would also be very useful.

As to the earlier two logs - there appear to be I/O problems earlier, and hence it would seem within the realm of possibility that the crashes are just follow-up problems. We may need to get someone familiar with the driver involved if this turns out to be an issue different from the other one. Trying "mem=4G" as above would certainly also be useful here.

For both cases, I'd also like to understand whether the presence of any DomU is a required prerequisite for the bug to occur.
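For readers wanting to try this: on openSUSE the "mem=4G" option belongs on the hypervisor (xen.gz) line of the GRUB entry, not on the Linux kernel line. A sketch of what such an entry might look like (title, paths and root device are illustrative, not taken from this report):

```
title Xen -- openSUSE 11.2
    root (hd0,0)
    kernel /boot/xen.gz mem=4G
    module /boot/vmlinuz-xen root=/dev/sda1
    module /boot/initrd-xen
```

With this in place, the hypervisor itself is limited to 4GB of machine memory, independent of any dom0_mem setting.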
1. Providing Xen with the option "mem=4G" did seem to solve the issue. I did a lot of stress testing and the Supermicro system didn't complain. So this is the first part of the info.
2. I will try to apply the patch and rebuild the kernel, and will report on this later.
3. I will report the results of running Xen with "mem=4G" on the Proliant later.
4. I will test Xen without domUs to see if the bug occurs, and will report on this later.
1. & 3. Response on "knowing whether with "mem=4G" on the Xen command line the issue also happens would be useful to know":

Both systems are stable with the mem=4G option. The HP Proliant seems to have an issue with the cciss driver. In all situations (Xen, non-Xen) the performance of the driver is horrible, but that is a different problem. What may be interesting however is that the error from the cciss driver (end_request: I/O error, dev cciss/c0d0, sector 0) reported by dom0 only occurs from the moment a domU starts running.

2. Response on "If you feel comfortable with re-building the kernel, trying the debugging patch in that bug would also be very useful":

I did not yet try to rebuild the kernel with the available patch from bug 553690. I will try this and report if I manage.

4. Response on "For both cases, I'd also like to understand whether the presence of any DomU is a required prerequisite for the bug to occur":

Running only the dom0 with a lot of I/O seems to be fine. Even in the situation where one or more domUs are running (doing almost nothing) with a lot of I/O on dom0, the bug does not manifest itself. This is true irrespective of limiting dom0 memory with the dom0_mem parameter.
Created attachment 331234 [details] system log including crash, kernel patched

I patched the kernel with the mentioned patch V3. Here are the results. The first Oops occurred when I Ctrl-C-ed an scp on dom0. The last happened during scp-ing from dom0 to domU + repeatedly untarring a big tar on domU.
How are your VMs' disks being backed?
Also, we'll need either the vmlinux binary or at least the disassembly of unmap_single().
(In reply to comment #9)
> How are you VMs' disks being backed?

Disks are defined via LVM:

disk=[ 'phy:/dev/sys/staging,xvda,w', ]
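For context, the disk= line above lives in the xm domU configuration file; a minimal sketch of such a file (only the disk= line is taken from this report, every other value is illustrative):

```
# /etc/xen/vm/staging -- illustrative sketch, not the reporter's actual file
name   = "staging"
memory = 1024
disk   = [ 'phy:/dev/sys/staging,xvda,w', ]
vif    = [ 'bridge=br0', ]
```

The bridge br0 matches the network setup described in the initial report.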
(In reply to comment #10)
> Also, we'll need either the vmlinux binary or at least the disassembly of
> unmap_single().

Can you give me a pointer on how to obtain the disassembly of unmap_single()?

vmlinux can be downloaded from: http://ftp.concero.nl/pub/kernel/vmlinux
(In reply to comment #12)
> (In reply to comment #10)
> > Also, we'll need either the vmlinux binary or at least the disassembly of
> > unmap_single().
>
> Can you give me a pointer on how to obtain the disassembly of unmap_single()?

Found it:

Reading symbols from /usr/src/linux-2.6.31.5-0.1/vmlinux...done.
(gdb) disassemble unmap_single
Dump of assembler code for function unmap_single:
0xffffffff802446b0 <unmap_single+0>:    push   %rbp
0xffffffff802446b1 <unmap_single+1>:    mov    %rsp,%rbp
0xffffffff802446b4 <unmap_single+4>:    push   %r14
0xffffffff802446b6 <unmap_single+6>:    mov    %ecx,%r14d
0xffffffff802446b9 <unmap_single+9>:    push   %r13
0xffffffff802446bb <unmap_single+11>:   mov    %rdx,%r13
0xffffffff802446be <unmap_single+14>:   push   %r12
0xffffffff802446c0 <unmap_single+16>:   push   %rbx
0xffffffff802446c1 <unmap_single+17>:   mov    %rsi,%rbx
0xffffffff802446c4 <unmap_single+20>:   sub    $0x10,%rsp
0xffffffff802446c8 <unmap_single+24>:   mov    %gs:0x28,%rax
0xffffffff802446d1 <unmap_single+33>:   mov    %rax,-0x28(%rbp)
0xffffffff802446d5 <unmap_single+37>:   xor    %eax,%eax
0xffffffff802446d7 <unmap_single+39>:   callq  0xffffffff80244600 <swiotlb_bus_to_virt>
0xffffffff802446dc <unmap_single+44>:   cmp    $0x3,%r14d
0xffffffff802446e0 <unmap_single+48>:   je     0xffffffff802448cb <unmap_single+539>
0xffffffff802446e6 <is_swiotlb_buffer+0>:   mov    0x62b3e3(%rip),%rsi   # 0xffffffff8086fad0
0xffffffff802446ed <is_swiotlb_buffer+7>:   cmp    %rsi,%rax
0xffffffff802446f0 <is_swiotlb_buffer+10>:  jb     0xffffffff802446fb <machine_to_phys>
0xffffffff802446f2 <is_swiotlb_buffer+12>:  cmp    0x62b3df(%rip),%rax   # 0xffffffff8086fad8
0xffffffff802446f9 <is_swiotlb_buffer+19>:  jb     0xffffffff80244770 <do_unmap_single>
0xffffffff802446fb <machine_to_phys+0>:  mov    %rbx,%rax
0xffffffff802446fe <machine_to_phys+3>:  shr    $0xc,%rax
0xffffffff80244702 <mfn_to_pfn+0>:   cmpb   $0x0,0x512cf9(%rip)   # 0xffffffff80757402
0xffffffff80244709 <mfn_to_pfn+7>:   jne    0xffffffff8024472e <machine_to_phys+51>
0xffffffff8024470b <mfn_to_pfn+9>:   mov    0x57d8ef(%rip),%ecx   # 0xffffffff807c2000
0xffffffff80244711 <mfn_to_pfn+15>:  mov    %rax,%rdx
0xffffffff80244714 <mfn_to_pfn+18>:  shr    %cl,%rdx
0xffffffff80244717 <mfn_to_pfn+21>:  test   %rdx,%rdx
0xffffffff8024471a <mfn_to_pfn+24>:  jne    0xffffffff802448d4 <mfn_to_pfn+466>
0xffffffff80244720 <mfn_to_pfn+30>:  shl    $0x3,%rax
0xffffffff80244724 <mfn_to_pfn+34>:  add    0x4c695d(%rip),%rax   # 0xffffffff8070b088
0xffffffff8024472b <mfn_to_pfn+41>:  mov    (%rax),%rax
0xffffffff8024472e <machine_to_phys+51>:  shl    $0xc,%rax
0xffffffff80244732 <gnttab_dma_unmap_page+55>:  and    $0xfff,%ebx
0xffffffff80244738 <gnttab_dma_unmap_page+61>:  mov    $0xffff880000000000,%rdi
0xffffffff80244742 <gnttab_dma_unmap_page+71>:  or     %rbx,%rax
0xffffffff80244745 <gnttab_dma_unmap_page+74>:  lea    (%rax,%rdi,1),%rdi
0xffffffff80244749 <gnttab_dma_unmap_page+78>:  callq  0xffffffff80024c60 <__phys_addr>
0xffffffff8024474e <unmap_single+158>:  mov    -0x28(%rbp),%rax
0xffffffff80244752 <unmap_single+162>:  xor    %gs:0x28,%rax
0xffffffff8024475b <unmap_single+171>:  jne    0xffffffff802448cf <unmap_single+543>
0xffffffff80244761 <unmap_single+177>:  add    $0x10,%rsp
0xffffffff80244765 <unmap_single+181>:  pop    %rbx
0xffffffff80244766 <unmap_single+182>:  pop    %r12
0xffffffff80244768 <unmap_single+184>:  pop    %r13
0xffffffff8024476a <unmap_single+186>:  pop    %r14
0xffffffff8024476c <unmap_single+188>:  leaveq
0xffffffff8024476d <unmap_single+189>:  retq
0xffffffff8024476e <unmap_single+190>:  xchg   %ax,%ax
0xffffffff80244770 <do_unmap_single+0>:   mov    %rax,%rdx
0xffffffff80244773 <do_unmap_single+3>:   lea    0x7ff(%r13),%rbx
0xffffffff8024477a <do_unmap_single+10>:  mov    0x62b377(%rip),%r8   # 0xffffffff8086faf8
0xffffffff80244781 <do_unmap_single+17>:  sub    %rsi,%rdx
0xffffffff80244784 <do_unmap_single+20>:  mov    %rdx,%rsi
0xffffffff80244787 <do_unmap_single+23>:  shr    $0xb,%rbx
0xffffffff8024478b <do_unmap_single+27>:  sar    $0xb,%rsi
0xffffffff8024478f <do_unmap_single+31>:  cmp    $0x1,%ebx
0xffffffff80244792 <do_unmap_single+34>:  movslq %esi,%rdx
0xffffffff80244795 <do_unmap_single+37>:  mov    %esi,%r12d
0xffffffff80244798 <do_unmap_single+40>:  mov    (%r8,%rdx,8),%rdi
0xffffffff8024479c <do_unmap_single+44>:  jle    0xffffffff802447e0 <do_unmap_single+112>
0xffffffff8024479e <do_unmap_single+46>:  lea    0x1(%r12),%edx
0xffffffff802447a3 <do_unmap_single+51>:  movslq %edx,%rdx
0xffffffff802447a6 <do_unmap_single+54>:  mov    (%r8,%rdx,8),%rdx
0xffffffff802447aa <do_unmap_single+58>:  xor    %rdi,%rdx
0xffffffff802447ad <do_unmap_single+61>:  test   $0x7ff,%edx
0xffffffff802447b3 <do_unmap_single+67>:  jne    0xffffffff802448c7 <do_unmap_single+343>
0xffffffff802447b9 <do_unmap_single+73>:  mov    $0x1,%edx
0xffffffff802447be <do_unmap_single+78>:  jmp    0xffffffff802447d9 <do_unmap_single+105>
0xffffffff802447c0 <do_unmap_single+80>:  lea    (%rdx,%rsi,1),%ecx
0xffffffff802447c3 <do_unmap_single+83>:  movslq %ecx,%rcx
0xffffffff802447c6 <do_unmap_single+86>:  mov    (%r8,%rcx,8),%rcx
0xffffffff802447ca <do_unmap_single+90>:  xor    %rdi,%rcx
0xffffffff802447cd <do_unmap_single+93>:  test   $0x7ff,%ecx
0xffffffff802447d3 <do_unmap_single+99>:  jne    0xffffffff802448c7 <do_unmap_single+343>
0xffffffff802447d9 <do_unmap_single+105>: add    $0x1,%edx
0xffffffff802447dc <do_unmap_single+108>: cmp    %edx,%ebx
0xffffffff802447de <do_unmap_single+110>: jg     0xffffffff802447c0 <do_unmap_single+80>
0xffffffff802447e0 <do_unmap_single+112>: test   %rdi,%rdi
0xffffffff802447e3 <do_unmap_single+115>: je     0xffffffff80244800 <do_unmap_single+144>
0xffffffff802447e5 <do_unmap_single+117>: test   %r14d,%r14d
0xffffffff802447e8 <do_unmap_single+120>: jne    0xffffffff802448b8 <do_unmap_single+328>
0xffffffff802447ee <do_unmap_single+126>: mov    $0x2,%ecx
0xffffffff802447f3 <do_unmap_single+131>: mov    %r13,%rdx
0xffffffff802447f6 <do_unmap_single+134>: mov    %rax,%rsi
0xffffffff802447f9 <do_unmap_single+137>: callq  0xffffffff80244640 <swiotlb_bounce>
0xffffffff802447fe <do_unmap_single+142>: xchg   %ax,%ax
0xffffffff80244800 <do_unmap_single+144>: mov    $0xffffffff8086fac8,%rdi
0xffffffff80244807 <do_unmap_single+151>: callq  0xffffffff8045e570 <_spin_lock_irqsave>
0xffffffff8024480c <do_unmap_single+156>: lea    0x80(%r12),%edi
0xffffffff80244814 <do_unmap_single+164>: lea    (%r12,%rbx,1),%ecx
0xffffffff80244818 <do_unmap_single+168>: xor    %edx,%edx
0xffffffff8024481a <do_unmap_single+170>: and    $0xffffffffffffff80,%edi
0xffffffff8024481d <do_unmap_single+173>: cmp    %edi,%ecx
0xffffffff8024481f <do_unmap_single+175>: jl     0xffffffff802448a0 <do_unmap_single+304>
0xffffffff80244821 <do_unmap_single+177>: sub    $0x1,%ecx
0xffffffff80244824 <do_unmap_single+180>: cmp    %ecx,%r12d
0xffffffff80244827 <do_unmap_single+183>: jg     0xffffffff80244841 <do_unmap_single+209>
0xffffffff80244829 <do_unmap_single+185>: mov    0x62b2b8(%rip),%rdi   # 0xffffffff8086fae8
0xffffffff80244830 <do_unmap_single+192>: movslq %ecx,%rbx
0xffffffff80244833 <do_unmap_single+195>: sub    $0x1,%ecx
0xffffffff80244836 <do_unmap_single+198>: add    $0x1,%edx
0xffffffff80244839 <do_unmap_single+201>: cmp    %ecx,%r12d
0xffffffff8024483c <do_unmap_single+204>: mov    %edx,(%rdi,%rbx,4)
0xffffffff8024483f <do_unmap_single+207>: jle    0xffffffff80244830 <do_unmap_single+192>
0xffffffff80244841 <do_unmap_single+209>: sub    $0x1,%r12d
0xffffffff80244845 <do_unmap_single+213>: movslq %r12d,%rcx
0xffffffff80244848 <do_unmap_single+216>: mov    %rcx,%rbx
0xffffffff8024484b <do_unmap_single+219>: and    $0x7f,%ebx
0xffffffff8024484e <do_unmap_single+222>: cmp    $0x7f,%rbx
0xffffffff80244852 <do_unmap_single+226>: je     0xffffffff80244882 <do_unmap_single+274>
0xffffffff80244854 <do_unmap_single+228>: mov    0x62b28d(%rip),%rdi   # 0xffffffff8086fae8
0xffffffff8024485b <do_unmap_single+235>: jmp    0xffffffff80244878 <do_unmap_single+264>
0xffffffff8024485d <do_unmap_single+237>: nopl   (%rax)
0xffffffff80244860 <do_unmap_single+240>: add    $0x1,%edx
0xffffffff80244863 <do_unmap_single+243>: sub    $0x1,%r12d
0xffffffff80244867 <do_unmap_single+247>: mov    %edx,(%rcx)
0xffffffff80244869 <do_unmap_single+249>: movslq %r12d,%rcx
0xffffffff8024486c <do_unmap_single+252>: mov    %rcx,%rbx
0xffffffff8024486f <do_unmap_single+255>: and    $0x7f,%ebx
0xffffffff80244872 <do_unmap_single+258>: cmp    $0x7f,%rbx
0xffffffff80244876 <do_unmap_single+262>: je     0xffffffff80244882 <do_unmap_single+274>
0xffffffff80244878 <do_unmap_single+264>: lea    (%rdi,%rcx,4),%rcx
0xffffffff8024487c <do_unmap_single+268>: mov    (%rcx),%ebx
0xffffffff8024487e <do_unmap_single+270>: test   %ebx,%ebx
0xffffffff80244880 <do_unmap_single+272>: jne    0xffffffff80244860 <do_unmap_single+240>
0xffffffff80244882 <do_unmap_single+274>: mov    %rax,%rsi
0xffffffff80244885 <do_unmap_single+277>: mov    $0xffffffff8086fac8,%rdi
0xffffffff8024488c <do_unmap_single+284>: callq  0xffffffff8045e2c0 <_spin_unlock_irqrestore>
0xffffffff80244891 <unmap_single+481>:  jmpq   0xffffffff8024474e <unmap_single+158>
0xffffffff80244896 <unmap_single+486>:  nopw   %cs:0x0(%rax,%rax,1)
0xffffffff802448a0 <do_unmap_single+304>: mov    0x62b241(%rip),%rdx   # 0xffffffff8086fae8
0xffffffff802448a7 <do_unmap_single+311>: movslq %ecx,%rbx
0xffffffff802448aa <do_unmap_single+314>: mov    (%rdx,%rbx,4),%edx
0xffffffff802448ad <do_unmap_single+317>: jmpq   0xffffffff80244821 <do_unmap_single+177>
0xffffffff802448b2 <do_unmap_single+322>: nopw   0x0(%rax,%rax,1)
0xffffffff802448b8 <do_unmap_single+328>: cmp    $0x2,%r14d
0xffffffff802448bc <do_unmap_single+332>: jne    0xffffffff80244800 <do_unmap_single+144>
0xffffffff802448c2 <do_unmap_single+338>: jmpq   0xffffffff802447ee <do_unmap_single+126>
0xffffffff802448c7 <do_unmap_single+343>: ud2a
0xffffffff802448c9 <do_unmap_single+345>: jmp    0xffffffff802448c9 <do_unmap_single+345>
0xffffffff802448cb <unmap_single+539>:  ud2a
0xffffffff802448cd <unmap_single+541>:  jmp    0xffffffff802448cd <unmap_single+541>
0xffffffff802448cf <unmap_single+543>:  callq  0xffffffff8004d1a0 <__stack_chk_fail>
0xffffffff802448d4 <mfn_to_pfn+466>:  mov    0x61f71d(%rip),%rax   # 0xffffffff80863ff8
0xffffffff802448db <mfn_to_pfn+473>:  jmpq   0xffffffff8024472e <machine_to_phys+51>
End of assembler dump.
(gdb)

> vmlinux can be downloaded from: http://ftp.concero.nl/pub/kernel/vmlinux
Created attachment 331975 [details] debugging patch (kernel, v2) This is a replacement patch for the one you used earlier from the other bug.
Additionally, if you could try arranging your experiments so that we could distinguish the following three cases:

1. scp (main direction of network traffic) from Dom0 to DomU
2. scp (main direction of network traffic) from DomU to Dom0
3. no network traffic at all (i.e. only disk activity)

This might help further isolate the problem origin. Thanks!
Created attachment 332116 [details] log for scp (main direction of network traffic) from Dom0 to DomU

> scp (main direction of network traffic) from Dom0 to DomU

vmlinux can be downloaded from: http://ftp.concero.nl/pub/kernel/vmlinux-pv2

I was not able to trigger the bug while running (on domU)

staging:~/usr # while true ; do tar xf ../usr.tgz ; done

over last night. Furthermore, an scp to Dom0 running on domU for more than two hours did not trigger it either. But I will test this a bit more.
Dion,

Could you test this issue without LVM but with a file-backed disk, to see whether the bug is still reproducible? Like:

disk=[ 'file:/disk0,xvda,w', ]

-James
Created attachment 332174 [details] debugging patch (kernel, v3) I'm sorry, you hit a bug in the debugging patch. Here's the fixed one.
(In reply to comment #17)
> Dion,
> Could you tested this issue withount a lvm but a file format backend to see
> whether this bug is reproduced.
> like: disk=[ 'file:/disk0,xvda,w', ]
>
> -James

Hello James and Jan,

With a file-backed disk, the bug does not seem to happen so far (running for several hours now). So it looks like LVM is a requirement to trigger the bug. Furthermore, I have not seen error messages from the cciss driver anymore while running with the file-backed domU. Now I will use the fixed patch from Jan and go back to an LVM-backed domU.
Forgot to change the status
Don't do a build yet - the change now made the condition useless. Will attach another one in a minute.
(In reply to comment #21)
> Don't do y build, yet - the change now made the condition useless. Will attach
> another on in a minute.

OK, I stopped the build.
Created attachment 332189 [details] debugging patch (kernel, v3) This one should be better.
Created attachment 332248 [details] debugging patch (kernel, v4)

Here's another version of the patch - it seems the BUG_ON() in question may still have false positives. Hence I have converted it to a warning now.
(In reply to comment #24)
> Created an attachment (id=332248) [details]
> debugging patch (kernel, v4)
>
> Here's another version of the patch - it seems like the BUG_ON() in question
> still may have false positives. Hence I converted it to a warning now.

Are you able to reproduce the bug?

1. I think this is really a nasty one. The WARN_ON in do_unmap_single creates loads of logging. I ran it for a long time and the system kept on running. I think the bug now does not hit because of all the latency introduced by the logging (and the probing). (I did not save all this logging.)

2. I moved the WARN_ON out of the for loop:

static void
do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
{
	unsigned long flags;
	int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
	int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
	phys_addr_t phys = io_tlb_orig_addr[index];

	for (i = 1, count = 0; i < nslots; ++i) { /* temp */
		if ((phys + (i << IO_TLB_SHIFT)) != io_tlb_orig_addr[index + i]) {
			printk("dus[%d]: %Lx %Lx\n", index,
			       (unsigned long long)io_tlb_orig_addr[index + i],
			       (unsigned long long)phys + (i << IO_TLB_SHIFT));
			++count;
		}
	}
	WARN_ON(count);

My hope was that this would reduce the logging significantly; however, this does not happen. So each time it iterates over the slots, most of the time it hits the printk once and logs the warning. See log in pv4-1.txt.

3. I disabled the concerning WARN_ON(count), this time reducing the log with the Call Trace dumps. See log in pv4-2.txt.

I think the introduced probing latency is still too large for the bug to occur. (Please tell me if you are unhappy with the large attachments and if you want me to remove redundant parts.)
Created attachment 332341 [details] WARN_ON moved out of the for loop See comment 25
Created attachment 332342 [details] Disabled WARN_ON See comment 25 (pv4-2.txt)
Created attachment 332343 [details] Bug hit on another Supermicro

It happens that I have to convince myself that I am still able to trigger the bug. This bug first manifested itself on a (new) Supermicro machine with 4GB RAM when I was untarring a large tar file on a domU. However, this machine is running in production now (with mem=4GB), so I cannot use it for testing anymore. I have a somewhat older Supermicro machine with 2 CPUs and 8GB RAM, now running Xen from openSUSE 11.2. I am able to crash this machine as well, although it takes a bit longer. I ran scp on dom0 sending data to domU + running in that domU a tar xvf. This is with the v4 patch, but I think I completely disabled the for loop probing in do_unmap_single. I will retry this with the original v4.
> Are you able to reproduce the bug?

I've personally not seen the bug, and I haven't determined whether it has been reproduced in our lab.

> 2. I moved the WARN_ON out of the for loop:

Of course it was meant to be that way.

> It happens that I have to convince myself that I am still able to trigger the
> bug. This bug did first manifests itself on a (new) supermicro machine with 4GB
> RAM when I was untarring a large tar file on a domU.

Without any network load? This would be the opposite of the result you posted in #16.
Created attachment 332450 [details] debugging patch (kernel, v5)

This has three distinct fixes, and further extended logging. If any of this turns out excessive, feel free to disable it again; just be sure to say, with any logs you post, which parts you disabled.
Now that I have positive feedback from another tester, I definitely would want to hear back from you.
(In reply to comment #29)
> > Are you able to reproduce the bug?
>
> I've personally not seen the bug, and I'm not determined whether it has been
> reproduced in our lab.
>
> > 2. I moved the WARN_ON out of the for loop:
>
> Of course it was meant to be that way.
>
> > It happens that I have to convince myself that I am still able to trigger the
> > bug. This bug did first manifests itself on a (new) supermicro machine with 4GB
> > RAM when I was untarring a large tar file on a domU.
>
> Without any network load? This would be the opposite of the result you posted
> in #16.

I will try to narrow it down further. I think some network traffic is required to trigger it while doing domU disk access. However, I was not able to get it triggered with a transfer of data from dom0 to domU:/dev/null. I think the combination of disk access and network traffic is required.
(In reply to comment #31)
> Now having positive feedback from another tester, I'd definitely would want to
> hear back from you.

No problem, I will start building and testing. I was ill for a couple of days - too sick to touch my keyboard.
(In reply to comment #31)
> Now having positive feedback from another tester, I'd definitely would want to
> hear back from you.

Jan,

I set up the two machines (with 8GB and 16GB RAM) with patch v5 and tested them with severe I/O load for more than 17 hours. Except for the aep&p logging during startup, nothing additional has been logged. So far no instability has been observed.
Thanks! *** This bug has been marked as a duplicate of bug 553690 ***