Bugzilla – Full Text Bug Listing

| Summary: | Kernel panic with 2.6.18.8-0.1-xen | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Greg Riedesel <greg> |
| Component: | Xen | Assignee: | Jan Beulich <jbeulich> |
| Status: | RESOLVED FIXED | QA Contact: | Jason Douglas <jdouglas> |
| Severity: | Normal | ||
| Priority: | P5 - None | ||
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |

Attachments:
- serial console capture of the panic
- Another serial-console capture of the panic
- Output of 'lspci' under the non-Xen kernel (2.6.18.8-0.1-default)
- Output of "lspci -n" in the non-Xen kernel (2.6.18.8-0.1-default)
- boot.msg file for the standard kernel boot.
- The "lsmod" output from the default/standard kernel

Description
Greg Riedesel
2007-03-13 21:20:45 UTC
Created attachment 124157 [details]
serial console capture of the panic
Created attachment 124159 [details]
Another serial-console capture of the panic
This bug does not affect the 'default' kernel. Only the 'xen' kernel. Doing a diff of defconfig.default and defconfig.xen gives this tidbit:
1914c1845
< CONFIG_AGP_INTEL=m
---
> CONFIG_AGP_INTEL=y
I don't know enough about Xen kernel configuration to know why Intel AGP is being hard-loaded into the kernel, but this would explain why the default kernel doesn't show the problem.
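
The same comparison can be made without reading raw diff output. The following is a minimal sketch, assuming both files use the usual `CONFIG_NAME=value` lines; the function names and the `CONFIG_PLACEHOLDER_UNCHANGED` option are invented for the example, and only the `CONFIG_AGP_INTEL` values come from the diff above.

```python
def parse_config(text):
    """Return {option: value} for every CONFIG_*=value line in the text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def changed_options(default_text, xen_text):
    """List (option, default_value, xen_value) for options set in both
    files but with different values."""
    default = parse_config(default_text)
    xen = parse_config(xen_text)
    return [
        (name, default[name], xen[name])
        for name in sorted(default.keys() & xen.keys())
        if default[name] != xen[name]
    ]


# Illustrative fragments, not the real defconfig files.
# CONFIG_PLACEHOLDER_UNCHANGED is a made-up option.
default_cfg = "CONFIG_PLACEHOLDER_UNCHANGED=y\nCONFIG_AGP_INTEL=m\n"
xen_cfg = "CONFIG_PLACEHOLDER_UNCHANGED=y\nCONFIG_AGP_INTEL=y\n"

for name, default_value, xen_value in changed_options(default_cfg, xen_cfg):
    print(f"{name}: default={default_value} xen={xen_value}")
```

In a kernel config, `=m` builds the driver as a loadable module and `=y` builds it into the kernel image, which is why the difference matters here.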
Jan, this is an openSUSE 10.2 bug entry.

While this config selection wasn't intended to be that way, it also wasn't changed after the original release, so you had the driver built in there, too. I'm surprised this worked for you. (Any chance you have a boot.msg obtained with the old kernel?)

Jason/Lynn, any chance we have a machine (Intel chipset driven by intel-agp and 4GB+ of memory) in the lab this can be reproduced on?

Regardless of that, I think I found two issues with the code:
- the use of GFP_DMA32, assuming the machine address will result in memory below 4G (which isn't true under Xen)
- arithmetic extending across page boundaries on values returned from virt_to_gart() (the physical<->machine relationship isn't contiguous under Xen)

Please also provide output of lspci and lspci -n (obtained from the native kernel).

(In reply to comment #6)
> Jason/Lynn, any chance we have a machine (Intel chipset driven by intel-agp and
> 4GB+ of memory) in the lab this can be reproduced on?

We probably have one, but as this is for openSUSE it is a low priority for us right now. We may have a chance to get to it in the middle of next week.

Created attachment 127892 [details]
Output of 'lspci' under the non-Xen kernel (2.6.18.8-0.1-default)
Created attachment 127893 [details]
Output of "lspci -n" in the non-Xen kernel (2.6.18.8-0.1-default)
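
The two code issues raised earlier in this thread (GFP_DMA32 not guaranteeing a machine address below 4G under Xen, and arithmetic across page boundaries on translated addresses) can be illustrated with a toy model. The sketch below is plain Python, not kernel code; the `P2M` table, its frame numbers, and every function name are made up for illustration and do not correspond to the intel-agp driver or to any Xen interface.

```python
PAGE_SIZE = 4096
FOUR_GIB = 1 << 32

# Hypothetical pseudo-physical frame -> machine frame table for one guest.
# Neighbouring pseudo-physical frames do not map to neighbouring machine
# frames, and frame 1 lands above 4 GiB in machine memory.
P2M = {0: 0x1200, 1: 0x150000, 2: 0x0345}


def phys_to_machine(phys_addr):
    """Translate a pseudo-physical address page by page."""
    frame, offset = divmod(phys_addr, PAGE_SIZE)
    return P2M[frame] * PAGE_SIZE + offset


def naive_end(phys_addr, length):
    """Translate only the start, then add the length (the fragile pattern)."""
    return phys_to_machine(phys_addr) + length - 1


def correct_end(phys_addr, length):
    """Translate the last byte itself, so a page crossing is handled."""
    return phys_to_machine(phys_addr + length - 1)


# Issue 1: an address below 4 GiB in the guest's view can be above it in
# machine memory.
guest_addr = 1 * PAGE_SIZE
print(guest_addr < FOUR_GIB, phys_to_machine(guest_addr) < FOUR_GIB)

# Issue 2: a 32-byte range starting 16 bytes before a page boundary.
start = PAGE_SIZE - 16
print(hex(naive_end(start, 32)), hex(correct_end(start, 32)))
```

Within a single page the two end-address calculations agree; they diverge only when the range crosses into a frame whose machine location is unrelated to the previous one.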
> While this config selection wasn't intended to be that way, it also wasn't
> changed after the original release, so you had the driver built in there, too.
> I'm surprised this worked for you. (Any chance you have a boot.msg obtained
> with the old kernel?)

Bug #227324 describes some of the problems I had with the Final kernel (2.6.18.2-34) series. In that case "agp=off" also seemed to bypass the problems, though I did have luck using the modprobe blacklist. It was the agpgart problems that had me keep the 2.6.18.2-23-Xen kernel after 10.2 was released, as that kernel didn't seem to have the same problem.

So with native not working (without agp=off or blacklisting intel-agp, as the referenced bug #227324 described), this is not really a Xen bug but a generic issue; it just happens that under the Xen kernel, due to intel-agp inadvertently being built in, you can't use the blacklisting method but have to use agp=off. Nevertheless, I believe looking closely at this code has revealed a number of weaknesses on the Xen side.

Bug 271573 is a duplicate of this one, with newer code.

Kernel 2.6.18.8-0.3 still has this issue. The standard kernel does not show the problem, but the xen kernel does. As with the earlier one, adding agp=off to the kernel options bypasses this bug.

Created attachment 146580 [details]
boot.msg file for the standard kernel boot.
This is the boot.msg file for a standard kernel boot of 2.6.18.8-0.3-default. This had no issues.
*** Bug 271573 has been marked as a duplicate of this bug. ***

That would mean bug 227324 is no longer applicable. Based on that bug's history, however, I would think that you just happen to (not) see the problem in different kernel versions depending on other characteristics of the respective kernel build. Please clarify whether intel-agp is being loaded in the native kernel (via lsmod output), as the boot.msg provided seems to indicate that it is not being loaded at all (missing the "Detected an Intel ... Chipset." message), which makes me assume that its loading is still being suppressed by some means.

The problem appears to still exist in the 2.6.18.8-0.5-xen build. Once again, using the "agp=off" option in the Boot Options gets past the Kernel Panic.

Created attachment 151330 [details]
The "lsmod" output from the default/standard kernel
The 2.6.18.8-0.5 "defconfig.xen" and "defconfig.default" files have the same difference I mentioned in comment 4. Specifically, when I diff the two I get this in the stream:
[/usr/src/linux/arch/x86_64 # diff defconfig.default defconfig.xen]
1914c1845
< CONFIG_AGP_INTEL=m
---
> CONFIG_AGP_INTEL=y
Which tells me that the default kernel has intel_agp as a module, and the Xen kernel has intel_agp static in the kernel.

The lsmod output for the default kernel does not show "intel_agp" loaded.

So in order for intel-agp to do anything, it must find matching hardware in your system, and hence the same matching hardware would be found during a native kernel boot. If intel-agp isn't loaded in the latter case, then it means you're suppressing its loading by some means. If such is necessary for your system to work, then agp=off is the way to go in my opinion; you'd have to live with the fact that the disabling needs to be done differently for the Xen and the native kernels.

10.3 will have intel-agp as a module.
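
The lsmod check requested in this thread can also be scripted. This is a minimal sketch, assuming the usual lsmod layout of one header line followed by one module per line with the name in the first column; the function names and the sample listing are invented for illustration and are not taken from the attached lsmod output.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names from lsmod-style text.

    The first line is treated as the column header and skipped.
    """
    lines = lsmod_text.strip().splitlines()
    return {line.split()[0] for line in lines[1:] if line.strip()}


def is_loaded(lsmod_text, module):
    """True if the module appears in the listing.

    lsmod prints names with underscores, so 'intel-agp' is looked up
    as 'intel_agp'.
    """
    return module.replace("-", "_") in loaded_modules(lsmod_text)


# Made-up listing; example_mod_a and example_mod_b are placeholder names.
sample = """\
Module                  Size  Used by
example_mod_a          12345  0
example_mod_b          67890  1 example_mod_a
"""

print(is_loaded(sample, "intel-agp"))
```

Note that a driver built into the kernel (`=y`) does not appear in lsmod at all, so this check is only meaningful for the native kernel, where intel-agp is a module.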