Bugzilla – Full Text Bug Listing
| Summary: | Thinkpad T23 hangs under load | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Henryk Hecht <nvbugs> |
| Component: | Kernel | Assignee: | Thomas Renninger <trenn> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | ||
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | SUSE Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Henryk Hecht
2006-12-30 00:12:03 UTC
After a little additional experimentation, the combination of apm=off acpi=off noapic maxcpus=0 nosmp (which seems redundant...) works, at least in the sense that the machine no longer hangs. Unfortunately, running with acpi=off is not at all convenient, as it means forgoing S3/S4. Is there any chance this is an SMP issue? Would building a uniprocessor kernel help? ACPI worked perfectly on this laptop with 10.0 and 10.1 (and earlier with other distros), but I haven't had a chance to try another distribution with a kernel newer than 10.1's, so I can't guess whether this is a vanilla 2.6 kernel bug, a SUSE kernel bug, or a kernel config bug.

You can try the UP kernel from ftp://ftp.suse.com//pub/people/kkeil/testing/10.2/i386 to verify this.

The kernel in comment #2 works fine so far: no lockups, no need for kernel command line args, and S3 and S4 both work. Having in the meantime tried the SMP kernel quite a few times, I think the problem is that the laptop is overheating; I don't think the fans are being run properly with ACPI. This would also explain why the initial installation worked: it was performed in a very cold (ca. 5 degrees centigrade) environment. Why this should be the case, I don't know. There was nothing in any of the log files to indicate such a problem. This is a somewhat serious state of affairs if it occurs with other T23s, as it could cause permanent damage if the machine is in fact overheating. Whether it is better to make the UP kernel more generally available, try to fix ACPI in the SMP kernel, or just ignore what is probably a small set of people, I can't say. Presumably there was a good reason for making kernel-smp=kernel-default apart from saving some build time, but on this system at least, SMP+ACPI doesn't seem to work. In case this isn't convenient to address before there is a security update for the kernel, is there any difference between the UP kernel above and the SMP kernel apart from CONFIG_SMP?
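One way to answer the CONFIG_SMP question is to compare the two kernels' configuration files directly. The following is only a sketch: it assumes both configs are available as plain text (kernel packages typically install one as /boot/config-<version>, but the paths must be supplied by the user), and the function names are made up for this illustration. An option that is missing from one file is treated the same as "is not set".

```python
import re
import sys

# Lines in a kernel config look like "CONFIG_FOO=y" or "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Return a dict mapping option name to value; unset options map to "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kernel_configs(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    Assumption: an option absent from a file counts as "n".
    """
    a = parse_kernel_config(a_text)
    b = parse_kernel_config(b_text)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }


if __name__ == "__main__":
    # Hypothetical usage: python diffconfig.py <up-config> <smp-config>
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as up, open(sys.argv[2]) as smp:
            for name, (up_val, smp_val) in diff_kernel_configs(
                up.read(), smp.read()
            ).items():
                print(f"{name}: UP={up_val} SMP={smp_val}")
    else:
        print("usage: diffconfig.py <up-config> <smp-config>")
```

If the output lists only CONFIG_SMP and options that depend on it, the two builds are otherwise equivalent; anything else in the list is a candidate for the difference in behaviour.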
How about running with just "nosmp" on the kernel command line? Does that help out?

No, even with just nosmp, acpi=off was required. noapic was not actually necessary, as the machine seems to be blacklisted. apm=off is probably also unnecessary, but I didn't test it. I think I picked up maxcpus=0 from another bug report, and it doesn't appear to be necessary in this case. Probably nosmp acpi=off is minimal. Why nosmp does not work but the UP kernel does I cannot guess; I don't really know how a running SMP kernel booted with nosmp differs from a UP one. Probably the only additional information that I can offer is that in non-working configurations (the SMP kernel without acpi=off, apparently regardless of any other options) I occasionally noticed an error message during boot. It may not be related, as the system froze whether or not the message appeared (it is also possible that I missed it, but I watched for it closely after first noticing). I cannot supply the exact message, as whatever was printing it did so in such a way that it didn't end up anywhere in /var/log or in the kernel ring buffer. The message said something about inability to load /lib/modules/.../thermal.ko and appeared fairly early in the boot process (loaded from the initrd?). I ran rpm -V on kernel-default, which was apparently unaltered. I then rebuilt the initrd, which had no effect. This information may be spurious, but it seemed interesting in light of my own overheating hypothesis.

Hmm, I wonder if this is the same problem as #216205. In the other report the system does not freeze completely, but the rest would make sense. You can try with the broken kernel by passing max_cstate=1. If this helps, the problem should get fixed with the next update kernel, which is coming out soon.

It doesn't sound like bug #216205 has anything to do with this; the symptoms seem very different. In that case there was a high load at idle; in this case there was the expected load at idle and hard lockups under real load. I'm afraid I can't do any further testing on this at the moment. This bug was reported two months ago, and as no activity was evident and I had exhausted my own troubleshooting options, I loaded the UP kernel on the machine, removed the SMP kernel, disabled kernel updates, and sent the laptop out after a couple of weeks. Nevertheless, I will try to provide the requested information at the earliest opportunity, but I am unsure when that will be possible, as the laptop is now several thousand kilometers distant and the user is unlikely to want to deal with it right now.

Closing for now. Please reopen if you still see this. If you want the machine running fine with 10.3, you should give the latest Alpha/Beta version a test. If you still see this, also try to disable cpufreq by setting CPUFREQ_ENABLED="no" in /etc/sysconfig/powersave/cpufreq.
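To confirm that the cpufreq setting suggested above has actually been applied, the file can be checked programmatically. This is a minimal sketch, assuming the file uses the usual sysconfig layout of shell-style KEY="value" lines with # comments; the helper names are invented for this example and are not part of any SUSE tool.

```python
import shlex


def parse_sysconfig(text):
    """Parse shell-style KEY="value" lines into a dict, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes surrounding quotes and trailing "# ..." comments.
        parts = shlex.split(raw, comments=True)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def cpufreq_disabled(text):
    """True if CPUFREQ_ENABLED is explicitly set to "no"."""
    return parse_sysconfig(text).get("CPUFREQ_ENABLED", "").lower() == "no"


if __name__ == "__main__":
    path = "/etc/sysconfig/powersave/cpufreq"  # path as given in this report
    try:
        with open(path) as handle:
            print("cpufreq disabled:", cpufreq_disabled(handle.read()))
    except OSError as err:
        print("could not read", path, "-", err)
```

A missing file or a missing key is reported as "not disabled", which errs on the side of prompting a manual check.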