Bugzilla – Bug 182151
IBM Thinkpad T43 -- Suspend to RAM no longer works when ATI fglrx driver is loaded -- worked fine in 9.3 and 10.0
Last modified: 2007-03-31 15:26:57 UTC
When my laptop suspends to RAM with the ATI fglrx driver it now locks up upon resume with a garbled display. It never used to do that since ATI fixed the lockup problem a few versions back. If I unload fglrx suspend to RAM works fine. SuSE 9.3 and 10.0 both functioned properly when suspended. Suspend to disk *does* work properly in 10.1.
I can confirm that this is an existing problem on a nearly identical thinkpad, so it's not his configuration. I have tried using a hand-rolled 2.6.16.19 kernel, as well as a new xorg.conf file, neither of which affected the hang on resume. I haven't tried a custom xorg install yet. Using the 2.6.13 kernel and xorg 6.8 (SUSE 10) worked just fine under default config with fglrx.
Please attach /var/log/Xorg.* here. Just in case also add 300 lines of /var/log/messages.
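For anyone collecting the requested log excerpt, here is a minimal sketch of one way to grab the last 300 lines of a log file. The function name `tail_lines` is made up for illustration; the path /var/log/messages is the one named in the request above, and reading it usually requires root.

```python
from collections import deque


def tail_lines(path, count=300):
    """Return the last `count` lines of a text file as a list of strings."""
    # A deque with maxlen keeps only the most recent `count` lines while the
    # file is streamed, so large logs are not held in memory all at once.
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=count))
```

Calling `tail_lines("/var/log/messages")` and writing the result to a file gives an excerpt that can be attached to the bug.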
Created attachment 87850: syslog entries related to the software suspend
I basically cut the portion of /var/log/messages from the start of suspend2ram until syslog was restarted when the system rebooted. I didn't see anything particularly out of the ordinary.
Created attachment 87851: xorg log files
Here are the /var/log/Xorg* files as requested.
Stefan, can you help here?
Not really; only ATI can.
ATI already fixed the problem with fglrx several releases ago. As I said, suspend to RAM worked until SuSE 10.1. So, therefore something that changed between 10.0 and 10.1 broke it.
Does the same driver work with 10.1? ATI is currently investigating an issue with CPU_HOTPLUG. It doesn't appear as though the driver is causing an issue. Since you have rolled your own kernel already, please try without CPU_HOTPLUG. Reference Novell bug #181886. I have also heard of T43's having some unique problems. I will see if I can track down more information.
I am running a SuSE rolled kernel. I can recompile without CPU_HOTPLUG and let you know what happens. And, just so you know, a co-worker of mine has a different laptop (HP) and he has reported the same behavior to me.
By the way, I get this: You are not authorized to access bug #181886.
There are no configurable options in the SuSE kernel source OR the vanilla 2.6.16.20 source related to CPU_HOTPLUG that I can find (doing a 'make menuconfig', 'less /proc/config.gz', or 'vi .config'). Is this a manual entry that needs to be inserted into the .config file (ex: CPU_HOTPLUG=n) or are you referring to a boot-time kernel option?
CONFIG_HOTPLUG_CPU is the option, apologies. For it to appear in menuconfig, you must have CONFIG_SMP, CONFIG_EXPERIMENTAL and CONFIG_HOTPLUG enabled.
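A rough sketch of how the state of these options could be checked without opening menuconfig. It only parses the standard .config text format (`CONFIG_X=y` for set options, `# CONFIG_X is not set` for disabled ones). The helper names `kernel_option_state` and `read_running_config` are hypothetical, and /proc/config.gz only exists when the running kernel exports its configuration.

```python
import gzip
import re


def kernel_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) if it is set, 'n' if it is
    explicitly marked as not set, or None if it does not appear at all."""
    set_pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    unset_pattern = re.compile(
        r"^# %s is not set$" % re.escape(option), re.MULTILINE
    )
    match = set_pattern.search(config_text)
    if match:
        return match.group(1)
    if unset_pattern.search(config_text):
        return "n"
    # Options whose dependencies are not met are omitted from .config entirely.
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's configuration (assumes the file exists)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

A result of None for CONFIG_HOTPLUG_CPU means the option is absent from the configuration, which is what one would expect when its dependencies (such as CONFIG_SMP) are not enabled.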
CONFIG_SMP, huh? So, if I'm running a uniprocessor kernel, CONFIG_HOTPLUG_CPU isn't even an option, which would suggest it is disabled in this configuration -- there's no point in trying to hotplug the only cpu you have. :) With the config you've outlined above, really the only change I'm making is enabling SMP and leaving the cpu hotplug disabled. Anyway, it's compiling now (I'm using the kernel source for the 2.6.16.13-4 kernel). I'll let you know if it makes a difference.
I am unsure if the stock Novell kernels are always built with SMP. Some distributions are reducing the number of kernels they support by stabilizing and dealing with Uniprocessor issues on the SMP paths. Perhaps a Novell employee can provide guidance on the enablement of SMP in Novell kernels.
For some time now, SuSE/Novell have provided different rpms for different kernel types, depending on how your system is detected during install, ex: My laptop was configured with kernel-default-2.6.16.13-4 whereas my P4 at home was configured with kernel-smp-2.6.16.13-4.
SUSE still also ships non-smp kernels. This is correct.
I don't know if we're still waiting for me to check an SMP kernel with CONFIG_HOTPLUG_CPU disabled or not, but for what it's worth, it made no difference.
Has there been any progress on this bug?
(In reply to comment #15)
> ex: My laptop was configured with kernel-default-2.6.16.13-4 whereas my P4 at home was configured with kernel-smp-2.6.16.13-4.
This is because the P4 supports hyper threading. It's really difficult to say from here what's going wrong. ATi has told me that they had problems reproducing this bug. What surprises me slightly is that suspend to disk works while suspend to ram doesn't. Now suspend to disk re-POSTs the entire hardware but this causes all the 3D engine state to be tossed. I would like to ask you to do another test: what happens when you VT switch to the console before you suspend X to ram? Do you still get a lockup on resume or after you've switched back to X?
Both Matthew and I tested this with close to the same results. Both systems came back while switched to tty1 (upon resume mine had some garbled stuff at the top of the screen where the SuSE logo is, but switching to tty2 and back corrected this). However, when I switched to tty7 X was locked up. The only difference between mine and Matthew's was that mine still had an active mouse pointer, but the keyboard was dead. His was completely dead. I was able to SSH into the laptop and try killing X, which caused a quick death of my PC (I could ping, no ssh, no local terminal). Matthew could not SSH once he switched back to X.
I ran across this site regarding the Thinkpad and I am unsure what it means: http://thinkwiki.org/wiki/Problems_with_fglrx#Troubles_using_software_suspend The lines of most interest are the last two: T43 and SuSE 10.1, one using swsusp and one using Suspend to RAM Both say this: without vbetool or UseDummyXServer, with DRI enabled
This could be an int10 issue we had with our X.org packages (bug #180535, bug #170991, bug #158806). Could you please retest as soon as updated X.org packages are out for 10.1 (presumably shortly after the SLES10 release)?
You can find RPMs for testing in ftp://ftp.suse.com/pub/people/sndirsch/RPMS/bug182151
I get this: Could not chdir to bug182151: server said: CWD command failed. Permissions?
Could you try again? It probably hadn't been synced yet when you tried. Thanks.
Both Matt and I tried with the same result -- no difference in behavior. The system still freezes after resume.
Just an update: Today I found a bunch of updates available from YaST Online Update which I installed, specifically kernel-default-2.6.16.21-0.13 and xorg-x11-server-6.9.0-50.17. There were other packages related to xorg, but they were all based on this same version (I believe). Anyway, with every update installed and configured (also re-installed fglrx for the new kernel), the system still hangs after suspend-to-ram in exactly the same fashion as before.
Sure, it's the same update you tried before.
Almost -- the update I tried before was xorg-x11-server-6.9.0-50.14. Maybe there's no major changes, but the version number /is/ different. :) Anyway...
A new version of fglrx (8.27.10) was released recently, but it did not address this issue -- I'm still locking up after the upgrade.
I just installed FGLRX 8.29.06 today and re-tested -- Same behavior as before. I wanted to note something though. If I switch to tty1 before suspending, it will resume. Switching back to tty7 immediately kills it. If, after I resume in tty1 I attempt to go to init 3 the screen blanks out but the system is still running. At this point I can SSH in. If I then go to runlevel 5 again, X starts up normally without locking and the display returns to normal. Is there an easy way to perhaps install x.org 6.8 and see if it works there or would that break too many things?
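The manual sequence described above (switch to a text console, suspend, then cycle through runlevel 3 and back to 5 after resume) could be scripted roughly as follows. This is only a sketch under assumptions: `chvt` and `init` are the standard tools, but the suspend command (`s2ram` here) is a guess that should be replaced with whatever your system actually uses, and the names `workaround_plan` and `run_plan` are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess
import time


def workaround_plan(suspend_command=("s2ram",)):
    """Build the command sequence; the suspend command is an assumption."""
    return [
        ["chvt", "1"],            # leave X for a text console before suspending
        list(suspend_command),    # suspend to RAM; expected to return on resume
        ["init", "3"],            # after resume: drop to runlevel 3, stopping X
        ["init", "5"],            # return to runlevel 5 so X starts fresh
    ]


def run_plan(plan, dry_run=True, pause_seconds=10):
    """Print each command (dry run) or execute it as root, pausing in between."""
    shown = []
    for command in plan:
        line = " ".join(command)
        shown.append(line)
        if dry_run:
            print(line)
        else:
            subprocess.check_call(command)
            # Runlevel changes finish asynchronously, so wait before the next step.
            time.sleep(pause_seconds)
    return shown
```

Running `run_plan(workaround_plan())` just lists the steps. Executing them for real needs root, ends the running X session, and should be started from a text console or an SSH login rather than from inside X.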
I just updated my kernel to 2.6.16.21-0.25, my fglrx driver to 8.29.06, and my xorg-x11-server to 6.9.0-50.24. Nothing has changed -- still crashes as previously described. I am going to attempt an install of openSUSE 10.2 Alpha to see if this issue has been resolved thus far in the new development tree. If it is, we may want to see what happens by downgrading to xorg 6.8 or upgrading to 7.0 or 7.1 within SUSE 10.1. Just a thought...
No dice on 10.2 Alpha 5... Using kernel 2.6.18-9, fglrx 8.29.06, and xorg 7.1-27, the system locks up exactly as in previous versions. I should note that this problem again only presented itself after fglrx was installed, as I successfully suspended under the default configuration without ATI's driver. ...thoughts?
I just recently tested using a fresh install of SUSE Linux Enterprise Desktop 10. It works fine until I install fglrx, just as on the other tests. I figured it would fail here since it's basically the same as openSUSE 10.1, but still... This is your enterprise product, and a LOT of enterprises use Thinkpads. We haven't heard anything from Novell on this matter since July 24, so I'm assuming that this isn't a priority case. Can we escalate this a bit, as the bug still exists in your current development base? Has anyone besides Jarom and me been able to duplicate this problem? My final tests on this machine will be with Fedora Core 5 and Kubuntu. If either of them functions properly, I may (against all better judgement) be looking at my new distro of choice.
My test with Fedora Core 5 was a success. The kernel was upgraded to 2.6.18, xorg was version 7.0, and I used fglrx 8.29.06. I noticed that Fedora seems to use a different method to suspend the system. My next test (when I get time) will be to try this different method with openSUSE. I'll report my findings when I can. Jarom will be working on it as well. I'm still a little perturbed by the lack of activity on this bug. At the very least, let us know that this is still on the radar.
I can confirm all of the reported behaviour here:
Thinkpad R52, ATI Radeon X300, SuSE 10.1, Xorg 6.9
- s2ram with fglrx worked for me before, now only suspend to disk works
- machine crashes if one tries to use an X running with fglrx that was suspended before
- it works fine without X running or with X with radeon-driver
- also killing the old X from the terminal (when suspended from there) and starting a new one works fine
I tried some combinations of vbetool commands after waking up from suspend but nothing prevented the system from hanging when switching back to vt7 to an X that has been suspended...
Good to finally see someone else with this problem who has had it work in prior versions of SuSE. Though, it seems that Novell has dropped this bug... I verified that Fedora Core 6 works fine with s2ram. One other thing of note: I decided to try running without the fglrx drivers. s2ram works with the opensource Radeon driver but as soon as you enable DRI (so I can get acceleration) s2ram no longer works. So, it seems to not be a problem with fglrx at all, since I was able to reproduce the problem with the opensource driver. Can we get s2ram to work again with an accelerated driver??????
> I decided to try running without the fglrx drivers.
> s2ram works with the opensource Radeon driver but as soon as you enable DRI
> (so I can get acceleration) s2ram no longer works.
We don't support DRI on X300 cards.
That's not the point -- it used to work. I want it to work again.
> That's not the point -- it used to work.
Only by accident.
> I want it to work again.
Good luck!
Wow... nice cop-out! It almost seems that you don't care... Look, whether it was intended for it to work or not, it did at one point and the fact that it no longer does indicates that some unintended regression occurred. Therefore, the code is now broken. Why is it so hard to figure out what happened? Other distros are able to perform this task just fine. What is it with SuSE 10.1 and above that is different? Was an enhancement made that is causing this behavior? If so, what was it, and do we really need it? Does it have something to do with Xgl? Can it be undone?
Jarom, you said it works with Fedora and you also mentioned that Fedora has some different suspend mechanism - have you figured out the difference? Maybe one could have a look at the code and try to do something similar in SUSE - especially since it cannot have anything to do with the kernel or the graphics driver package provided by SUSE (I use a vanilla kernel and the current driver from the ATI website).
Jarom, it's just a matter of priorities. Features we do not support (reengineered and experimental drivers, due to lack of documentation), and which we never supported or tested in the past, do not have the highest priority, as you might imagine.
We need to check first if this is a driver regression (of a new driver) or a specific problem of 10.1. For this we'll try to reproduce this problem with a driver >= 8.29 on SUSE 9.3/10.0.
We might run into some troubles with compiling newer fglrx drivers for older kernels, which are used by SUSE 9.3/10.0.
First I need hardware for testing. Stefan B., do we have a T43 (not T43P!) for testing available?
I don't think we have it available at the moment. I'll take a look.
*** Bug 229317 has been marked as a duplicate of this bug. ***
Finally I found hardware for testing. TODO (for me): see comments #44/45
Update: Neither Suspend-to-Ram nor Suspend-To-Disk work properly with openSUSE 10.2 using the 8.33.6 fglrx driver.
Update: Neither Suspend-to-Ram nor Suspend-To-Disk work properly with SUSE Linux 9.3 using the 8.33.6 fglrx driver (resume does not work).
Just tried the 8.19.10 fglrx driver version, which we shipped together with SUSE 9.3. When trying to start the Xserver I get a black screen and that's all. No need to test STD/STR. I don't think we have a driver regression here - it's the opposite. It is not a kernel bug (same problems on 9.3 and 10.2). Anyway, I think this is something which should be looked at - by ATI/AMD. I think it makes sense for ATI/AMD to support STD/STR on a T43. It's quite common AFAIK.
Interesting news: Jarom called me yesterday afternoon reporting that after upgrading to fglrx 8.34.8, suspend is working. He is still using 10.1. When I got home last night, I updated to the new driver on my T43 running 10.2 and behold -- it worked for both suspend-to-ram and suspend-to-disk. There was a leap in the air, a shout for joy, and a few tears. I don't have immediate access to any older versions (or my SLED 10 disk) so I can't test it anywhere else at the moment. Are we looking at a fix for this bug? What was done?
Tried the new fglrx (8.34.8) driver on SuSE-10.2, too. Doesn't work! To be more specific, it might work once, but the next time it may, for example, not suspend(-to-ram) at all, or it may suspend and then not resume (so a power reset is required). Pity!
My system:
IBM/Lenovo T43 (Radeon Mobility M300)
SuSE-10.2 (kernel-2.6.18.2-34-default)
fglrx_7_1_0_SUSE102-8.34.8-1
I am currently using the new driver and I have suspended and resumed successfully multiple times now without a reboot. The only drawback for me is that the display may become a little garbled until I switch to another tty and back to X. I can deal with that though. I'm still using SuSE 10.1 so I personally haven't tried 10.2. It looks like progress, however there is still a little work to do.
This is a well known bug of the fglrx driver. I sometimes could reestablish the correct mode setting by switching back and forth to X multiple times. YMMV
Since the initial problem has been fixed for the initial reporter (comment #57), let's finally close this bug report as fixed.