|
Bugzilla – Full Text Bug Listing |
| Summary: | Restart Xserver not possible after upgrading the driver version | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Stefan Dirsch <sndirsch> |
| Component: | X11 3rd Party | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED FIXED | QA Contact: | Stefan Dirsch <sndirsch> |
| Severity: | Enhancement | ||
| Priority: | P1 - Urgent | CC: | aritger, eich, lfriedman, werner |
| Version: | Beta 2 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
xdm.diff
kdm.diff gdm.diff Xorg diff /usr/bin/X.%name |
||
|
Description
Stefan Dirsch
2006-11-09 16:23:04 UTC
Stefan, I'm not sure that I understand how you updated the nvidia driver. The problem that you're reporting shouldn't even be possible if you used the official nvidia driver package. Were you using an RPM for the nvidia driver? thanks Yes, I used the driver RPMs we build for SLES10/SLED10. Background: We would like to provide a driver update due to the "Rapid7 Advisory R7-0025" security issue. Did you understand the problem or was my decsription to confusing for you? The official NVIDIA installer already unloads the nvidia kernel module as part of the installation process. It sounds like your RPM isn't doing that (probably because the RPM can be installed while X is using it?). I guess I'm just not clear what you're asking us to change? > The official NVIDIA installer already unloads the nvidia kernel module as > part of the installation process. Sure. I'm aware of. Makes sense. > It sounds like your RPM isn't doing that > (probably because the RPM can be installed while X is using it?). That's it! :-) > I guess I'm just not clear what you're asking us to change? During Xserver start unload already loaded kernel module, when it does not match the driver version, and load a possibly updated kernel module. You're right. When using the official installer, such a feature is not required at all. Oops. I think we can do a braindead solution to address this problem ourselves by forcing to always unload it first.
cat >> /etc/modprobe.de/nvidia < EOF
install nvidia { /sbin/rmmod nvidia; sbin/modprobe -i nvidia }
EOF
Thanks Stefan. Let me know if you need anything further from NVIDIA. > Oops. I think we can do a braindead solution
It's braindead since it doesn't work at all. This would only work when the driver always tries to load the kernel module, but it doesn't. So now we're back to
"During Xserver start unload already loaded kernel module, when it does not
match the driver version, and load a possibly updated kernel module."
I've opened RFE 266558 for this issue. thanks. Thanks. Just for curiosity. What does RFE stand for? Request for enhancement Thanks, Lonni. Matthias has access to the RFE database and will watch this issue. :-) This RFE has been marked will not fix, as the requested enhancement is only neccesary with 3rd party repackaging of the NVIDIA driver. So NVIDIA is not interested in easy driver integration into Linux products? This is definitely bad news. :-( > So NVIDIA is not interested in easy driver integration into Linux products?
> This is definitely bad news. :-(
I'm sorry for the miscommunication, Stefan. After all the great work SuSE
and NVIDIA has done together, we both know the above statement isn't true.
Several NVIDIA engineers discussed this bug a while ago, and the consensus
was that this was cumbersome to solve within the driver:
- we wouldn't want running *any* user space driver component to try
to unload the kernel module, so probably limit this approach to just
the X driver, but what about non-X components? libnvidia-cfg.so
(used by `nvidia-xconfig --query-gpu-info`)? what about the
upcoming CUDA compute driver?
- unloading would need to be done in userspace rather than in the
kernel, but the user space portion doesn't have easy access to the
module usage count, so it would require some more complicated
handshaking to negotiate when it was safe to unload (or atleast
when an unload is not expected to succeed)
- if we wanted to backport this to the 71xx legacy drivers, we'd
have problems solving this in nvidia_drv.o because the driver is
inside the XF86 libc wrapper; that's solvable, but cumbersome.
It feels to me like the problem you encountered here would be better
solved through package management. I remember when we first developed
nvidia-installer, you suggested having nvidia-installer not let you
install while the X server was running (I think you found problems with
the nv driver running while the NVIDIA kernel module was loaded).
In general, it seems cleaner to need to first uninstall the current
NVIDIA driver before installing the new driver. And I think that's
our recommendation.
Of course NVIDIA's willing to work with you on this, but it doesn't seem
like having the X driver unload the kernel module is the right solution.
FWIW, NVIDIA does still need to solve the problem of allowing multiple
driver versions to coexist on the file system at once, even if only one
is allowed to be used at a time (we receive this request from various
distros building LiveCDs), so it may be that as we work through the
issues for that, some better solutions to this current problem may arise.
But I think the best course of action is for your rpm packaging to more
strictly enforce when the driver can be installed, similar to what
nvidia-installer does.
Let me know what I can do to help.
Thanks,
- Andy
Thanks for detailed explanation, Andy. I'll try to unload the kernel module right after Xsession has finished, i.e. before the new Xserver starts. Not sure how to tell this exactly xdm/kdm/gdm. Need to figure out. Of course the unload will fail when a second Xsever is running. Bad luck. The kernel rpm asks the user to reboot the system after installation to make sure the upgraded kernel and its modules are loaded. So this is a non issue for all free drivers. In this particular case the situation is a little more complicated as a user land component has a dependency on specific kernel driver user space/ kernel ABI version. I'm not exactly sure how we solve this problem with user space components /drivers we control (ie. the free ones) but I think we try to avoid upgrading these products to a new ABI for an existing product (Stefan?). There are different ways to solve this: 1. Make the NVIDIA kernel/user land interfaces backward compatible on both sides (this would require the new X driver to run with an old kernel module - possible with a degraded feature set). 2. force the user to reboot the system whenever ugrading an NVIDIA component. Juding from Andy's comment above this seems to be the only way to gurantee a consistent system under all circumstances although it seems to be overkill in most real world use cases. I'd assume that the former solution would create a temendous burdeon and possible introduce regression for a feature of temorary value while the latter approach creates an inconvenience for the user. This inconcenience could be reduced if the reboot would only be required if there really was an ABI change between kernel and user land (assuming this is not always the case). I propose to simply hack xdm/kmd/gdm to unload the kernel module right after terminating the Xserver. I don't have a better idea right now. Trying to do this in /etc/X11/xdm/Xreset doesn't work since the Xserver is still running at this time. :-( Created attachment 111497 [details]
xdm.diff
Proposal for xdm.
It would be better to unload the module only when the version of the currently loaddd kernel module differs from the one, which is currently installed. You can check the version of the currently loaded kernel module via "cat /proc/driver/version/nvidia", but the version from the installed one cannot be checked with modinfo. Looks like it's only available via the "strings" command. :-( Created attachment 111504 [details]
kdm.diff
Proposal for kdm (untested).
Created attachment 111512 [details]
gdm.diff
Proposal for gdm (untested).
I'm against hacking kdm/xdm to address the few times where an update happened. This is even more hacky than putting it into the driver. It doesn't address the cases where no X driver is running (as mentioned in Andy's mail).
Also it will always return false (unless someone manages to install an Nvidia and a FireGL card).
The install procedure can test if it can unload the driver instead and reboot if this fails.
How about if we create a script to replace the /var/X11R6/bin link when a driver is installed:
Put the following into /var/X11R6/bin/X.install:
#/bin/bash
mods=`lsmod | grep -o -E "nvidia|fglrx"`;
test $? -eq 0 ] && for i in ${mods}; do rmmod $i; done;
test -f /var/X11R6/bin/X._install && mv /var/X11R6/bin/X._install /var/X11R6/bin/X;
exec /var/X11R6/bin/X ${1+"$@"}
The installation procedure can do someting like:
test -f /var/X11R6/bin/X._install \
|| { mv /var/X11R6/bin/X /var/X11R6/bin/X._install; \
ln -sf /var/X11R6/bin/X.install /var/X11R6/bin/X' };
This will of course not address Andy's other concerns. But it will keep actions confined to where they are required. > The install procedure can test if it can unload the driver instead and > rebootif this fails. IMHO this is overkill and I'm afraid there's no way to trigger a reboot, i.e. to tell YaST to do so, in a safe way during a package installation. > How about if we create a script to replace the /var/X11R6/bin link when a > driver is installed: > Put the following into /var/X11R6/bin/X.install: > #/bin/bash > mods=`lsmod | grep -o -E "nvidia|fglrx"`; > test $? -eq 0 ] && for i in ${mods}; do rmmod $i; done; > test -f /var/X11R6/bin/X._install && mv /var/X11R6/bin/X._install > /var/X11R6/bin/X; > exec /var/X11R6/bin/X ${1+"$@"} This would be possible, but ... > The installation procedure can do someting like: > > test -f /var/X11R6/bin/X._install \ > || { mv /var/X11R6/bin/X /var/X11R6/bin/X._install; \ > ln -sf /var/X11R6/bin/X.install /var/X11R6/bin/X' }; What do you mean with installation procedure? - YaST? - RPM %post install section? At least SaX2 and (/etc/X11/xdm/)SuSEconfig.xdm then will needs to be adjusted as well (otherwise the wrapper script would be overwritten). My first idea has also been a wrapper script, but I abandonned it immediately when thinking about all the places where changes have to be made. When using a wrapper script I vote for replacing /usr/bin/Xorg (instead of /var/X11R6/bin/X) and moving /usr/bin/Xorg to /usr/bin/Xorg.bin. Then we only need to replace package xorg-x11-server (and package permissions on SLES10). Proposal for new /usr/bin/Xorg (untested):
#!/bin/sh
# force kernel module reload after driver update; kernel module and
# X driver version need to match (Novell Bugzilla #219456)
if [ "$UID" == "0" ]; then
mods=$(lsmod | grep -o -E "^(nvidia|fglrx)")
if [ $? -eq 0 ]; then
for m in ${mods}; do
/sbin/rmmod $m
done
fi
fi
exec /usr/bin/Xorg.bin ${1+"$@"}
Created attachment 111584 [details]
Xorg
Improved Xorg wrapper.
I'm not sure if I have made myself clear. My idea was that the installation program for the nvidia/fireglx driver *temporarily* replaces the link of the Xserver binary. Once the binary has run sucessfully it will itself move back the link. Unloading the kernel module every time will trigger complaints by people who are concerned about short startup times as it will require the Xserver to reload the kernel module every time it is restarted. Maybe I'm missing someting but I don't see any benefit unloading this module every time the Xserver gets restarted. I only see a need in the particular situation where the X and kernel drivers have been updated and the loaded kernel driver doesn't match what is needed by the X driver any more. I don't care if you want to put the 'wrapper' into /usr/bin/X11. I just don't want to see it used all the time. However my idea was that the installer itself can provide this script. It is confusing and people who don't know about the nvidia/fireglx situation (and don't care about either driver) will wonder what we are up to. Thanks for explanation, Egbert. Now I understand what you want to achieve. :-) A wrapper script /var/X11R6/bin/X, which unloads the possibly existing old kernel module and replaces itself with a symlink to /usr/bin/Xorg again after the first run. Hmm ... sounds cool ... but also problematic ... Things like this could be done in %post of a package, but I'm afraid that SaX2 will kill this mechanism (overwrite the wrapper script), when it is used for configuration on top of a running Xserver right after installation of this package. I'll need to think about a solution based on your idea, which will work in all cases. To prevent the SaX2 prolematic it would be possible to use /usr/bin/Xorg as wrapper script (untested). /usr/bin/Xorg needs to be moved to /usr/bin/Xorg.bin before in %post. But this won't work, when /usr is readonly. :-(
#!/bin/sh
# force kernel module reload after driver update; kernel module and
# X driver version need to match (Novell Bugzilla #219456)
if [ "$UID" == "0" ]; then
mods=$(lsmod | grep -o -E "^(nvidia|fglrx)")
if [ $? -eq 0 ]; then
for m in ${mods}; do
/sbin/rmmod $m
done
fi
fi
mv /usr/bin/Xorg.bin /usr/bin/Xorg
exec /usr/bin/Xorg ${1+"$@"}
Is there a need for SaX2 to still touch the link? Also is this a realistic scenario? When the nvidia kernel module is laoded and cannot be unloaded the user would already have a running Xserver with NVIDIA driver (from where he is doing the update). In all other cases the installation procedure would be able to uninstall the module. I personally don't like to replace the Xorg binary with a wrapper script - not even temporarily. Also if w ecan get SaX2 to unload the driver module (using some 'flag' that's been set up by the installation procedure) it can do with the link anything it likes. > Is there a need for SaX2 to still touch the link? I'm not sure. Marcus? > When the nvidia kernel module is laoded and cannot be unloaded the user > would already have a running Xserver with NVIDIA > driver (from where he is doing the update) This bugreport is about this situation. .-( > I personally don't like to replace the Xorg binary with a wrapper script - > not even temporarily. I do understand this. I'm not happy with it either. > Also if we can get SaX2 to unload the driver module (using some 'flag' > that's been set up by the installation procedure) it can do with the link > anything it likes. Sure, but we are talking about the situation, where it isn't possible to unload the module, because it runs on top of a NVIDIA Xserver. > I personally don't like to replace the Xorg binary with a wrapper script -
> not even temporarily.
Hmm ... we could replace the /usr/bin/X symlink instead.
(In reply to comment #35) > > Also if we can get SaX2 to unload the driver module (using some 'flag' > > that's been set up by the installation procedure) it can do with the link > > anything it likes. > Sure, but we are talking about the situation, where it isn't possible to unload > the module, because it runs on top of a NVIDIA Xserver. > This would only be the case if SaX was running inside the X session where the installation took place. Why would the user do that? Anyway if Marcus was willing to touch SaX2 there are two ways to remedy the problem. 1. SaX2 detects the situation and refuses to run - opening a window telling the user to restart X first. 2. SaX2 would detect the situation and do something smart about the link ie. changing the backup copy of the installer link - which could cause problems if multiple stacked backup copies existed or better assume the link is correct and leave it alone. (In reply to comment #36) > > I personally don't like to replace the Xorg binary with a wrapper script - > > not even temporarily. > Hmm ... we could replace the /usr/bin/X symlink instead. > This would not allow for a solution that gets installed by the update process and deinstalls itself when not needed any more. The merit of the /var/X11R6/bin/ solution was that this is guaranteed to be writable even after the installation. We are trying to design our installation procedure here and SaX2 is part of this (and under our control) therefore why not make it aware of this procedure? Indeed I can see scenarios where my solution would not work well (ie. centralized installation scenarios where /usr is shared among many systems) however these systems need other heuristics to deal with changes to the individual /var copies. No such scenario is presently supported as far as I know. My biggest concern about replacing /var/X11R6/bin/X is that we also need to adjust SuSEconfig.xdm (xorg-x11 package). Otherwise /var/X11R6/bin/X will be immediately overwritten at the end of installation (SLES10) / during (re)start of xdm (openSUSE >= 10.2). But of course changing this would be pretty trivial. BTW, maybe we even won't need to replace /var/X11R6/bin/X if we set $DISPLAYMANAGER_XSERVER in /etc/sysconfig/displaymanager appropriately during installation and call SuSEconfig.xdm afterwards. The wrapper script could set $DISPLAYMANAGER_XSERVER back after the first run. Looks like SaX2 doesn't create the /var/X11R6/bin/X symlink. At least it doesn't when started on top of a running Xserver. This is good. :-) (In reply to comment #37) > > Sure, but we are talking about the situation, where it isn't possible to unload > > the module, because it runs on top of a NVIDIA Xserver. > > > This would only be the case if SaX was running inside the X session where the > installation took place. Why would the user do that? Hm - why *wouldn't* he do that? It's standard in about every OS to download and install a new graphics driver while it (the old version) is running. (In reply to comment #38) > This would not allow for a solution that gets installed by the update process > and deinstalls itself when not needed any more. The merit of the > /var/X11R6/bin/ solution was that this is guaranteed to be writable even after > the installation. While I do not think that supporting exactly this scenario in cases where /usr is mounted r/o only (e.g. clusters with shared /usr), /var is certainly the better place for this. /usr should be immutable after installation. (In reply to comment #39) > BTW, maybe we even won't need to replace /var/X11R6/bin/X if we set > $DISPLAYMANAGER_XSERVER in /etc/sysconfig/displaymanager appropriately > during installation and call SuSEconfig.xdm afterwards. The wrapper script > could set $DISPLAYMANAGER_XSERVER back after the first run. I've now implemented this. Works fine for me. I'll attach the diff for the specfile and the wrapper script JFYI. Created attachment 111853 [details]
diff
Patch for specfile.
Created attachment 111855 [details]
/usr/bin/X.%name
wrapper script
I plan to use the new mechanism (see comments #43/44) for future NVIDIA/ATI driver releases. Closing as fixed. (sorry for joining the discussion late, just got back from holiday) Wow, I'm sorry that this got so involved. It sounds like you have a solution that works, and I don't have any better ideas. I suppose one other approach might be to defer installation of the new driver until the next time the X server starts -- either when the user reboots or when the user restarts X. At that point (after the X server scripts have started, but before the server binary has started), the old driver could be uninstalled and the new driver could be installed, then server startup could continue. Maybe the X startup scripts could be made to cheaply determine if there is a new driver to install? (e.g., look in a special directory) Anyway, just throwing out the idea; if your current solution works, then fine with me. Unfortunately defering a package installation is not possible ATM. Cool idea nevertheless. :-) I think we have a working solution now. :-) At least for most of the cases. It cannot work if more than one Xserver is active. The installation cannot be deferred but the files could be installed in a scratch area and moved into place later. This however may fail when /usr is mounted ro as at the time the Xserver is restarted it may already be mounted ro (while it must be rw at installation time.) In any case it would be just as troublesome as it would also require a wrapper script. Another caveat: this procedure only works with xdm - it doesn't work with startx. The advantage of the /var/X11R6/bin/X solution was that it would have worked with any X starting mechanism. In any case I will reopen this as the startx script may need some additional tweaking. A reply to comment #41: > Hm - why *wouldn't* he do that? It's standard in about every OS to download and > install a new graphics driver while it (the old version) is running. SaX2 isn't needed for the installation of the package. When the user is running X when installing the driver he either has a configured Xserver (with the NVIDIA driver) running already or he is installing from a server that doesn't use the NVIDIA driver in which case the kernel module is not a concern. Opps, too fast. Didn't hit reopen right. startx is only for the experienced user and no longer supported since a long time. I don't plan to handle this scenario. If you want to provide a new startx patch, let me know. BTW, it's much more easier to unload the kernel module when you use startx anyway. :-) |