Bugzilla – Full Text Bug Listing

| Summary: | NVIDIA driver after update 440.100 --> 450.57 fails due to remaining old kernel modules | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Matthias Bach <marix> |
| Component: | X11 3rd Party Driver | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED FIXED | QA Contact: | Stefan Dirsch <sndirsch> |
| Severity: | Normal | ||
| Priority: | P3 - Medium | CC: | george.spiggott, jamesrome, marix, valurolafsson, vkrevs |
| Version: | Leap 15.2 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Result of nvidia-bug-report.sh | ||
|
Description
Matthias Bach
2020-07-16 11:12:51 UTC
Seems the kernel module build of 450 failed or the 440 module is being preferred for some reason. I suggest uninstalling the nvidia-gfxG05-kmp-default package and removing all remaining nvidia modules below /lib/modules:

  cd /lib/modules
  find . -name 'nvidia*.ko' -print | xargs rm

and then reinstalling the nvidia-gfxG05-kmp-default package. Then check this:

  find /lib/modules -name 'nvidia*.ko'

If it still doesn't work, please also attach the output of nvidia-bug-report.sh.

Dear Stefan, I too have been hit with the same issue as above and your solution worked for me. Thanks

Created attachment 839785 [details]
Result of nvidia-bug-report.sh
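A note on find commands like these: the pattern is safer quoted as 'nvidia*.ko', because an unquoted pattern is expanded by the shell whenever a matching file exists in the current directory, and find then receives a file name instead of the pattern. Below is a minimal sketch of a helper that previews what would be removed. The function name list_nvidia_modules and its optional root argument are illustrative assumptions, not part of any package.

```shell
#!/bin/bash
# List NVIDIA kernel modules (regular files and symlinks) below a modules
# root. The root defaults to /lib/modules; passing another directory is only
# meant for trying the function on a scratch tree.
list_nvidia_modules() {
    local root="${1:-/lib/modules}"
    # The pattern is quoted so that find, not the shell, expands it.
    find "$root" \( -type f -o -type l \) -name 'nvidia*.ko' -print | sort
}

# Preview first, delete only after checking the list, for example:
#   list_nvidia_modules
#   find /lib/modules -name 'nvidia*.ko' -delete
```

An empty list after uninstalling the package would suggest nothing was left behind.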
Sadly this didn't fix the issue for me.
One interesting thing I noted: Before removing the modules I had a /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia.ko, along with many modules for Leap 15.1 and 15.2 kernels.
After removing all modules and running the driver installation I have /lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko. So I did actually have a module with a higher version number lying around.
I have this same issue in 15.2. I get no graphics at all. The NVidia drivers got updated. Now I cannot activate them with

  # prime-select nvidia

It says it cannot query the GPU. I uninstalled and reinstalled the packages, and prime-select still fails. Help please.

Can we delete all the 4.4 and 4.12 files in /lib/modules?

(In reply to James Rome from comment #6)
> Can we delete all the 4.4 and 4.12 files in /lib/modules?

And, I do not have an nvidia file in /lib/modules:

drwxr-xr-x 1 root root  14 Aug 18  2018 4.12.14-lp150.12.10-default
drwxr-xr-x 1 root root  14 Oct  8  2018 4.12.14-lp150.12.13-default
drwxr-xr-x 1 root root  14 Oct 16  2018 4.12.14-lp150.12.16-default
drwxr-xr-x 1 root root  14 Nov  7  2018 4.12.14-lp150.12.19-default
drwxr-xr-x 1 root root  14 Dec 15  2018 4.12.14-lp150.12.22-default
drwxr-xr-x 1 root root  14 Jan 17  2019 4.12.14-lp150.12.25-default
drwxr-xr-x 1 root root  14 Feb 19  2019 4.12.14-lp150.12.28-default
drwxr-xr-x 1 root root  24 Aug  7  2018 4.12.14-lp150.12.4-default
drwxr-xr-x 1 root root  14 Apr 12  2019 4.12.14-lp150.12.45-default
drwxr-xr-x 1 root root  14 May 16  2019 4.12.14-lp150.12.48-default
drwxr-xr-x 1 root root  14 May 27  2019 4.12.14-lp150.12.58-default
drwxr-xr-x 1 root root  14 Jun 17  2019 4.12.14-lp150.12.61-default
drwxr-xr-x 1 root root  14 Aug 18  2018 4.12.14-lp150.12.7-default
drwxr-xr-x 1 root root  14 Sep 22  2019 4.12.14-lp151.28.10-default
drwxr-xr-x 1 root root  14 Oct 10  2019 4.12.14-lp151.28.13-default
drwxr-xr-x 1 root root  14 Oct 30  2019 4.12.14-lp151.28.16-default
drwxr-xr-x 1 root root  14 Nov 13  2019 4.12.14-lp151.28.20-default
drwxr-xr-x 1 root root  14 Dec  9  2019 4.12.14-lp151.28.25-default
drwxr-xr-x 1 root root  14 Mar  8 10:06 4.12.14-lp151.28.32-default
drwxr-xr-x 1 root root  14 Mar 25 18:30 4.12.14-lp151.28.36-default
drwxr-xr-x 1 root root  14 Jul 16  2019 4.12.14-lp151.28.4-default
drwxr-xr-x 1 root root  14 Apr 20 11:14 4.12.14-lp151.28.40-default
drwxr-xr-x 1 root root  14 Jun 11 15:02 4.12.14-lp151.28.44-default
drwxr-xr-x 1 root root  14 Jul  3 10:36 4.12.14-lp151.28.48-default
drwxr-xr-x 1 root root  14 Jul  3 12:44 4.12.14-lp151.28.52-default
drwxr-xr-x 1 root root  14 Aug 11  2019 4.12.14-lp151.28.7-default
drwxr-xr-x 1 root root 278 Jul 30  2017 4.4.27-2-default
drwxr-xr-x 1 root root 278 May 26  2018 4.4.76-1-default
drwxr-xr-x 1 root root 292 Jul 16 12:53 5.3.18-lp152.19-default
drwxr-xr-x 1 root root 292 Jul 16 12:53 5.3.18-lp152.19-preempt
drwxr-xr-x 1 root root 462 Jul 16 12:53 5.3.18-lp152.20.7-default
drwxr-xr-x 1 root root 292 Jul 15 18:23 5.3.18-lp152.20.7-preempt
drwxr-xr-x 1 root root 484 Jul 16 12:53 5.3.18-lp152.26-default
drwxr-xr-x 1 root root 314 Jul 15 18:19 5.3.18-lp152.26-preempt

I wish this was editable. There are NVidia modules in /lib/modules/5.3.18-lp152.19-preempt/updates. But surely /lib/modules/5.3.18-lp152.26-preempt/updates would be newer, but nothing is there.

(In reply to Matthias Bach from comment #4)
> Sadly this didn't fix the issue for me.

I just realised I failed. I only ran `find /lib/modules -name nvidia.ko -delete`. Will retry with `find /lib/modules -name nvidia*.ko -delete`.

(In reply to Matthias Bach from comment #9)
> (In reply to Matthias Bach from comment #4)
> > Sadly this didn't fix the issue for me.
>
> I just realised I failed. I only ran `find /lib/modules -name nvidia.ko
> -delete`. Will retry with `find /lib/modules -name nvidia*.ko -delete`.

So doing this properly does fix the issue. Thanks!

Still weird that I had /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia*.ko though, when the current package builds /lib/modules/5.3.18-lp152.19-default/updates/nvidia*.ko, which now gets linked from /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia*.ko.

(In reply to Matthias Bach from comment #10)
> So doing this properly does fix the issue. Thanks!

Good!
> Still weird that I had
> /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia*.ko though

So I assume these were still the 440.100 ones, which for some reason weren't removed during uninstallation of the old package.

> when the current package builds
> /lib/modules/5.3.18-lp152.19-default/updates/nvidia*.ko

That's correct.

> which now gets linked from
> /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia*.ko.

That's how it is supposed to be: symlinks are created for all kernels sharing the same kABI. That is our weak-updates concept.

@James Rome
Please follow the instructions of comment#1. They make sure nothing is left below /lib/modules.

Yes, using find /lib/modules -name nvidia*.ko -delete and removing and reinstalling the drivers fixed it.

Ok. So at least we have a workaround. But now I'm afraid this happens for everyone for this update 440.100 --> 450.57. :-(

Now I know what happens. Up to 440.100, kernel modules were mistakenly rebuilt and installed for the kernel against which they had been built locally. Currently this is 5.3.18-lp152.20.7. With 450.57 I switched this back to our weak-modules concept, i.e. kernel modules are installed to a fixed kernel version (here: 5.3.18-lp152.19; even if it doesn't exist on the system), then weak-modules symlinks are created for all other installed kernels.

Example

                      440.100 packages    450.57 packages
----------------------------------------------------------------------------------------------
.19 fixed GA kernel   no kernel modules   450.57 modules
.20 build kernel      440.100 modules     440.100 modules (no weak symlinks created) ***
.85 another kernel    no kernel modules   weak symlinks to .19 fixed kernel (450.57 modules)

*** because modules with the same name already exist

As a fix I could remove the old modules before installing the new ones.

Fixed and pushed packages towards nvidia.
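To see which row of the example applies on a given system, it can help to check for each kernel directory whether its NVIDIA modules are real files or symlinks. The following is a rough sketch only. classify_nvidia_modules is a hypothetical helper name, and the sketch assumes nothing beyond what is described above: that weak-updates entries are symlinks pointing at the modules of the fixed kernel.

```shell
#!/bin/bash
# Print one line per NVIDIA module below a modules root:
#   "file <path>"            for a real module file
#   "link <path> -> <target>" for a symlink (e.g. a weak-updates entry)
# The root argument defaults to /lib/modules (assumption: standard layout).
classify_nvidia_modules() {
    local root="${1:-/lib/modules}"
    local path
    find "$root" \( -type f -o -type l \) -name 'nvidia*.ko' -print | sort |
    while IFS= read -r path; do
        if [ -L "$path" ]; then
            printf 'link %s -> %s\n' "$path" "$(readlink "$path")"
        else
            printf 'file %s\n' "$path"
        fi
    done
}
```

A "file" line under the updates directory of a kernel other than the fixed one would match the leftover 440.100 case marked with *** in the example.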
Consider this a reliable workaround as long as this update is not available yet:

  rpm -e nvidia-gfxG05-kmp-default --nodeps
  find /lib/modules -name 'nvidia*.ko' -delete
  zypper in nvidia-gfxG05-kmp-default

Fixed packages contain the following RPM changelog:

Thu Jul 16 19:36:52 UTC 2020 - Stefan Dirsch <sndirsch@suse.com>

- remove still existing old kernel modules during installation of new modules, since otherwise weak-modules doesn't work (boo#1174204)
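For anyone who prefers to review the workaround before running it, here is a small sketch that wraps the three commands in one function. The function name nvidia_workaround and the DRY_RUN variable are assumptions made for illustration. Only the rpm, find and zypper invocations come from the comment above. By default nothing is executed and the commands are only printed.

```shell
#!/bin/bash
# Wrap the three workaround commands. With DRY_RUN=1 (the default in this
# sketch) each command is printed instead of executed; run as root with
# DRY_RUN=0 to actually apply it.
nvidia_workaround() {
    local pkg="nvidia-gfxG05-kmp-default"
    local run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    # Each step runs only if the previous one succeeded.
    $run rpm -e "$pkg" --nodeps &&
    $run find /lib/modules -name 'nvidia*.ko' -delete &&
    $run zypper in "$pkg"
}
```

Calling nvidia_workaround without setting DRY_RUN prints the three commands in order. Chaining with && means the modules are not deleted if the package removal fails.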