Bugzilla – Full Text Bug Listing

| Summary: | Nvidia rpm fails to build nvidia.ko, yet claims success | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Carlos Robinson <carlos.e.r> |
| Component: | X11 3rd Party Driver | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED FIXED | QA Contact: | Stefan Dirsch <sndirsch> |
| Severity: | Enhancement | ||
| Priority: | P4 - Low | CC: | carlos.e.r, stschoettl |
| Version: | Leap 15.0 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | |||
| Bug Blocks: | 1145316 | ||
| Attachments: | var/log/zypp/history section | ||
Description
Carlos Robinson
2019-03-29 17:20:46 UTC
Notice that the rpm was built at openSUSE. It is not a driver problem, but an rpm issue, so this is the correct place to report it.

http://http.download.nvidia.com/opensuse/README

So you made an update of the NVIDIA RPMs, or what exactly?

Which kernel are you currently running?
--> uname -r

Which kernel packages are installed?
--> rpm -qa | grep kernel

Which nvidia packages are installed?
--> rpm -qa | grep -i nvidia

First I installed the 15.1 kernel and 15.1 Nvidia driver, which failed (Bug 1131029). That Nvidia rpm did not report the failure to build the kernel module.

Then I reverted. I uninstalled the 15.1 kernel and Nvidia modules, and forced a reinstall of the 15.0 Nvidia modules. The Nvidia rpm again failed to build the kernel module (for a different reason) but kept silent about the error. I found the error and solved it: basically, the kernel rpms of 15.0 have to be reinstalled to create the appropriate symlinks.

The problem reported here, however, is the failure of the Nvidia rpms to abort on a Make error and report the issue. The installation continues, runs dracut and claims success, when in fact it failed.

Why Make failed is irrelevant now.
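The behavior described above (a scriptlet that carries on after `make` fails and still ends "successfully") is what a plain shell script does when a command's exit status is never checked. The following is only a minimal sketch of that effect, not the actual NVIDIA scriptlet: `build_module`, `post_unchecked` and `post_checked` are hypothetical names, and `build_module` is a stand-in that always fails.

```shell
#!/bin/sh
# Hypothetical stand-in for the kernel module build; it always fails,
# like the "make ... Error 1" case in this report.
build_module() {
    echo "nvidia.ko failed to build!" >&2
    return 1
}

# Unchecked variant: the build failure is ignored, and the function's
# exit status is that of its last command (the echo), which is 0.
post_unchecked() {
    build_module
    echo "*** Reboot your computer and verify that the driver can be loaded. ***"
}

# Checked variant: stop at the first failure and return a non-zero status,
# so the caller (rpm, and whatever drives rpm) can notice it.
post_checked() {
    if ! build_module; then
        echo "ERROR: kernel module build failed" >&2
        return 1
    fi
    echo "*** Reboot your computer and verify that the driver can be loaded. ***"
}
```

Calling `post_unchecked; echo $?` prints the reboot message and a status of 0 even though the build failed, whereas `post_checked; echo $?` reports the error and a status of 1.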
cer@Telcontar:~> uname -r
4.12.14-lp150.12.48-default
cer@Telcontar:~> rpm -qa | grep kernel
kernel-source-4.12.14-lp150.12.45.1.noarch
texlive-l3kernel-2017.133.svn44483-lp150.5.4.noarch
kernel-source-4.12.14-lp150.12.48.1.noarch
texlive-l3kernel-doc-2017.133.svn44483-lp150.5.4.noarch
kernel-syms-4.12.14-lp150.12.48.1.x86_64
kernel-devel-4.12.14-lp150.12.48.1.noarch
kernel-firmware-20190118-lp150.2.12.1.noarch
kernel-devel-4.12.14-lp150.12.45.1.noarch
kernel-default-devel-4.12.14-lp150.12.48.1.x86_64
kernel-docs-4.12.14-lp150.12.48.1.noarch
nfs-kernel-server-2.1.1-lp150.4.6.1.x86_64
kernel-macros-4.12.14-lp150.12.48.1.noarch
kernel-default-4.12.14-lp150.12.48.1.x86_64
cer@Telcontar:~> rpm -qa | grep -i nvidia
nvidia-computeG03-340.107-lp150.12.2.x86_64
nvidia-uvm-gfxG03-kmp-default-340.107_k4.12.14_lp150.11-lp150.12.1.x86_64
x11-video-nvidiaG03-340.107-lp150.12.2.x86_64
nvidia-glG03-340.107-lp150.12.2.x86_64
nvidia-gfxG03-kmp-default-340.107_k4.12.14_lp150.11-lp150.12.2.x86_64
cer@Telcontar:~>

If you wish, I can try to reproduce the error and post the thousands of lines from rpm output... But IMHO, it is pointless.

Ok. Not sure who creates

/usr/src/linux-obj/x86_64/default

It should point to the latest kernel-sources, I believe. In your case this would be /usr/src/linux-4.12.14-lp150.12.48 or alike. Maybe you can remove

kernel-devel-4.12.14-lp150.12.45.1.noarch
kernel-source-4.12.14-lp150.12.45.1.noarch
kernel-syms-4.12.14-lp150.12.48.1.x86_64

(you no longer have such a kernel on your system anyway) and reinstall

kernel-default-4.12.14-lp150.12.48.1.x86_64
kernel-default-devel-4.12.14-lp150.12.48.1.x86_64
kernel-devel-4.12.14-lp150.12.48.1.noarch
kernel-docs-4.12.14-lp150.12.48.1.noarch
kernel-macros-4.12.14-lp150.12.48.1.noarch
kernel-source-4.12.14-lp150.12.48.1.noarch

so the missing symlink gets created.

Updating kernel, then downgrading again unfortunately doesn't work reliably when (re-)building the kernel module. Especially if such a symlink is pointing to nowhere ...

Ok. Yes, the trick seems to be to reinstall the kernel links.

However, the problem I report is that "nvidia-gfxG03-kmp-default-340.107_k4.12.14_lp150.11-lp150.12.1.x86_64.rpm" doesn't say "I failed to build the module, installation failed".

Look. With the rpms I listed above (#3) I intentionally break the link:

Telcontar:/usr/src/linux-obj/x86_64 # l default
lrwxrwxrwx 1 root root 50 Apr 1 19:02 default -> ../../linux-4.12.14-lp151.12.48-obj/x86_64/default

(I edited 150 to 151 in the target name.) The target does not exist. Now I try to install the nvidia-gfxG03-kmp-default rpm. Look:

[paste begin]
Telcontar:/data/... # rpm --install --force nvidia-gfxG03-kmp-default-340.107_k4.12.14_lp150.11-lp150.12.2.x86_64.rpm
make: *** /usr/src/linux-obj/x86_64/default: No such file or directory. Stop.
make: *** /usr/src/linux-obj/x86_64/default: No such file or directory. Stop.
/usr/src/kernel-modules/nvidia-340.107-default /
NVIDIA: calling KBUILD...
make[1]: *** /lib/modules//source: No such file or directory. Stop.
NVIDIA: left KBUILD.
nvidia.ko failed to build!
make: *** [Makefile:192: nvidia.ko] Error 1
/
install: cannot stat '/usr/src/kernel-modules/nvidia-340.107-default/nvidia.ko': No such file or directory
depmod: WARNING: could not open modules.order at /lib/modules/4.12.14-lp150.11-default: No such file or directory
depmod: WARNING: could not open modules.builtin at /lib/modules/4.12.14-lp150.11-default: No such file or directory
Modprobe blacklist files have been created at /etc/modprobe.d to prevent Nouveau from loading. This can be reverted by deleting /etc/modprobe.d/nvidia-*.conf.
*** Reboot your computer and verify that the NVIDIA graphics driver can be loaded. ***
[paste pause]

Notice the message that starts with: "reboot". It is claiming success. But see above: "nvidia.ko failed to build!" Make failed with "Error 1".

THAT is the problem I'm reporting.
Not why it failed, but that it fails and *claims success*.

rpm then calls dracut to create initrd - why, when there is no module?

[paste continues]
*** Reboot your computer and verify that the NVIDIA graphics driver can be loaded. ***
Creating initrd: /boot/initrd-4.12.14-lp150.12.48-default
dracut: Executing: /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force --force-drivers "pata_jmicron ata_piix ata_generic netconsole xennet xenblk" /boot/initrd-4.12.14-lp150.12.48-default 4.12.14-lp150.12.48-default
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: warpclock ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: i18n ***
dracut: *** Including module: kernel-modules ***
dracut: *** Including module: resume ***
dracut: *** Including module: rootfs-block ***
dracut: *** Including module: suse-xfs ***
dracut: *** Including module: terminfo ***
dracut: *** Including module: udev-rules ***
dracut: Skipping udev rule: 40-redhat.rules
dracut: Skipping udev rule: 50-firmware.rules
dracut: Skipping udev rule: 50-udev.rules
dracut: Skipping udev rule: 91-permissions.rules
dracut: Skipping udev rule: 80-drivers-modprobe.rules
dracut: *** Including module: dracut-systemd ***
dracut: *** Including module: haveged ***
dracut: *** Including module: usrmount ***
dracut: *** Including module: base ***
dracut: *** Including module: fs-lib ***
dracut: *** Including module: shutdown ***
dracut: *** Including module: suse ***
dracut: *** Including modules done ***
dracut: *** Installing kernel module dependencies and firmware ***
dracut: *** Installing kernel module dependencies and firmware done ***
dracut: *** Resolving executable dependencies ***
dracut: *** Resolving executable dependencies done***
dracut: *** Hardlinking files ***
dracut: *** Hardlinking files done ***
dracut: *** Stripping files ***
dracut: *** Stripping files done ***
dracut: *** Generating early-microcode cpio image ***
dracut: *** Constructing GenuineIntel.bin ****
dracut: *** Store current command line parameters ***
dracut: Stored kernel commandline:
dracut: rd.driver.pre=pata_jmicron rd.driver.pre=ata_piix rd.driver.pre=ata_generic rd.driver.pre=netconsole rd.driver.pre=xennet rd.driver.pre=xenblk
dracut: resume=UUID=4feaa6f5-38c4-4674-ae54-8e22389731a1
dracut: root=UUID=ac173013-18ad-4c4e-921e-fd2ecfb56495 rootfstype=ext4 rootflags=rw,relatime,lazytime,data=ordered
dracut: *** Creating image file '/boot/initrd-4.12.14-lp150.12.48-default' ***
dracut: *** Creating initramfs image file '/boot/initrd-4.12.14-lp150.12.48-default' done ***
Telcontar:/data/... #
[paste ends]

It is not displayed here, but when this is done from YaST, *YaST* says *nothing* about the failure to build the kernel module. The first news that something is wrong is when video fails.

Well, we're using the build structure from NVIDIA ... What would be the benefit of letting the build fail officially? The build is done in %post of package install, so the package is already installed and can't be reverted automatically. IIRC the build even gives you errors, if the build succeeds!

I don't plan to reimplement NVIDIA's driver build. Seriously.

I said nothing about the build structure from Nvidia headquarters. I said nothing about reverting the package install.

The Make process errors are not reported to the user inside yast.

Yast says that the package succeeded, and proceeds to reboot. Surely you can add a message telling yast that there were problems with make. Write the log somewhere and tell.

I only want YaST to tell the user that something happened.

This situation is not acceptable.

(In reply to Carlos Robinson from comment #7)
> The Make process errors are not reported to the user inside yast.
>
> Yast says that the package succeeded, and proceeds to reboot. Surely you can
> add a message telling yast that there were problems with make. Write the log
> somewhere and tell.
>
> I only want YaST to tell the user that something happened.

No, this has never been possible. This would be a feature request for YaST/zypper.

Ok. Could you please open a bug or feature request against YaST, so it shows the output of appropriate scripts, when any of these exit with an error code != 0? So I can close this one as duplicate ...

Hmm. Still interested?

Yes, sorry. Just trying to find some time in which I can boot the machine to do it (repeat the rpm run and risk a reboot to obtain the text and error code).

Just for curiosity's sake, my intention is to get, when I have the money, new hardware, AMD: not Intel, nor Nvidia. I have 3 unsolvable problems with my current hardware, and this is one.

Hmm. I was talking about opening a feature request against YaST... (comment #9)

If I may add a comment: apart from error messages, it would also be nice if the kernel module were built correctly. That's for my use case anyway. Thanks anyway for the analysis above. With this information I was finally able to build the modules after doing

cd /usr/src/ && ln -s linux-4.12.14-lp151.28.7-obj linux-obj

(In reply to Stefan Dirsch from comment #12)
> Hmm. I was talking about opening a feature request against YaST... (comment
> #9)

Ok. Looks like Carlos is not (any longer).

Bug 1140563 submitted. It took an hour just to extract sample logs. Now I have to reboot the machine so that it is consistent.

Thanks!

(In reply to Stefan Dirsch from comment #9)
> Ok. Could you please open a bug or feature request against YaST, so it shows
> the output of appropriate scripts, when any of these exit with an error code
> != 0? So I can close this one as duplicate ...

Looks like this has already been implemented according to boo#1140563. So reopening this bugreport.
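Both build failures discussed in this thread came down to a link under /usr/src that was missing or pointed at a non-existent target (/usr/src/linux-obj/x86_64/default in the reproduction, /usr/src/linux-obj in the workaround). A small pre-flight check along these lines could catch that before a rebuild is attempted. This is only a sketch: `check_build_link` is a hypothetical helper, not part of any openSUSE or NVIDIA package.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path exists after
# following symlinks; complain about dangling or missing paths.
check_build_link() {
    link=$1
    if [ -L "$link" ] && [ ! -e "$link" ]; then
        # -L is true for a symlink, -e follows it: the target is gone.
        echo "dangling symlink: $link -> $(readlink "$link")" >&2
        return 1
    fi
    if [ ! -e "$link" ]; then
        echo "missing: $link" >&2
        return 1
    fi
    return 0
}
```

For the case reproduced in this report, `check_build_link /usr/src/linux-obj/x86_64/default` would print the dangling target and return 1.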
nvidia-gfxG05.changes
-------------------------------------------------------------------
Mon Jul 8 14:04:20 UTC 2019 - Stefan Dirsch <sndirsch@suse.com>

- kmp-post.sh/kmp-trigger.sh
  * exit with error code 1 from %post/%trigger, if kernel module
    build/install fails (boo#1131028)

Changed this for nvidia-gfx{,G01,G02,G03,G04,G05}. Will be in place with the next update on the nvidia server. Sources (changes) are in

https://build.opensuse.org/project/show/X11:Drivers:Video

if you're interested. Closing as fixed ...

Created attachment 809702 [details]
var/log/zypp/history section
I attach the section of "/var/log/zypp/history" where it logs the failure of the script, when it finds that '/usr/src/kernel-modules/nvidia-340.107-default/nvidia.ko' is missing.
I do not know how to see if:
107 - ZYPPER_EXIT_INF_RPM_SCRIPT_FAILED
is set. But YaST did not display that to me, that's certain.
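One way to see whether that status was returned is to look at zypper's exit status directly after the command finishes. The sketch below only interprets a status passed to it; `explain_zypper_status` is a hypothetical helper, and the only code it knows besides 0 is the 107 value quoted above.

```shell
#!/bin/sh
# Hypothetical helper: turn a zypper exit status into a short message.
# Only 0 and the 107 value mentioned in this report are handled explicitly.
explain_zypper_status() {
    case "$1" in
        0)
            echo "ok"
            ;;
        107)
            echo "ZYPPER_EXIT_INF_RPM_SCRIPT_FAILED: a package scriptlet returned an error"
            ;;
        *)
            echo "other exit status: $1"
            ;;
    esac
}
```

Used right after an install, for example `zypper install <package>; explain_zypper_status $?`, this makes a failed scriptlet visible without reading the whole output.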
I may try to reproduce the failure on 15.1, if that is of interest.
The text is visible when using zypper, but as there are thousands of lines flowing, it is impossible to know that it happened unless the admin is looking at the text attentively.
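Until the failure is surfaced by the tools themselves, the output can be saved to a file and filtered afterwards instead of being watched as it scrolls by. A rough sketch, assuming the output was captured to a log file: `scan_install_log` is a hypothetical helper, and the patterns are simply the failure messages quoted earlier in this report, not an exhaustive list.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) the lines of a saved
# install log that contain failure markers seen in this report.
# Exit status is grep's: 0 if something matched, 1 if nothing did.
scan_install_log() {
    grep -n -E 'failed to build|Error [0-9]+|cannot stat' "$1"
}
```

For example, after `zypper install <package> 2>&1 | tee /tmp/install.log` (the log path is arbitrary), `scan_install_log /tmp/install.log` lists only the suspicious lines.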
(In reply to Stefan Dirsch from comment #18)
...
> if you're interested. Closing as fixed ...

Yes, I'll test that when I notice the update on servers, thanks.

(In reply to Carlos Robinson from comment #20)
> (In reply to Stefan Dirsch from comment #18)
> ...
> > if you're interested. Closing as fixed ...
>
> Yes, I'll test that when I notice the update on servers, thanks.

Sure. Go ahead! :-)

RPM updates for G05 430.34 driver will already include this change. Should be available shortly ...

I'd like to test this, but my card needs the G03 driver, and these have not been updated in the NVidia repository since mid June (maybe mid May) -- nor has the G05, anyway:

http://http.download.nvidia.com/opensuse/leap/15.1/x86_64/

nvidia-glG03-340.107-lp151.12.2.x86_64.rpm  28MB  2019-06-12 18:07
nvidia-glG04-390.116-lp151.7.1.x86_64.rpm   28MB  2019-06-12 18:07
nvidia-glG05-430.26-lp151.14.1.x86_64.rpm   26MB  2019-06-12 18:07

I'm not in any hurry, just mentioning in case you wonder why I don't report back ;-)

Packages are already in the pipeline. They just need to be signed by Nvidia. Unfortunately this time this can't be done before sometime in August. Usually this only takes a few days at the maximum. This time we have bad luck.