|
Bugzilla – Full Text Bug Listing |
| Summary: | Compute capabilities of NVIDIA drivers cannot be initialised by non-root users | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Matthias Bach <marix> |
| Component: | X11 3rd Party Driver | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED FIXED | QA Contact: | Stefan Dirsch <sndirsch> |
| Severity: | Normal | ||
| Priority: | P2 - High | CC: | bwiedemann, cornelis, ddadap, engineering, kaykaykay123, marix, thechode |
| Version: | Leap 15.2 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
NVIDIA Modprobe file
Result of nvidia-bug-report.sh
nvidia-uvm-tools.tar
sample files from two machines
Zypper output for force-reinstall of NVIDIA driver
Systemd Unit providing a workaround |
||
|
Description
Matthias Bach
2020-07-05 16:07:56 UTC
Hmm. nvidia-uvm should be loaded automatically once the nvidia module gets loaded. What does /etc/modprobe.d/50-nvidia-default.conf look like? Please also attach the result of an nvidia-bug-report.sh run.

Created attachment 839360 [details]
NVIDIA Modprobe file
Created attachment 839361 [details]
Result of nvidia-bug-report.sh
I attached the files you asked for. Please be aware that only loading nvidia-uvm is not sufficient. The proper creation of the corresponding device files is also essential.

Looks good. I'm wondering which display manager you're using? gdm, sddm, lightdm, ... ? Which desktop? GNOME, KDE, xfce, ... ?

(In reply to Matthias Bach from comment #4)
> I attached the files you asked for. Please be aware that only loading
> nvidia-uvm is not sufficient. The proper creation of the corresponding
> device files is also essential.

/usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf makes sure that device nodes are created during boot and permissions are set when the user logs in. Things are complicated. (boo#1000625)

I am using sddm as the display manager. Desktop is KDE. But that should be irrelevant, as the compute capabilities should, if everything works correctly, also be accessible without any desktop running.

Thanks. Please read comments #36 - #43 of boo#1000625 for the background of creating devices and the permissions.

So I did a fresh Leap 15.2 installation on my NVIDIA laptop. Worked fine for me. nvidia modules are loaded during boot including nvidia-uvm, permissions are set for the logged in user.

# lsmod | grep nvidia
nvidia_drm 53248 0
nvidia_modeset 1118208 1 nvidia_drm
nvidia_uvm 1069056 0
nvidia 20721664 5 nvidia_uvm,nvidia_modeset
ipmi_msghandler 69632 2 ipmi_devintf,nvidia
drm_kms_helper 229376 2 nvidia_drm,i915
drm 544768 8 drm_kms_helper,nvidia_drm,i915
# getfacl /dev/nvidia*
getfacl: Removing leading '/' from absolute path names
# file: dev/nvidia0
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---
# file: dev/nvidiactl
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---
# file: dev/nvidia-modeset
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---
# file: dev/nvidia-uvm
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

(check the user:tux:rw- lines)

I noticed that the NVIDIA repository is not yet in the community repos. I will report this. Also I noticed that the hardware supplements in G04/G05 are wrong, so G04 was autoselected instead of G05. Fixed this now. The fix will be available with the next NVIDIA RPM/repo update. It's embarrassing, since this has been broken for Leap since 15.1. :-(

Got another report, which convinced me to reopen this one. ;-)

https://www.reddit.com/r/openSUSE/comments/hm9dan/cuda_issues_in_leap_152_with_workaround/?utm_medium=android_app&utm_source=share

> crw-rw-rw- 1 root root 238, 1 Jul 7 14:35 /dev/nvidia-uvm-tools
Indeed we're not creating this one, and it's actually the first time I've heard that this device exists and is required to make use of the uvm module.
I will fix this in our modprobe file.

(In reply to Stefan Dirsch from comment #12)
> I will fix this in our modprobe file.

And also in %post and %trigger, so permissions are set accordingly by udev/logind when the user logs in.

Done. Now I'm getting in addition to /dev/nvidia-uvm

# ls -l /dev/nvidia*
[..]
crw-rw----+ 1 root video 238, 1 Jul 7 12:16 /dev/nvidia-uvm-tools
# getfacl /dev/nvidia*
[...]
# file: dev/nvidia-uvm-tools
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

Hope this fixes the issue now. Unfortunately I don't have the skills to install the tools (CUDA and/or others) which need this device.

Thanks for picking up on this again! You don't actually need any special installations to test this. The CUDA runtime is part of the driver. Only compiling applications yourself would require installing the SDK (which the user in the thread you linked did). You can easily test this via the `clinfo` application which ships with openSUSE. On a properly set up system the output should look like the following:

Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 10.2.185
Platform Profile FULL_PROFILE

While for me, until I have run something as root, it looks like this:

Number of platforms 0

Thanks. Looks like clinfo works for me. I will attach a tarball that you can test.

Created attachment 839469 [details]
nvidia-uvm-tools.tar
Tarball to be extracted in / Needs a reboot afterwards.
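
The clinfo check described above can be scripted so that it is easy to repeat as a regular user after each reboot. The following is only a sketch: the function name `clinfo_platform_count` is made up for illustration, and it assumes clinfo prints a line starting with "Number of platforms" as in the output quoted in this report.

```shell
# Hypothetical helper: read clinfo output on stdin and print the platform count.
# Returns 1 (and prints nothing) if no "Number of platforms" line is found.
clinfo_platform_count() {
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { exit !found }'
}

# Usage sketch (run as a regular user, before anything was started as root):
#   clinfo | clinfo_platform_count
# A count of 0 would match the broken state reported here.
```

Running it once as a regular user directly after boot, and again after running something as root, should show whether the count changes as described above.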
I'm assuming this fixes the issue. If not, please don't hesitate to reopen! Fixed packages should be available by the end of this week.

Thanks, sadly it seems the nvidia-uvm-tools.tar tarball makes things worse for me instead of better. With that applied SDDM will only give me a black screen with a mouse pointer. Yet, I still get:

> ls /dev/nv*
/dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset /dev/nvram
> lsmod | grep nvidia
nvidia_drm 53248 2
nvidia_modeset 1118208 3 nvidia_drm
nvidia 20721664 66 nvidia_modeset
ipmi_msghandler 69632 1 nvidia
drm_kms_helper 229376 1 nvidia_drm
drm 544768 5 drm_kms_helper,nvidia_drm

And to be honest, I find this really confusing, as the scripts obviously should load the modules and create the files…

Shower thought I just had: It looks to me like the modprobe config file isn't executed on my system. SDDM launches as root, so in that case the Nvidia drivers utilise `nvidia-modprobe` to create the files and load the modules. At least that could explain why I do have the modules and files required for graphics but not those required for compute. If that's true, the obvious question would be why the modprobe config file is ignored.

Oh. I forgot that this is also needed (will be included in the %post script of the updated package)

mkdir -p /run/udev/static_node-tags/uaccess
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset

But this doesn't explain why nvidia-uvm isn't loaded and the device isn't created. It should be done via the modprobe scriptlet.

*** Bug 1173862 has been marked as a duplicate of this bug. ***

I just re-verified. Sadly, even with the links created as the %postin script would, SDDM black-screens when the tarball is applied.

Hmm. You rebooted your machine afterwards, right?
You could try running the script code in modprobe file manually for testing. I'm running out of ideas why things are not working for you. Created attachment 839480 [details]
sample files from two machines
sample files from two machines, one working one not
I'm the user from the reddit thread that has seen this issue with CUDA 10.1 SDK. My experience of it seems different to others in this chat so far.
My test machine was Leap 15.1, and I did a dup upgrade to 15.2. Some differences on the hardware (NVIDIA GTX instead of an RTX), but otherwise same environment. But this machine works perfectly fine with CUDA.
My primary desktop (with the RTX card) I clean installed 15.2 on, and encounter this issue.
So I have two near-identical machines in the same condition. Both were built using ansible playbooks (except for the base install, I haven't dived into AutoYaST yet), so there shouldn't be major differences in the build. The only main difference I can theorise is that the test machine would have gone through several driver version upgrades for NVIDIA G05.
I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
Also, my test machine (working) is lacking /dev/nvidia-uvm-tools, but CUDA still functions fine:
getfacl /dev/nvidia*
getfacl: Removing leading '/' from absolute path names
# file: dev/nvidia0
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---
# file: dev/nvidiactl
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---
# file: dev/nvidia-modeset
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---
# file: dev/nvidia-uvm
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---
I've attached files from both machines above, including an additional file present on the test machine - /usr/lib/tmpfiles.d/nvidia-login-acl-trick.conf (appears to be all duplicate, but included nonetheless)
In the meantime, my dirty workaround has been adding this to the root user crontab:
@reboot nvidia-modprobe -u -c=0
And my relatively simple ansible tasklist for installation, if you wanted to recreate my environments:
- name: remove opensource nvidia driver
  zypper:
    name: xf86-video-nouveau
    state: absent
- name: blacklist opensource nvidia driver from install
  command: zypper addlock xf86-video-nouveau
  args:
    warn: false
- name: NVIDIA - add repository
  zypper_repository:
    name: NVIDIA
    repo: 'https://download.nvidia.com/opensuse/leap/15.2'
    auto_import_keys: yes
    state: present
    runrefresh: yes
- name: NVIDIA - install driver
  zypper:
    name: 'x11-video-nvidiaG05'
    state: present
- name: Transfer CUDA package
  copy:
    src: ~/ansible/packages/cuda-repo-opensuse15-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm
    dest: /tmp/
- name: add NVIDIA CUDA signing key repository key
  rpm_key:
    state: present
    key: https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/7fa2af80.pub
- name: install CUDA rpm
  zypper:
    name: /tmp/cuda-repo-opensuse15-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm
    state: present
- name: install CUDA toolkit
  zypper:
    name: cuda-toolkit-10-1
    state: present
> I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem
> to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
>
> L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0

Thanks! Good catch! This is definitely needed and may explain the black screen Matthias sees now with SDDM.

> Also, my test machine (working) is lacking /dev/nvidia-uvm-tools, but CUDA still functions fine:

Looks like whether you need this device depends on which functionality you require. I figured out that it has existed since driver version 364 (March 2016). It's weird that there were no reports earlier, and now with Leap 15.2 several arrive on about the same day!

(In reply to Mister Pend from comment #25)
> Created attachment 839480 [details]
> sample files from two machines
>
> sample files from two machines, one working one not

You're using different NVreg_DeviceFileGID on your machines (not using our default of 33). Also on the broken machine /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.

# diff -u -r prodmachine\ \(broken\)/ testmachine\ \(works\)/
diff -u -r "prodmachine (broken)/etc/modprobe.d/50-nvidia-default.conf" "testmachine (works)/etc/modprobe.d/50-nvidia-default.conf"
--- "prodmachine (broken)/etc/modprobe.d/50-nvidia-default.conf" 2020-07-08 11:22:44.000000000 +0200
+++ "testmachine (works)/etc/modprobe.d/50-nvidia-default.conf" 2020-07-08 11:23:42.000000000 +0200
@@ -1,2 +1,2 @@
-options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=483 NVreg_DeviceFileMode=0660
+options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=484 NVreg_DeviceFileMode=0660
 install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then if /sbin/modprobe nvidia_uvm; then if [ ! -c /dev/nvidia-uvm ]; then mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" == "nvidia-uvm" ]; then echo $major; break; fi ; done) 0; chown :video /dev/nvidia-uvm; fi; fi; if [ ! -c /dev/nvidiactl ]; then mknod -m 660 /dev/nvidiactl c 195 255; chown :video /dev/nvidiactl; fi; devid=-1; for dev in $(ls -d /sys/bus/pci/devices/*); do vendorid=$(cat $dev/vendor); if [ "$vendorid" == "0x10de" ]; then class=$(cat $dev/class); classid=${class%%00}; if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then devid=$((devid+1)); if [ ! -c /dev/nvidia${devid} ]; then mknod -m 660 /dev/nvidia${devid} c 195 ${devid}; chown :video /dev/nvidia${devid}; fi; fi; fi; done; /sbin/modprobe nvidia_drm; if [ ! -c /dev/nvidia-modeset ]; then mknod -m 660 /dev/nvidia-modeset c 195 254; chown :video /dev/nvidia-modeset; fi; fi
\ No newline at end of file
Only in testmachine (works)/usr/lib/tmpfiles.d: nvidia-logind-acl-trick.conf

> getfacl /dev/nvidia*
> [...]

@Mister Pend Seems only gdm has access to your nvidia devices? It should look different once a regular user has logged into the session.
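
The install line in the modprobe file quoted in the diff above finds the major number for /dev/nvidia-uvm by scanning /proc/devices. That step can be pulled out and tried in isolation, which may help when running the scriptlet by hand. This is a sketch only: the function name `lookup_major` and its optional file argument are made up for illustration and are not part of the packaged file. It compares with `=` rather than `==`, since `==` inside `[ ]` is accepted by Bash but is not part of POSIX `test`.

```shell
# lookup_major NAME [FILE]
# Print the major number registered for NAME in a /proc/devices-style file.
# FILE defaults to /proc/devices; returns 1 if NAME is not listed.
lookup_major() {
    name=$1
    file=${2:-/proc/devices}
    while read -r major device; do
        if [ "$device" = "$name" ]; then
            echo "$major"
            return 0
        fi
    done < "$file"
    return 1
}

# Usage sketch:
#   lookup_major nvidia-uvm || echo "nvidia-uvm is not registered (module not loaded?)"
```

If the lookup fails, the nvidia_uvm module is presumably not loaded, and the mknod call in the install line would be handed an empty major number.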
(In reply to Stefan Dirsch from comment #28)
> You're using different NVreg_DeviceFileGID on your machines (not using our
> default of 33). Also on the broken machine
> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.

I'm not sure how, I'm not doing anything out of the ordinary here - drivers were installed from NVIDIA repository as per my ansible tasklist (NVIDIA repository added 'https://download.nvidia.com/opensuse/leap/15.2', package 'x11-video-nvidiaG05' installed via zypper). And yes, that file is missing completely on a clean installed machine - I suspect its presence on the working machine is due to driver package variances during its life before I tested the upgrade on it.

(In reply to Stefan Dirsch from comment #29)
> > getfacl /dev/nvidia*
> > [...]
> @Mister Pend Seems only gdm has access to your nvidia devices? It should
> look different once a regular user has logged into the session.

Correct, I had SSH'ed into the test machine because I was too lazy to set up VNC or walk across to the other end of my workshop :P Once a regular user has logged on, they show as having access as well.

(In reply to Mister Pend from comment #31)
> Correct, I had SSH'ed into the test machine because I was too lazy to set up
> VNC or walk across to the other end of my workshop :P Once a regular user
> has logged on, they show as having access as well.

That's fine then! :-)

(In reply to Mister Pend from comment #30)
> (In reply to Stefan Dirsch from comment #28)
> > You're using different NVreg_DeviceFileGID on your machines (not using our
> > default of 33). Also on the broken machine
> > /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.
>
> I'm not sure how, I'm not doing anything out of the ordinary here - drivers
> were installed from NVIDIA repository as per my ansible tasklist (NVIDIA
> repository added 'https://download.nvidia.com/opensuse/leap/15.2', package
> 'x11-video-nvidiaG05' installed via zypper). And yes, that file is missing
> completely on a clean installed machine - I suspect its presence on the
> working machine is due to driver package variances during its life before I
> tested the upgrade on it.

Then there is something fishy.

You must have manually edited /etc/modprobe.d/50-nvidia-default.conf in order to have a different NVreg_DeviceFileGID there. 33 is the group ID of the video group. That's why we use it here. Probably it no longer matters since permissions are meanwhile set via udev/logind (ACLs). So I guess you can ignore this.

/usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf is created in %post of nvidia-gfxG05-kmp-default and only removed in %postun when uninstalled, not during an update.

JFYI, permission handling (done in %post of KMP)

# Create symlinks for udev so these devices will get user ACLs by logind later (bnc#1000625)
mkdir -p /run/udev/static_node-tags/uaccess
mkdir -p /usr/lib/tmpfiles.d
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset
cat > /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf << EOF
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
EOF
devid=-1
for dev in $(ls -d /sys/bus/pci/devices/*); do
  vendorid=$(cat $dev/vendor)
  if [ "$vendorid" == "0x10de" ]; then
    class=$(cat $dev/class)
    classid=${class%00}
    if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then
      devid=$((devid+1))
      ln -snf /dev/nvidia${devid} /run/udev/static_node-tags/uaccess/nvidia${devid}
      echo "L /run/udev/static_node-tags/uaccess/nvidia${devid} - - - - /dev/nvidia${devid}" >> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf
    fi
  fi
done

So is this issue solved or not? My wild shot: GDM starts with root privileges, SDDM starts with ordinary user privileges. Maybe I am wrong.

(In reply to Nikolai Nikolaevskii from comment #35)
> So is this issue solved or not?

Honestly I can't say.

> My wild shot: GDM starts with root privileges, SDDM starts with ordinary
> user privileges.
> Maybe I am wrong.

I'm afraid you are. AFAIK the sddm chooser runs as root, but then gets replaced by the user Xsession running as a regular user, so with autologin enabled it may look like X is not working from the beginning when permissions to /dev/nvidia0 are not available. It's similar with gdm, whose chooser is run as the gdm user and then starts a second Xserver running the Xsession under the regular user.

(In reply to Stefan Dirsch from comment #27)
> > I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem
> > to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
> >
> > L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
>
> Thanks! Good catch! This is definitely needed and may explain the black
> screen Matthias sees now with SDDM.

I can confirm that adding this line into the file will resolve the black-screen issue. Though I am still without compute capabilities.

(In reply to Stefan Dirsch from comment #24)
> Hmm. You rebooted your machine afterwards, right? You could try running the
> script code in modprobe file manually for testing. I'm running out of ideas
> why things are not working for you.

Yes, I rebooted my machine. I have extracted the code from the install section of the modprobe file and running this via /bin/sh (which on my system means Bash) will create the missing files.
root@eddie:~ # ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 254 Jul 8 18:26 /dev/nvidia-modeset
crw-rw---- 1 root video 239, 0 Jul 8 18:29 /dev/nvidia-uvm
crw-rw---- 1 root video 239, 1 Jul 8 18:29 /dev/nvidia-uvm-tools
crw-rw----+ 1 root video 195, 0 Jul 8 18:26 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 Jul 8 18:26 /dev/nvidiactl

In consequence, it seems like the modprobe file for some reason is not properly applied by my machine despite being present. Could this be an issue of initialisation order? Could it be that some trigger condition for the modprobe is not being matched? Although it would surprise me if those had changed since 15.1.

(In reply to Stefan Dirsch from comment #29)
> > getfacl /dev/nvidia*
> > [...]
> @Mister Pend Seems only gdm has access to your nvidia devices? It should
> look different once a regular user has logged into the session.

Just to make this explicit, it's completely valid to utilise the compute capabilities (nvenc, CUDA, OpenCL) without a running X session, i.e. for a dedicated machine-learning host that for noise reasons you don't want to have right next to your desk. It's one of the big advantages of the NVIDIA cards over AMD that you can have a truly headless system with them. The first generation of Tesla cards didn't even have graphics outlets, and in consequence wouldn't work on Windows (which I assume is the only reason why a lot of supercomputers now could run giant display farms).

I have built rev 71 of X11:Drivers:Video, i.e. before the version update to 450.57, which, if I am correct, contains all corrections mentioned in this report. I installed the packages. The bug is not solved for me. I observe the following (I use it with ffmpeg and nvenc):

As a regular user I get the error:

[h264_nvenc @ 0x55de3a62d5c0] Cannot init CUDA

When I execute the same command with sudo, it works. After that it also works as a regular user. It seems that executing the ffmpeg command as root creates the nvidia-uvm device. Before the sudo it did not exist. I have only

/dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset

After executing ffmpeg with the nvenc option as root two additional devices are added:

/dev/nvidia-uvm /dev/nvidia-uvm-tools

I have checked this once after a reboot.

To be more complete: After starting I get the following:

ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 0 8 jul 23:43 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 8 jul 23:43 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 8 jul 23:43 /dev/nvidia-modeset

After sudo ffmpeg ... -c:v h264_nvenc ... I have:

ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 0 8 jul 23:43 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 8 jul 23:43 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 8 jul 23:43 /dev/nvidia-modeset
crw-rw-rw- 1 root root 240, 0 8 jul 23:44 /dev/nvidia-uvm
crw-rw-rw- 1 root root 240, 1 8 jul 23:44 /dev/nvidia-uvm-tools

I don't know if the group "root" vs "video" matters.

(In reply to Matthias Bach from comment #37)
> (In reply to Stefan Dirsch from comment #27)
> > > I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem
> > > to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
> > >
> > > L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
> >
> > Thanks! Good catch! This is definitely needed and may explain the black
> > screen Matthias sees now with SDDM.
>
> I can confirm that adding this line into the file will resolve the
> black-screen issue. Though I am still without compute capabilities.

Thanks. At least this we could fix again.

> I have extracted the code from the install section of the modprobe file and
> running this via /bin/sh (which on my system means Bash) will create the
> missing files.
>
> root@eddie:~ # ls -l /dev/nvidia*
> crw-rw----+ 1 root video 195, 254 Jul 8 18:26 /dev/nvidia-modeset
> crw-rw---- 1 root video 239, 0 Jul 8 18:29 /dev/nvidia-uvm
> crw-rw---- 1 root video 239, 1 Jul 8 18:29 /dev/nvidia-uvm-tools
> crw-rw----+ 1 root video 195, 0 Jul 8 18:26 /dev/nvidia0
> crw-rw----+ 1 root video 195, 255 Jul 8 18:26 /dev/nvidiactl

Ah. Thanks for checking this!

> In consequence, it seems like the modprobe file for some reason is not
> properly applied by my machine despite being present. Could this be an issue of
> initialisation order? Could it be that some trigger condition for the
> modprobe is not being matched? Although it would surprise me if those had changed
> since 15.1.

That could be a good catch! The modprobe file is marked as %config, so possibly the old one has been backed up as .rpmsave, but preferred over the new one nevertheless. This would explain the behaviour at least. Could you check this and remove the old modprobe file, then test again?

(In reply to Matthias Bach from comment #38)
> (In reply to Stefan Dirsch from comment #29)
> > > getfacl /dev/nvidia*
> > > [...]
> > @Mister Pend Seems only gdm has access to your nvidia devices? It should
> > look different once a regular user has logged into the session.
>
> Just to make this explicit, it's completely valid to utilise the compute
> capabilities (nvenc, CUDA, OpenCL) without a running X session, i.e. for a
> dedicated machine-learning host that for noise reasons you don't want to
> have right next to your desk. It's one of the big advantages of the NVIDIA
> cards over AMD that you can have a truly headless system with them. The
> first generation of Tesla cards didn't even have graphics outlets, and in
> consequence wouldn't work on Windows (which I assume is the only reason why
> a lot of supercomputers now could run giant display farms).

Sure, but we don't cover this use case. We can only set permissions when a user logs into an Xsession.
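
Several of the reports above come down to the same observation: after boot, nvidiactl exists but nvidia-uvm (and nvidia-uvm-tools) do not. A small check like the following can be run as a regular user right after boot to capture that state. It is a sketch under assumptions: the function name `missing_compute_nodes` is invented here, the list of nodes is taken from the listings in this report, and it tests plain existence (`-e`) so that it can also be tried against a scratch directory; on a real system `-c` (character device) would be the stricter test.

```shell
# missing_compute_nodes [DEVDIR]
# Print every expected NVIDIA compute device node that is absent from DEVDIR
# (default: /dev). Returns 1 if at least one node is missing, 0 otherwise.
missing_compute_nodes() {
    devdir=${1:-/dev}
    status=0
    for node in nvidiactl nvidia-uvm nvidia-uvm-tools; do
        if [ ! -e "$devdir/$node" ]; then
            echo "$devdir/$node"
            status=1
        fi
    done
    return "$status"
}

# Usage sketch:
#   missing_compute_nodes || echo "compute device nodes are incomplete"
```

Note that, according to the report from the working test machine earlier in this thread, CUDA can function without /dev/nvidia-uvm-tools, so a missing nvidia-uvm-tools node alone is not necessarily a failure.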
> I have built rev 71 of X11:Drivers:Video, i.e. before the version update to 450.57, which, if I am correct, contains all corrections
> mentioned in this report.

Yes, that's perfect! Thanks for doing this! Unfortunately I cannot provide RPMs for testing for legal reasons here.

Could you check if you have two modprobe files like

/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/50-nvidia-default.conf.rpmsave

See my comment#41. If yes, please remove the older file and try again (reboot is the easiest).

(In reply to Stefan Dirsch from comment #43)
> Could you check if you have two modprobe files like
>
> /etc/modprobe.d/50-nvidia-default.conf
> /etc/modprobe.d/50-nvidia-default.conf.rpmsave
>
> See my comment#41. If yes, please remove the older file and try again
> (reboot is the easiest).

I only have the following:

# ls /etc/modprobe.d/*nvidia*
/etc/modprobe.d/50-nvidia-default.conf /etc/modprobe.d/nvidia-default.conf

I do remember removing some rpmsave file at some point during my debugging, but my last rounds of tests were definitely already performed without that file present.

Damn. That would have been a good explanation ...

Ok. According to the modprobe.d manual page only .conf files below /etc/modprobe.d are considered ... indeed on my system I also found a backup file ... but my system is working anyway and loaded the nvidia-uvm module and created the /dev/nvidia-uvm-tools device from the beginning once I adjusted the code.

(In reply to Stefan Dirsch from comment #33)
> (In reply to Mister Pend from comment #30)
> Then there is something fishy.
>
> You must have manually edited /etc/modprobe.d/50-nvidia-default.conf in
> order to have a different
> NVreg_DeviceFileGID there. 33 is the group ID of the video group. That's why we
> use it here. Probably it no longer matters since permissions are meanwhile
> set via udev/logind (ACLs). So I guess you can ignore this.
Just out of curiosity, I clean installed my test machine (formerly working), and am now seeing the same issues on that machine. And looking at /etc/modprobe.d/50-nvidia-default.conf, the NVreg_DeviceFileGID is still 483. Installed from NVIDIA's repository. I can promise you 100% I haven't manually edited this file. Resulting nvidia/cuda packages:

rpm -qa | grep -i nvidia
nvidia-gfxG05-kmp-default-440.100_k5.3.18_lp152.19-lp152.26.1.x86_64
nvidia-computeG05-440.100-lp152.26.1.x86_64
x11-video-nvidiaG05-440.100-lp152.26.1.x86_64
rpm -qa | grep -i cuda
cuda-nvprune-10-1-10.1.105-1.x86_64
cuda-curand-10-1-10.1.105-1.x86_64
cuda-visual-tools-10-1-10.1.105-1.x86_64
cuda-sanitizer-api-10-1-10.1.105-1.x86_64
cuda-driver-dev-10-1-10.1.105-1.x86_64
cuda-gdb-10-1-10.1.105-1.x86_64
cuda-compiler-10-1-10.1.105-1.x86_64
cuda-nsight-systems-10-1-10.1.105-1.x86_64
cuda-nvjpeg-dev-10-1-10.1.105-1.x86_64
cuda-tools-10-1-10.1.105-1.x86_64
cuda-nvjpeg-10-1-10.1.105-1.x86_64
cuda-cudart-10-1-10.1.105-1.x86_64
cuda-command-line-tools-10-1-10.1.105-1.x86_64
cuda-nvgraph-10-1-10.1.105-1.x86_64
cuda-gpu-library-advisor-10-1-10.1.105-1.x86_64
cuda-curand-dev-10-1-10.1.105-1.x86_64
cuda-nvcc-10-1-10.1.105-1.x86_64
cuda-nvprof-10-1-10.1.105-1.x86_64
cuda-npp-10-1-10.1.105-1.x86_64
cuda-cuobjdump-10-1-10.1.105-1.x86_64
cuda-npp-dev-10-1-10.1.105-1.x86_64
cuda-libraries-dev-10-1-10.1.105-1.x86_64
cuda-toolkit-10-1-10.1.105-1.x86_64
cuda-repo-opensuse15-10-1-local-10.1.105-418.39-1.0-1.x86_64
cuda-nvml-dev-10-1-10.1.105-1.x86_64
cuda-misc-headers-10-1-10.1.105-1.x86_64
cuda-cufft-10-1-10.1.105-1.x86_64
cuda-cusparse-dev-10-1-10.1.105-1.x86_64
cuda-cupti-10-1-10.1.105-1.x86_64
cuda-nvrtc-10-1-10.1.105-1.x86_64
cuda-nsight-compute-10-1-10.1.105-1.x86_64
cuda-cusolver-10-1-10.1.105-1.x86_64
cuda-nvgraph-dev-10-1-10.1.105-1.x86_64
cuda-cudart-dev-10-1-10.1.105-1.x86_64
cuda-samples-10-1-10.1.105-1.x86_64
cuda-nsight-10-1-10.1.105-1.x86_64
cuda-nvvp-10-1-10.1.105-1.x86_64
cuda-documentation-10-1-10.1.105-1.x86_64
cuda-nvdisasm-10-1-10.1.105-1.x86_64
cuda-nvrtc-dev-10-1-10.1.105-1.x86_64
cuda-nvtx-10-1-10.1.105-1.x86_64
cuda-cusparse-10-1-10.1.105-1.x86_64
cuda-cufft-dev-10-1-10.1.105-1.x86_64
cuda-license-10-1-10.1.105-1.x86_64
cuda-memcheck-10-1-10.1.105-1.x86_64
cuda-cusolver-dev-10-1-10.1.105-1.x86_64

If there are further tests needed, I'm happy to help. My test machine I can rebuild in under 30 minutes, so happy for potentially destructive tests too.

(In reply to Stefan Dirsch from comment #43)
>
> Could you check if you have two modprobe files like
>
> /etc/modprobe.d/50-nvidia-default.conf
> /etc/modprobe.d/50-nvidia-default.conf.rpmsave
>
> See my comment#41. If yes, please remove the older file and try again
> (reboot is the easiest).

Yes, I had both. Removed the .rpmsave one and rebooted. It made no difference.

I have backed up both files and can provide them, if necessary.

(In reply to Mister Pend from comment #47)
> Just out of curiosity, I clean installed my test machine (formerly working),
> and am now seeing the same issues on that machine. And looking at
> /etc/modprobe.d/50-nvidia-default.conf, the NVreg_DeviceFileGID is still
> 483. Installed from NVIDIA's repository. I can promise you 100% I haven't
> manually edited this file. Resulting nvidia/cuda packages:

Thanks for double checking! I have no explanation for this. :-( But as I said, it should not matter as long as you don't want to add all users who should have access to the nvidia devices to the group with this GID.

> If there are further tests needed, I'm happy to help. My test machine I can
> rebuild in under 30 minutes, so happy for potentially destructive tests too.

Thanks a lot for your cooperation! Very much appreciated. At the moment I don't have anything for further testing.
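
Since the question of leftover .rpmsave files came up repeatedly, and only .conf files below /etc/modprobe.d are considered according to the modprobe.d manual page cited above, the check can be written down once. This is a sketch: the function name `ignored_nvidia_modprobe_files` and the optional directory argument are made up for illustration.

```shell
# ignored_nvidia_modprobe_files [DIR]
# List files in DIR (default: /etc/modprobe.d) whose name contains "nvidia"
# but which do not end in .conf, e.g. leftover *.rpmsave backups.
ignored_nvidia_modprobe_files() {
    dir=${1:-/etc/modprobe.d}
    for f in "$dir"/*nvidia*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        case $f in
            *.conf) ;;            # considered by modprobe
            *) echo "$f" ;;       # not a .conf file, so not considered
        esac
    done
}

# Usage sketch:
#   ignored_nvidia_modprobe_files
```

An empty result only says that no such leftovers exist. As the replies above show, removing the .rpmsave file did not change the behaviour on the affected machines.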
(In reply to Cor Blom from comment #48)
> > Could you check if you have two modprobe files like
> >
> > /etc/modprobe.d/50-nvidia-default.conf
> > /etc/modprobe.d/50-nvidia-default.conf.rpmsave
> >
> > See my comment#41. If yes, please remove the older file and try again
> > (reboot is the easiest).
>
> Yes, I had both. Removed the .rpmsave one and rebooted. It made no
> difference.
>
> I have backed up both files and can provide them, if necessary.

Thanks for verification. No, it's not necessary.

I have run another test that gave me an, at least in my eyes, interesting result: If I unload the nvidia module, obviously with display manager and other services stopped, a subsequent explicit `/sbin/modprobe nvidia` will work as expected. That is, `nvidia-uvm` is loaded along with it and all files are created properly.

Based on the previous test I have been able to solve the problem for myself. The solution is as obvious as hidden in plain sight. All I had to do was execute the following:

/sbin/mkinitrd

Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this. Maybe Mister Pend or Cor Blom can confirm this behaviour.

Thanks for investigation. But that would mean that the nvidia module would also be added to the initrd, which wasn't the case at least on my system. Check with
sudo lsinitrd /boot/initrd | grep nvidia
> Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this.
This would have happened if I could provide a real KMP package to you ... On the other hand, Cor Blom tested a real KMP package.
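Since a plain `grep nvidia` also matches every firmware file in the initrd, the kernel module entries are easy to overlook. The following is only a sketch of a filter that reduces such a listing to the nvidia kernel module names; the function name `nvidia_kmods_in_listing` is made up for illustration and is not part of dracut or the driver packages.

```shell
# Hypothetical helper (not shipped by any package): read an lsinitrd-style
# listing on stdin and print the unique nvidia kernel module names in it.
# Steps: split every line into path components (one per line), keep the
# components that look like "nvidia-uvm.ko" and strip the suffix, then drop
# duplicates (symlink entries name each module twice).
nvidia_kmods_in_listing() {
    tr ' /' '\n\n' |
        sed -n 's/^\(nvidia[A-Za-z0-9_-]*\)\.ko.*$/\1/p' |
        LC_ALL=C sort -u
}

# Usage (as root):
#   lsinitrd /boot/initrd | nvidia_kmods_in_listing
```

If nvidia-uvm is absent from the output while the other nvidia modules are listed, the initrd contains only part of the driver.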
It seems the nvidia module is indeed added to the initrd. ✗ sudo lsinitrd /boot/initrd | grep nvidia [sudo] Passwort für root: -rw-r--r-- 1 root root 1483 Jul 7 11:46 etc/modprobe.d/50-nvidia-default.conf -rw-r--r-- 1 root root 18 Jun 25 17:23 etc/modprobe.d/nvidia-default.conf drwxr-xr-x 12 root root 0 Jul 10 20:26 lib/firmware/nvidia drwxr-xr-x 4 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm200 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm200/acr -rw-r--r-- 1 root root 832 Mar 1 08:55 lib/firmware/nvidia/gm200/acr/bl.bin -rw-r--r-- 1 root root 10144 Mar 1 08:55 lib/firmware/nvidia/gm200/acr/ucode_load.bin -rw-r--r-- 1 root root 1440 Mar 1 08:55 lib/firmware/nvidia/gm200/acr/ucode_unload.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm200/gr -rw-r--r-- 1 root root 576 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 1968 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/fecs_data.bin -rw-r--r-- 1 root root 16271 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/fecs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/fecs_sig.bin -rw-r--r-- 1 root root 576 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2056 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_data.bin -rw-r--r-- 1 root root 9768 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_sig.bin -rw-r--r-- 1 root root 7616 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/sw_bundle_init.bin -rw-r--r-- 1 root root 5592 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/sw_ctx.bin -rw-r--r-- 1 root root 10800 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/sw_method_init.bin -rw-r--r-- 1 root root 1440 Mar 1 08:55 lib/firmware/nvidia/gm200/gr/sw_nonctx.bin drwxr-xr-x 4 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm204 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm204/acr lrwxrwxrwx 1 root root 22 Jul 10 20:26 lib/firmware/nvidia/gm204/acr/bl.bin -> 
../../gm200/acr/bl.bin lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gm204/acr/ucode_load.bin -> ../../gm200/acr/ucode_load.bin lrwxrwxrwx 1 root root 32 Jul 10 20:26 lib/firmware/nvidia/gm204/acr/ucode_unload.bin -> ../../gm200/acr/ucode_unload.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm204/gr lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 1968 Mar 1 08:55 lib/firmware/nvidia/gm204/gr/fecs_data.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/fecs_inst.bin -> ../../gm200/gr/fecs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gm204/gr/fecs_sig.bin lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2056 Mar 1 08:55 lib/firmware/nvidia/gm204/gr/gpccs_data.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/gpccs_inst.bin -> ../../gm200/gr/gpccs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gm204/gr/gpccs_sig.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_bundle_init.bin -> ../../gm200/gr/sw_bundle_init.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_ctx.bin -> ../../gm200/gr/sw_ctx.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_method_init.bin -> ../../gm200/gr/sw_method_init.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_nonctx.bin -> ../../gm200/gr/sw_nonctx.bin drwxr-xr-x 4 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm206 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gm206/acr lrwxrwxrwx 1 root root 22 Jul 10 20:26 lib/firmware/nvidia/gm206/acr/bl.bin -> ../../gm200/acr/bl.bin -rw-r--r-- 1 root root 10144 Mar 1 08:55 lib/firmware/nvidia/gm206/acr/ucode_load.bin -rw-r--r-- 1 root root 1440 Mar 1 08:55 lib/firmware/nvidia/gm206/acr/ucode_unload.bin drwxr-xr-x 2 
root root 0 Jul 10 20:26 lib/firmware/nvidia/gm206/gr lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 1968 Mar 1 08:55 lib/firmware/nvidia/gm206/gr/fecs_data.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/fecs_inst.bin -> ../../gm200/gr/fecs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gm206/gr/fecs_sig.bin lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2056 Mar 1 08:55 lib/firmware/nvidia/gm206/gr/gpccs_data.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/gpccs_inst.bin -> ../../gm200/gr/gpccs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gm206/gr/gpccs_sig.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_bundle_init.bin -> ../../gm200/gr/sw_bundle_init.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_ctx.bin -> ../../gm200/gr/sw_ctx.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_method_init.bin -> ../../gm200/gr/sw_method_init.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_nonctx.bin -> ../../gm200/gr/sw_nonctx.bin drwxr-xr-x 4 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp100 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp100/acr -rw-r--r-- 1 root root 832 Mar 1 08:55 lib/firmware/nvidia/gp100/acr/bl.bin -rw-r--r-- 1 root root 9632 Mar 1 08:55 lib/firmware/nvidia/gp100/acr/ucode_load.bin -rw-r--r-- 1 root root 1440 Mar 1 08:55 lib/firmware/nvidia/gp100/acr/ucode_unload.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp100/gr lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp100/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 2028 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/fecs_data.bin -rw-r--r-- 1 root root 20955 Mar 1 08:55 
lib/firmware/nvidia/gp100/gr/fecs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/fecs_sig.bin lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp100/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2080 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/gpccs_data.bin -rw-r--r-- 1 root root 12458 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/gpccs_inst.bin -rw-r--r-- 1 root root 76 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/gpccs_sig.bin -rw-r--r-- 1 root root 7664 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/sw_bundle_init.bin -rw-r--r-- 1 root root 6240 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/sw_ctx.bin -rw-r--r-- 1 root root 11928 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/sw_method_init.bin -rw-r--r-- 1 root root 2248 Mar 1 08:55 lib/firmware/nvidia/gp100/gr/sw_nonctx.bin drwxr-xr-x 6 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp102 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp102/acr -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp102/acr/bl.bin -rw-r--r-- 1 root root 17152 Mar 1 08:55 lib/firmware/nvidia/gp102/acr/ucode_load.bin -rw-r--r-- 1 root root 3328 Mar 1 08:55 lib/firmware/nvidia/gp102/acr/ucode_unload.bin -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp102/acr/unload_bl.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp102/gr lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp102/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 2256 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/fecs_data.bin -rw-r--r-- 1 root root 20927 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/fecs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/fecs_sig.bin lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp102/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 1832 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/gpccs_data.bin -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/gpccs_inst.bin -rw-r--r-- 1 
root root 192 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/gpccs_sig.bin -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/sw_bundle_init.bin -rw-r--r-- 1 root root 6216 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/sw_ctx.bin -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/sw_method_init.bin -rw-r--r-- 1 root root 2496 Mar 1 08:55 lib/firmware/nvidia/gp102/gr/sw_nonctx.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp102/nvdec -rw-r--r-- 1 root root 3840 Mar 1 08:55 lib/firmware/nvidia/gp102/nvdec/scrubber.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp102/sec2 -rw-r--r-- 1 root root 656 Mar 1 08:55 lib/firmware/nvidia/gp102/sec2/desc-1.bin -rw-r--r-- 1 root root 656 Mar 1 08:55 lib/firmware/nvidia/gp102/sec2/desc.bin -rw-r--r-- 1 root root 109568 Mar 1 08:55 lib/firmware/nvidia/gp102/sec2/image-1.bin -rw-r--r-- 1 root root 99072 Mar 1 08:55 lib/firmware/nvidia/gp102/sec2/image.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp102/sec2/sig-1.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp102/sec2/sig.bin drwxr-xr-x 6 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp104 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp104/acr lrwxrwxrwx 1 root root 22 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/bl.bin -> ../../gp102/acr/bl.bin lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin lrwxrwxrwx 1 root root 32 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp104/gr lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 2576 Mar 1 08:55 lib/firmware/nvidia/gp104/gr/fecs_data.bin -rw-r--r-- 1 root root 22760 Mar 1 
08:55 lib/firmware/nvidia/gp104/gr/fecs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp104/gr/fecs_sig.bin lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 1832 Mar 1 08:55 lib/firmware/nvidia/gp104/gr/gpccs_data.bin -rw-r--r-- 2 root root 13307 Mar 1 08:55 lib/firmware/nvidia/gp104/gr/gpccs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp104/gr/gpccs_sig.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_bundle_init.bin -> ../../gp102/gr/sw_bundle_init.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_ctx.bin -> ../../gp102/gr/sw_ctx.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_method_init.bin -> ../../gp102/gr/sw_method_init.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_nonctx.bin -> ../../gp102/gr/sw_nonctx.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp104/nvdec lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp104/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2 lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/desc-1.bin -> ../../gp102/sec2/desc-1.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/desc.bin -> ../../gp102/sec2/desc.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/image-1.bin -> ../../gp102/sec2/image-1.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/image.bin -> ../../gp102/sec2/image.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/sig-1.bin -> ../../gp102/sec2/sig-1.bin lrwxrwxrwx 1 root root 24 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/sig.bin -> ../../gp102/sec2/sig.bin drwxr-xr-x 6 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp106 drwxr-xr-x 2 root root 0 Jul 10 20:26 
lib/firmware/nvidia/gp106/acr lrwxrwxrwx 1 root root 22 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/bl.bin -> ../../gp102/acr/bl.bin lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin lrwxrwxrwx 1 root root 32 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp106/gr lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin -rw-r--r-- 1 root root 2256 Mar 1 08:55 lib/firmware/nvidia/gp106/gr/fecs_data.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/fecs_inst.bin -> ../../gp102/gr/fecs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp106/gr/fecs_sig.bin lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin -rw-r--r-- 1 root root 1832 Mar 1 08:55 lib/firmware/nvidia/gp106/gr/gpccs_data.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/gpccs_inst.bin -> ../../gp102/gr/gpccs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp106/gr/gpccs_sig.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_bundle_init.bin -> ../../gp102/gr/sw_bundle_init.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_ctx.bin -> ../../gp102/gr/sw_ctx.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_method_init.bin -> ../../gp102/gr/sw_method_init.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_nonctx.bin -> ../../gp102/gr/sw_nonctx.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp106/nvdec lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp106/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin 
drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2 lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/desc-1.bin -> ../../gp102/sec2/desc-1.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/desc.bin -> ../../gp102/sec2/desc.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/image-1.bin -> ../../gp102/sec2/image-1.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/image.bin -> ../../gp102/sec2/image.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/sig-1.bin -> ../../gp102/sec2/sig-1.bin lrwxrwxrwx 1 root root 24 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/sig.bin -> ../../gp102/sec2/sig.bin drwxr-xr-x 6 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp107 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp107/acr lrwxrwxrwx 1 root root 22 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/bl.bin -> ../../gp102/acr/bl.bin lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin lrwxrwxrwx 1 root root 32 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp107/gr -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/fecs_bl.bin -rw-r--r-- 1 root root 2756 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/fecs_data.bin -rw-r--r-- 1 root root 22879 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/fecs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/fecs_sig.bin -rw-r--r-- 3 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2100 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_data.bin -rw-r--r-- 1 root root 12587 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 
lib/firmware/nvidia/gp107/gr/gpccs_sig.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gp107/gr/sw_bundle_init.bin -> ../../gp102/gr/sw_bundle_init.bin -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/sw_ctx.bin lrwxrwxrwx 1 root root 33 Jul 10 20:26 lib/firmware/nvidia/gp107/gr/sw_method_init.bin -> ../../gp102/gr/sw_method_init.bin -rw-r--r-- 2 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp107/gr/sw_nonctx.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp107/nvdec lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp107/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2 lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/desc-1.bin -> ../../gp102/sec2/desc-1.bin lrwxrwxrwx 1 root root 25 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/desc.bin -> ../../gp102/sec2/desc.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/image-1.bin -> ../../gp102/sec2/image-1.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/image.bin -> ../../gp102/sec2/image.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/sig-1.bin -> ../../gp102/sec2/sig-1.bin lrwxrwxrwx 1 root root 24 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/sig.bin -> ../../gp102/sec2/sig.bin drwxr-xr-x 6 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp108 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp108/acr lrwxrwxrwx 1 root root 22 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/bl.bin -> ../../gp102/acr/bl.bin lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin lrwxrwxrwx 1 root root 32 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin lrwxrwxrwx 1 root root 29 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 
lib/firmware/nvidia/gp108/gr -rw-r--r-- 2 root root 576 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/fecs_bl.bin -rw-r--r-- 1 root root 2248 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/fecs_data.bin -rw-r--r-- 1 root root 21161 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/fecs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/fecs_sig.bin -rw-r--r-- 3 root root 0 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2092 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_data.bin -rw-r--r-- 1 root root 13095 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_sig.bin -rw-r--r-- 2 root root 7680 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/sw_bundle_init.bin -rw-r--r-- 2 root root 6000 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/sw_ctx.bin -rw-r--r-- 2 root root 12288 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/sw_method_init.bin -rw-r--r-- 2 root root 2496 Mar 1 08:55 lib/firmware/nvidia/gp108/gr/sw_nonctx.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp108/nvdec lrwxrwxrwx 1 root root 30 Jul 10 20:26 lib/firmware/nvidia/gp108/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2 lrwxrwxrwx 1 root root 27 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2/desc.bin -> ../../gp102/sec2/desc-1.bin lrwxrwxrwx 1 root root 28 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2/image.bin -> ../../gp102/sec2/image-1.bin lrwxrwxrwx 1 root root 26 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2/sig.bin -> ../../gp102/sec2/sig-1.bin drwxr-xr-x 6 root root 0 Jul 10 20:26 lib/firmware/nvidia/gv100 drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gv100/acr -rw-r--r-- 2 root root 1280 Mar 1 08:55 lib/firmware/nvidia/gv100/acr/bl.bin -rw-r--r-- 1 root root 18688 Mar 1 08:55 lib/firmware/nvidia/gv100/acr/ucode_load.bin -rw-r--r-- 1 root root 6400 Mar 1 08:55 lib/firmware/nvidia/gv100/acr/ucode_unload.bin 
-rw-r--r-- 2 root root 1280 Mar 1 08:55 lib/firmware/nvidia/gv100/acr/unload_bl.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gv100/gr -rw-r--r-- 1 root root 576 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/fecs_bl.bin -rw-r--r-- 1 root root 4788 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/fecs_data.bin -rw-r--r-- 1 root root 25632 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/fecs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/fecs_sig.bin -rw-r--r-- 3 root root 576 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_bl.bin -rw-r--r-- 1 root root 2128 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_data.bin -rw-r--r-- 1 root root 12643 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_inst.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_sig.bin -rw-r--r-- 1 root root 7664 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/sw_bundle_init.bin -rw-r--r-- 1 root root 9756 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/sw_ctx.bin -rw-r--r-- 1 root root 12296 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/sw_method_init.bin -rw-r--r-- 1 root root 2728 Mar 1 08:55 lib/firmware/nvidia/gv100/gr/sw_nonctx.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gv100/nvdec -rw-r--r-- 1 root root 4352 Mar 1 08:55 lib/firmware/nvidia/gv100/nvdec/scrubber.bin drwxr-xr-x 2 root root 0 Jul 10 20:26 lib/firmware/nvidia/gv100/sec2 -rw-r--r-- 1 root root 656 Mar 1 08:55 lib/firmware/nvidia/gv100/sec2/desc.bin -rw-r--r-- 1 root root 91136 Mar 1 08:55 lib/firmware/nvidia/gv100/sec2/image.bin -rw-r--r-- 1 root root 192 Mar 1 08:55 lib/firmware/nvidia/gv100/sec2/sig.bin -rw-r--r-- 1 root root 108296 Jul 8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia-drm.ko -rw-r--r-- 1 root root 27961744 Jul 8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia.ko -rw-r--r-- 1 root root 1486256 Jul 8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia-modeset.ko -rw-r--r-- 1 root root 1879016 Jul 8 08:27 
lib/modules/5.3.18-lp152.20.7-default/updates/nvidia-uvm.ko

Yeah. That explains it. The old modprobe config was still being used when nvidia modules were already loaded in initrd ...

Building my own package is not that difficult. Stefan has provided excellent instructions in the X11:Driver:Video etc. project.

mkinitrd does not change anything for me.

Based on what Matthias said, I experimented a bit and got the following. When I unload all nvidia modules (rmmod nvidia-drm, rmmod nvidia-modeset and rmmod nvidia) and then run "modprobe nvidia", I do have "/dev/nvidia-uvm(-tools)", as well as the others.

This is way beyond what I understand, but maybe it helps.

Thanks. Unfortunately I don't get this either. :-(

Created attachment 839606 [details]
Zypper output for force-reinstall of NVIDIA driver
I was honestly confused: I thought I had always seen the driver installation perform an initrd build, yet the problem looks a lot like it didn't. So I ran a forced reinstall of the driver packages. As you can see from the attached zypper output (I shortened irrelevant parts like the EULA), it does invoke an initrd build in the %posttrans script.
(In reply to Matthias Bach from comment #52)
> Based on the previous test I have been able to solve the problem for myself.
> The solution is as obvious as hidden in plain sight. All I had to do was
> execute the following:
>
> /sbin/mkinitrd
>
> Still, I don't fully understand _why_ this solves the problem. I always
> assumed the driver installation to trigger this.
>
> Maybe Mister Pend or Cor Blom can confirm this behaviour.

Sorry, I've tried on separate systems and can't confirm this. Executing mkinitrd doesn't error, but doesn't solve the issue, even after a further reboot.

(In reply to Mister Pend from comment #59)
> (In reply to Matthias Bach from comment #52)
> > Based on the previous test I have been able to solve the problem for myself.
> > The solution is as obvious as hidden in plain sight. All I had to do was
> > execute the following:
> >
> > /sbin/mkinitrd
> >
> > Still, I don't fully understand _why_ this solves the problem. I always
> > assumed the driver installation to trigger this.
> >
> > Maybe Mister Pend or Cor Blom can confirm this behaviour.
>
> Sorry, I've tried on separate systems and can't confirm this. Executing
> mkinitrd doesn't error, but doesn't solve the issue. Even after a further
> reboot.

After the last driver update, and having worked around bug 1174204, I now also have the issue again. Running /sbin/mkinitrd didn't help this time, so it must have been something else that fixed it for me back then. Still, I have that weird effect that unloading the modules after boot will actually make the system work as expected afterwards, until the next reboot.

Guys. I'm sorry but at some point I need to give up. I just can't reproduce. It simply works for me with all package versions I tried. I need to close as worksforme. I'm afraid you need to live with your workaround for now. :-(

Actually I fixed this issue by creating the nvidia-uvm-tools device node now with the latest packages, so closing as fixed.
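To tell quickly whether a boot ended up in the broken state, one can test for the two UVM device nodes named in this report. This is a minimal sketch only; the helper name `missing_nvidia_uvm_nodes` and its optional directory argument are assumptions added for illustration, and the node list covers just the nodes discussed here.

```shell
# Hypothetical helper: print the UVM device nodes that are missing below the
# given directory (default: /dev). Prints nothing if both nodes exist.
missing_nvidia_uvm_nodes() {
    dev_dir="${1:-/dev}"
    for node in nvidia-uvm nvidia-uvm-tools; do
        # -e is true for any kind of file, including character devices.
        [ -e "${dev_dir}/${node}" ] || printf '%s\n' "$node"
    done
}

# Usage:
#   missing_nvidia_uvm_nodes          # checks /dev
```

Empty output means both nodes are present; access for non-root users additionally depends on the permissions and ACLs set on them.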
I fully understand that you close this as fixed. Thanks for the work.

Now let me tell you something strange. I removed the workaround, rebooted, and it worked:

ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 0 16 jul 2020 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 16 jul 2020 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 16 jul 2020 /dev/nvidia-modeset
crw-rw----+ 1 root video 241, 0 16 jul 2020 /dev/nvidia-uvm
crw-rw----+ 1 root video 241, 1 16 jul 2020 /dev/nvidia-uvm-tools

Then I did today's kernel update (to 26.2) and it stopped working. Then I rebooted into the older kernel (20.7) and it worked again.

Created attachment 839797 [details]
Systemd Unit providing a workaround
Thanks for all the effort you put into this, Stefan! Given that you cannot reproduce it, I fully understand and support your decision. In fact, if my understanding is correct that your system does not have the NVIDIA driver in the initrd while the systems of those running into the issue do, then I suspect this is more related to initrd building than to the driver itself anyhow.
For anybody still running into this: I am now using the nvidia-modprobe.service files to implement an automated workaround. Just put this in /etc/systemd/system and enable it. I included dependencies for all services in openSUSE I know to be using the NVIDIA GPU. If you have further services requiring it, add them to the WantedBy and Before lines.
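The attached unit is not reproduced in this listing, so the following is only a sketch of what such a unit could look like, not the attachment itself. It assumes that nvidia-modprobe is installed as /usr/bin/nvidia-modprobe, reuses the `nvidia-modprobe -u -c=0` call mentioned elsewhere in this report, and takes display-manager.service as the default consumer; the function name `print_nvidia_modprobe_unit` is made up for illustration.

```shell
# Hypothetical generator for a workaround unit; the real attachment may differ.
# Arguments: the services that need the GPU (default: display-manager.service).
print_nvidia_modprobe_unit() {
    consumers="${*:-display-manager.service}"
    cat <<EOF
[Unit]
Description=Load nvidia-uvm and create its device nodes (workaround sketch)
Before=${consumers}

[Service]
Type=oneshot
# Assumed path; adjust if nvidia-modprobe lives elsewhere on your system.
ExecStart=/usr/bin/nvidia-modprobe -u -c=0

[Install]
WantedBy=${consumers}
EOF
}

# Usage (as root):
#   print_nvidia_modprobe_unit display-manager.service \
#       > /etc/systemd/system/nvidia-modprobe.service
#   systemctl daemon-reload
#   systemctl enable nvidia-modprobe.service
```

Passing further service names as arguments adds them to both the Before and WantedBy lines.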
BTW, I just received and accepted a pull request against suse-prime, which might be related. OTOH I think it's only needed if you already use TW (or systemd of TW).

https://github.com/openSUSE/SUSEPrime/pull/56

But it might make a difference for you ...

One user reported that forcing dracut to include nvidia-uvm into initrd made it work for him

https://www.reddit.com/r/openSUSE/comments/hszckh/no_cuda_support_with_repo_drivers/fym023h

Ok. That would mean that dracut would add other nvidia modules to initrd, which doesn't happen to me. But maybe that's related to the fact that I'm testing here on an Optimus system with Intel being the primary GPU, so automatic module selection could look different.

Check which nvidia modules are in your initrd by running

# lsinitrd | grep nvidia | grep -v firmware

If nvidia modules are included but nvidia-uvm is not, you could

1) either try adding all of them to initrd

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
install_items+=" /usr/bin/chmod /usr/bin/mknod /usr/bin/cat /usr/bin/echo /usr/bin/chown "
add_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF

2) or make sure they are not added at all

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF

Oh. You need to run 'mkinitrd' after adding a dracut config file so the changes become active, and reboot the machine in order to see the results.

Thanks. That workaround works great and is much better than the one I've been using so far.
It seems like, at least on my system, nvidia-uvm is even included but the symlink for the weak updates is missing:
➜ sudo lsinitrd | grep nvidia | grep -v firmware
[sudo] password for root:
-rw-r--r-- 1 root root 1483 Jul 17 09:31 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 16 21:43 etc/modprobe.d/nvidia-default.conf
-rw-r--r-- 1 root root 5335848 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 38338488 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
-rw-r--r-- 1 root root 2183784 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 42426104 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
lrwxrwxrwx 1 root root 54 Jul 19 19:44 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root 50 Jul 19 19:44 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
lrwxrwxrwx 1 root root 58 Jul 19 19:44 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko
vs.
➜ find /lib/modules -name 'nvidia*.ko'
/lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
/lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
/lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
/lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
/lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia-drm.ko
/lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia-uvm.ko
/lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia-modeset.ko
/lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia.ko
/lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko
/lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-uvm.ko
/lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko
/lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko
After explicitly listing the modules to be included in the initrd, the symlink is finally present in the initrd:
➜ ~ sudo lsinitrd | grep nvidia | grep -v firmware
-rw-r--r-- 1 root root 1483 Jul 17 09:31 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 16 21:43 etc/modprobe.d/nvidia-default.conf
-rw-r--r-- 1 root root 5335848 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 38338488 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
-rw-r--r-- 1 root root 2183784 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 42426104 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
lrwxrwxrwx 1 root root 54 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root 50 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
lrwxrwxrwx 1 root root 58 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root 54 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-uvm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-uvm.ko
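The comparison of the two listings above can also be done mechanically. As a sketch under stated assumptions: the helper below takes a kernel version and an lsinitrd-style listing and reports which of the four nvidia modules have no entry below lib/modules/<version>/. The function name `missing_nvidia_modules_for_kernel` and the fixed module list are illustrative choices, not part of dracut or the driver packages.

```shell
# Hypothetical helper: given a kernel version as $1 and an lsinitrd-style
# listing on stdin, print each expected nvidia module that has no entry below
# lib/modules/<version>/ in that listing (regular file or weak-updates symlink).
missing_nvidia_modules_for_kernel() {
    kver="$1"
    listing="$(cat)"
    for mod in nvidia nvidia-drm nvidia-modeset nvidia-uvm; do
        # awk exits 0 if some field starts with the kernel's module directory
        # and ends with "/<module>.ko"; symlink targets start with "../" and
        # therefore never match the prefix.
        if ! printf '%s\n' "$listing" |
            awk -v prefix="lib/modules/${kver}/" -v file="/${mod}.ko" '
                {
                    for (i = 1; i <= NF; i++) {
                        tail = substr($i, length($i) - length(file) + 1)
                        if (index($i, prefix) == 1 && tail == file) found = 1
                    }
                }
                END { exit (found ? 0 : 1) }'
        then
            printf '%s\n' "$mod"
        fi
    done
}

# Usage (as root):
#   lsinitrd | missing_nvidia_modules_for_kernel "$(uname -r)"
```

Fed with the first listing above and the 26 kernel, this would report nvidia-uvm; with the second listing it would report nothing.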
(In reply to Stefan Dirsch from comment #67) > 2) or make sure they are not added at all > > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF > omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm" > EOF Sorry, that was wrong. It should have been # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm" EOF Could you please try this as well? I believe this should result in what achieved Cor Blom by comment #56. Sorry to jump in here, please correct me if wrong, am I correct in assuming this bug is preventing my applications requiring CUDA (Davinci Resolve, Blender) failing to run after a DUP from Leap 15.1 to 15.2. I have tried the Systemd Unit providing a workaround suggestion but that failed to work. (In reply to Stefan Dirsch from comment #67) > Check which nvidia modules are in your initrd by running Looks like no nvidia-uvm on a clean installed system: -rw-r--r-- 1 root root 1484 Jul 17 22:49 etc/modprobe.d/50-nvidia-default.conf -rw-r--r-- 1 root root 18 Jul 17 05:43 etc/modprobe.d/nvidia-default.conf -rw-r--r-- 1 root root 119664 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko -rw-r--r-- 1 root root 27465704 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko -rw-r--r-- 1 root root 1574168 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko lrwxrwxrwx 1 root root 54 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko lrwxrwxrwx 1 root root 50 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko lrwxrwxrwx 1 root root 58 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko > 2) or make sure they are not added at all Ran your command in comment #70, followed 
by a mkinitrd and a reboot. And at this point, blank screen, X doesn't seem to load :(

(In reply to steve edmonds from comment #71)
> Sorry to jump in here, please correct me if wrong, am I correct in assuming
> this bug is preventing my applications requiring CUDA (Davinci Resolve,
> Blender) failing to run after a DUP from Leap 15.1 to 15.2.
> I have tried the Systemd Unit providing a workaround suggestion but that
> failed to work.

It would seem likely. A workaround (not a solution, but a workaround) that works for me was adding the following to the root crontab:

@reboot nvidia-modprobe -u -c=0

(or running "nvidia-modprobe -u -c=0" as an elevated user once every boot). After this it may start working for you.

(In reply to Mister Pend from comment #73)
> (In reply to steve edmonds from comment #71)
> > Sorry to jump in here, please correct me if wrong, am I correct in assuming
> > this bug is preventing my applications requiring CUDA (Davinci Resolve,
> > Blender) failing to run after a DUP from Leap 15.1 to 15.2.
> > I have tried the Systemd Unit providing a workaround suggestion but that
> > failed to work.
>
> It would seem likely. A workaround (not a solution, but a workaround) that
> works for me was adding the following to the root crontab:
>
> @reboot nvidia-modprobe -u -c=0
>
> (or running "nvidia-modprobe -u -c=0" as an elevated user once every boot).
> After this it may start working for you.

Unfortunately that has not worked for me either. I have not tried modifying the initrd; I am not quite sure which action to take.
sudo lsinitrd | grep nvidia | grep -v firmware gives only -rw-r--r-- 1 root root 1483 Jul 20 15:04 etc/modprobe.d/50-nvidia-default.conf -rw-r--r-- 1 root root 18 Jul 17 07:43 etc/modprobe.d/nvidia-default.conf Where as on my functioning Leap 15.1 I have -rw-r--r-- 1 root root 1483 Jul 17 21:10 etc/modprobe.d/50-nvidia-default.conf -rw-r--r-- 1 root root 18 Jul 17 07:42 etc/modprobe.d/nvidia-default.conf -rw-r--r-- 1 root root 116160 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-drm.ko -rw-r--r-- 1 root root 27452392 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia.ko -rw-r--r-- 1 root root 1570992 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-modeset.ko -rw-r--r-- 1 root root 1934696 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-uvm.ko lrwxrwxrwx 1 root root 55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-drm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-drm.ko lrwxrwxrwx 1 root root 51 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia.ko lrwxrwxrwx 1 root root 59 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-modeset.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-modeset.ko lrwxrwxrwx 1 root root 55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-uvm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-uvm.ko (In reply to Stefan Dirsch from comment #70) > (In reply to Stefan Dirsch from comment #67) > > 2) or make sure they are not added at all > > > > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF > > omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm" > > EOF > > Sorry, that was wrong. 
It should have been
>
> # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
> omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
> EOF
>
> Could you please try this as well? I believe this should result in what
> achieved Cor Blom by comment #56.

I can confirm that this also fixes the issue for me. I like this even better than force-including them into the initrd, as this gives a nice reduction in initrd size from 37 MiB to 14 MiB.

(In reply to steve edmonds from comment #74)
> (In reply to Mister Pend from comment #73)
> > (In reply to steve edmonds from comment #71)
> > > Sorry to jump in here, please correct me if wrong, am I correct in assuming
> > > this bug is preventing my applications requiring CUDA (Davinci Resolve,
> > > Blender) failing to run after a DUP from Leap 15.1 to 15.2.
> > > I have tried the Systemd Unit providing a workaround suggestion but that
> > > failed to work.
> >
> > It would seem likely. A workaround (not a solution, but a workaround) that
> > works for me was adding the following to the root crontab:
> >
> > @reboot nvidia-modprobe -u -c=0
> >
> > (or running "nvidia-modprobe -u -c=0" as an elevated user once every boot).
> > After this it may start working for you.
>
> Unfortunately that has not worked for me either. I have not tried modifying
> the initrd; I am not quite sure which action to take.
> sudo lsinitrd | grep nvidia | grep -v firmware
> gives only
>
> […]

That initrd looks correct to my non-expert eye. Some other things that might be interesting:

1) Output of `lsmod | grep nvidia`
2) Output of `ls -lh /dev/nvidia*`
3) Is your user a member of the group `video`?
4) Output of `clinfo | head -n 5`

(In reply to Matthias Bach from comment #76)
>
> That initrd looks correct to my non-expert eye. Some other things that might
> be interesting:
>
> 1) Output of `lsmod | grep nvidia`
> 2) Output of `ls -lh /dev/nvidia*`
> 3) Is your user a member of the group `video`?
> 4) Output of `clinfo | head -n 5` 1.>lsmod | grep nvidia (nothing) 2.>ls -lh /dev/nvidia* ls: cannot access '/dev/nvidia*': No such file or directory 3. Yes 4. clinfo | head -n 5 Number of platforms 0 The same video card and GO5 driver (450.57) working in Leap 15.1 gives quite different responses. (In reply to steve edmonds from comment #71) > Sorry to jump in here, please correct me if wrong, am I correct in assuming > this bug is preventing my applications requiring CUDA (Davinci Resolve, > Blender) failing to run after a DUP from Leap 15.1 to 15.2. Yes, this sounds reasonable! (In reply to steve edmonds from comment #74) > Unfortunately that has not worked for me either.I have not tried modifying > the initrd, I am not quite sure which action to take. > sudo lsinitrd | grep nvidia | grep -v firmware > gives only > -rw-r--r-- 1 root root 1483 Jul 20 15:04 > etc/modprobe.d/50-nvidia-default.conf > -rw-r--r-- 1 root root 18 Jul 17 07:43 > etc/modprobe.d/nvidia-default.conf That does not need to be an issue. I have the same behaviour on my working system. 
> Where as on my functioning Leap 15.1 I have > > -rw-r--r-- 1 root root 1483 Jul 17 21:10 > etc/modprobe.d/50-nvidia-default.conf > -rw-r--r-- 1 root root 18 Jul 17 07:42 > etc/modprobe.d/nvidia-default.conf > -rw-r--r-- 1 root root 116160 Jul 17 21:11 > lib/modules/4.12.14-lp151.27-default/updates/nvidia-drm.ko > -rw-r--r-- 1 root root 27452392 Jul 17 21:11 > lib/modules/4.12.14-lp151.27-default/updates/nvidia.ko > -rw-r--r-- 1 root root 1570992 Jul 17 21:11 > lib/modules/4.12.14-lp151.27-default/updates/nvidia-modeset.ko > -rw-r--r-- 1 root root 1934696 Jul 17 21:11 > lib/modules/4.12.14-lp151.27-default/updates/nvidia-uvm.ko > lrwxrwxrwx 1 root root 55 Jul 17 21:11 > lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-drm.ko > -> ../../../4.12.14-lp151.27-default/updates/nvidia-drm.ko > lrwxrwxrwx 1 root root 51 Jul 17 21:11 > lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia.ko -> > ../../../4.12.14-lp151.27-default/updates/nvidia.ko > lrwxrwxrwx 1 root root 59 Jul 17 21:11 > lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-modeset. > ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-modeset.ko > lrwxrwxrwx 1 root root 55 Jul 17 21:11 > lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-uvm.ko > -> ../../../4.12.14-lp151.27-default/updates/nvidia-uvm.ko Yes, this looks consistent. (In reply to steve edmonds from comment #77) > 1.>lsmod | grep nvidia > (nothing) > 2.>ls -lh /dev/nvidia* > ls: cannot access '/dev/nvidia*': No such file or directory > 3. Yes > 4. clinfo | head -n 5 > Number of platforms 0 > > The same video card and GO5 driver (450.57) working in Leap 15.1 gives quite > different responses. OMG. I'm wondering whether you really have nvidia-gfxG05-kmp-default package installed. If yes, what does modprobe nvidia trigger? Check also dmesg output. Also please make sure you have the latest G05 packages installed from our Leap 15.2 repos. 
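The checks requested in this thread (loaded modules, device nodes, membership in the group `video`, and the first lines of `clinfo`) can be collected in one go. The following is only a sketch: `has_module` and `nvidia_report` are hypothetical helper names that are not part of any package, and the module list is assumed from the `lsmod` output quoted in this bug.

```shell
#!/bin/sh
# Sketch only: has_module and nvidia_report are hypothetical helper names.

# Succeeds if lsmod-style output on stdin lists the module named in $1.
# Only the first column is compared, so a module that merely appears in
# another module's "used by" column does not count as loaded.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Prints the four diagnostics asked for in this thread.
nvidia_report() {
    for m in nvidia nvidia_modeset nvidia_drm nvidia_uvm; do
        if lsmod | has_module "$m"; then
            echo "$m: loaded"
        else
            echo "$m: not loaded"
        fi
    done
    ls -lh /dev/nvidia* 2>/dev/null || echo "no /dev/nvidia* device nodes"
    if id -nG | tr ' ' '\n' | grep -qx video; then
        echo "user is in group video"
    else
        echo "user is not in group video"
    fi
    # clinfo may not be installed; skip it quietly in that case.
    if command -v clinfo >/dev/null 2>&1; then
        clinfo | head -n 5
    fi
    return 0
}
```

Calling `nvidia_report` as the normal desktop user (not via sudo) shows the situation this bug is about. If `nvidia` itself is reported as not loaded, `modprobe nvidia` and `dmesg` are the next things to look at.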
(In reply to Mister Pend from comment #72) > (In reply to Stefan Dirsch from comment #67) > > Check which nvidia modules are in your initrd by running > > Looks like no nvidia-uvm on a clean installed system: > > -rw-r--r-- 1 root root 1484 Jul 17 22:49 > etc/modprobe.d/50-nvidia-default.conf > -rw-r--r-- 1 root root 18 Jul 17 05:43 > etc/modprobe.d/nvidia-default.conf > -rw-r--r-- 1 root root 119664 Jul 17 22:49 > lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko > -rw-r--r-- 1 root root 27465704 Jul 17 22:49 > lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko > -rw-r--r-- 1 root root 1574168 Jul 17 22:49 > lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko > lrwxrwxrwx 1 root root 54 Jul 17 22:49 > lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> > ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko > lrwxrwxrwx 1 root root 50 Jul 17 22:49 > lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> > ../../../5.3.18-lp152.19-default/updates/nvidia.ko > lrwxrwxrwx 1 root root 58 Jul 17 22:49 > lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko > -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko Yes, exactly the same issue as Matthias Bach had. > > 2) or make sure they are not added at all > > Ran your command in comment #70, followed by a mkinitrd and a reboot. And at > this point, blank screen, X doesn't seem to load :( My fault, the advice was wrong. Please try instead - as already corrected in my comment #70 # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm" EOF # mkinitrd According to Matthias Bach this should work. (In reply to Stefan Dirsch from comment #78) > > OMG. I'm wondering whether you really have nvidia-gfxG05-kmp-default package > installed. If yes, what does > > modprobe nvidia > > trigger? Check also dmesg output. 
Also please make sure you have the latest > G05 packages installed from our Leap 15.2 repos. > sudo modprobe nvidia (done via ssh as not in front of the 15.2 PC but with a screen locked X11 session running on it) modprobe: ERROR: could not find module by name='nvidia' modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg) From Yast i │nvidia-computeG05 │NVIDIA driver for computing with GPGPU i │nvidia-gfxG05-kmp-default│NVIDIA graphics driver kernel module for GeForce 600 series and newer i │nvidia-glG05 │NVIDIA OpenGL libraries for OpenGL acceleration i │x11-video-nvidiaG05 │NVIDIA graphics driver for GeForce 600 series and newer Also, my CAD software balks if I do not have the above drivers loaded. Only reference I found in dmesg is [ 4.599729] audit: type=1400 audit(1595298800.894:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=482 comm="apparmor_parser" [ 4.599732] audit: type=1400 audit(1595298800.894:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=482 comm="apparmor_parser" (In reply to Matthias Bach from comment #75) > (In reply to Stefan Dirsch from comment #70) > > (In reply to Stefan Dirsch from comment #67) > > > 2) or make sure they are not added at all > > > > > > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF > > > omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm" > > > EOF > > > > Sorry, that was wrong. It should have been > > > > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF > > omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm" > > EOF > > > > Could you please try this as well? I believe this should result in what > > achieved Cor Blom by comment #56. > > I can confirm that this also fixes the issue for me. I like this even better > than force-including them into the initrd as this gives a nice reduction in > initrd size from 37 MiB to 14 MiB. 
Thanks for feedback! I've implemented this now in our packages. Anyone building the packages themselves from obs://X11:Drivers:Video can test this right now. Hello @Cor Blom, glad to know that at least one person is making use of this service! :-) Closing as fixed. (In reply to steve edmonds from comment #80) > > sudo modprobe nvidia (done via ssh as not in front of the 15.2 PC but with a screen locked X11 session running on it) > modprobe: ERROR: could not find module by name='nvidia' > modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or > unknown parameter (see dmesg) > > From Yast > i │nvidia-computeG05 │NVIDIA driver for computing with GPGPU > i │nvidia-gfxG05-kmp-default│NVIDIA graphics driver kernel module for > GeForce 600 series and newer > i │nvidia-glG05 │NVIDIA OpenGL libraries for OpenGL > acceleration i │x11-video-nvidiaG05 │NVIDIA > graphics driver for GeForce 600 series and newer > > Also, my CAD software balks if I do not have the above drivers loaded. > > Only reference I found in dmesg is > [ 4.599729] audit: type=1400 audit(1595298800.894:4): apparmor="STATUS" > operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=482 > comm="apparmor_parser" > [ 4.599732] audit: type=1400 audit(1595298800.894:5): apparmor="STATUS" > operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" > pid=482 comm="apparmor_parser" Seems you're failing on a complete different level. No nvidia modules installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package. If this doesn't help, open a separate bug. It's really unrelated to this one ... I have built the latest version and can confirm it fixes this bug. Thanks. Thanks a lot for positive feedback, @Cor Blom! :-) (In reply to Stefan Dirsch from comment #82) > > Seems you're failing on a complete different level. No nvidia modules > installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package. > If this doesn't help, open a separate bug. 
It's really unrelated to this one > ... Do you think it could be related to this release note; 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed openSUSE Leap 15.2 now enables a kernel module signature check for third-party drivers (CONFIG_MODULE_SIG=y). This is an important security measure to avoid untrusted code running in the kernel. This may prevent third-party kernel modules from being loaded if UEFI Secure Boot is enabled. Importantly, this affects NVIDIA...... Although I can't see why my CAD complains of no openGL if I remove the installed Nvidia packages. (In reply to steve edmonds from comment #85) > (In reply to Stefan Dirsch from comment #82) > > > > Seems you're failing on a complete different level. No nvidia modules > > installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package. > > If this doesn't help, open a separate bug. It's really unrelated to this one > > ... > > Do you think it could be related to this release note; > 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed Yes, if you are using secure boot your issues are most likely be caused by this. But that too should be solved with the latest package version. You'll have to manually import the package signing key once, though. See bug 1173682 for details. (In reply to Matthias Bach from comment #86) > (In reply to steve edmonds from comment #85) > > (In reply to Stefan Dirsch from comment #82) > > > > > > Seems you're failing on a complete different level. No nvidia modules > > > installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package. > > > If this doesn't help, open a separate bug. It's really unrelated to this one > > > ... > > > > Do you think it could be related to this release note; > > 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed > > Yes, if you are using secure boot your issues are most likely be caused by > this. But that too should be solved with the latest package version. 
You'll
> have to manually import the package signing key once, though. See bug
> 1173682 for details.

Maybe secure boot is not my issue; I am booting with GRUB2 without EFI and with "Enable Trusted Boot Support" turned off.

Thanks to all. Back in front of the problem machine today, I tried the proposed workaround:

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
# mkinitrd

This worked! Interestingly though, when running mkinitrd I get the output below. Is the first part relating to the kernel 4.4.76-1 supposed to be here, or is it a hangover from Leap 15.1?

Creating initrd: /boot/initrd-4.4.76-1-default
dracut: Executing: /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force --force-drivers "xennet xenblk" /boot/initrd-4.4.76-1-default 4.4.76-1-default
.
.
dracut-install: Failed to find module 'des'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ySNXfs/initramfs -H -N i2o_scsi|nvidia|nvidia_drm|nvidia-modeset|nvidia-uvm --kerneldir /lib/modules/4.4.76-1-default/ -m des ecb md4 md5 hmac arc4 nls_utf8
dracut: *** Including modules done ***
dracut-install: Failed to find module 'xennet'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ySNXfs/initramfs -N i2o_scsi|nvidia|nvidia_drm|nvidia-modeset|nvidia-uvm --kerneldir /lib/modules/4.4.76-1-default/ -m xennet xenblk
dracut: *** Installing kernel module dependencies ***
depmod: WARNING: could not open modules.order at /var/tmp/dracut.ySNXfs/initramfs/lib/modules/4.4.76-1-default: No such file or directory
depmod: WARNING: could not open modules.builtin at /var/tmp/dracut.ySNXfs/initramfs/lib/modules/4.4.76-1-default: No such file or directory
.
.
before
Creating initrd: /boot/initrd-5.3.18-lp152.20.7-default

The errors in mkinitrd are not related; it is another bug preventing the purge-kernels service from running.
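For anyone repeating the workaround confirmed above, the steps can be wrapped so the result is easy to verify. This is a minimal sketch, not the packaged fix: `write_nvidia_omit_conf` and `nvidia_modules_in_listing` are hypothetical helper names, the optional path argument exists only so the function can be tried on a scratch file, and the `omit_drivers` line is copied verbatim from comment #70.

```shell
#!/bin/sh
# Sketch only: both function names below are hypothetical helpers.

# Writes the dracut snippet from comment #70. The target defaults to the
# path used in this thread; pass another path to try it on a scratch file.
write_nvidia_omit_conf() {
    conf="${1:-/etc/dracut.conf.d/50-nvidia-default.conf}"
    printf '%s\n' 'omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"' > "$conf"
}

# Reads an lsinitrd listing on stdin and prints only the entries that are
# NVIDIA kernel modules (*.ko files or symlinks to them). Config files such
# as etc/modprobe.d/50-nvidia-default.conf are ignored on purpose.
nvidia_modules_in_listing() {
    grep -E 'nvidia[^ /]*\.ko' || true
}

# Intended use, as root (assumes mkinitrd and lsinitrd as used in this thread):
#   write_nvidia_omit_conf && mkinitrd
#   lsinitrd | nvidia_modules_in_listing
```

After the initrd has been rebuilt, the last command should print nothing, matching the listing reported for a working system earlier in this bug, where only the two modprobe.d files remained.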
(In reply to steve edmonds from comment #88)
> Interestingly though when running mkinitrd I have output as below. Is the
> first part relating to the kernel 4.4.76-1 supposed to be here or is it a
> hangover from Leap 15.1.
>
> Creating initrd: /boot/initrd-4.4.76-1-default

This is expected. We always leave some older kernels around so if the latest one is bad, you can boot into an older kernel.

I thought I had this all under control, so I moved on to DUP the next Leap 15.1 machine to 15.2. After the upgrade the Nvidia driver loads except for nvidia-uvm, so I have OpenGL support but no CUDA support. If I run clinfo | head -n 5 as a normal user, nvidia_uvm doesn't load, but if I run the command elevated it does load and stays loaded (output below).

The previous suggestions of cat > /etc/dracut.conf.d/50-nvidia-default.conf.... don't work here, because after the DUP I have only one file there:

> ls /etc/dracut.conf.d
99-debug.conf

I do have
/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/nvidia-default.conf

I suspect the @reboot nvidia-modprobe -u -c=0 solution may be my option.

> lsmod | grep nvidia
nvidia_drm 61440 16
nvidia_modeset 1187840 34 nvidia_drm
nvidia 19726336 1586 nvidia_modeset
drm_kms_helper 229376 1 nvidia_drm
drm 544768 19 drm_kms_helper,nvidia_drm

> clinfo | head -n 5
Number of platforms 0

steve@linux-qw83:~> lsmod | grep nvidia
nvidia_drm 61440 17
nvidia_modeset 1187840 36 nvidia_drm
nvidia 19726336 1671 nvidia_modeset
drm_kms_helper 229376 1 nvidia_drm
drm 544768 20 drm_kms_helper,nvidia_drm

> sudo clinfo | head -n 5
[sudo] password for root:
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.0.210
Platform Profile FULL_PROFILE

> lsmod | grep nvidia
nvidia_uvm 1110016 0
nvidia_drm 61440 17
nvidia_modeset 1187840 36 nvidia_drm
nvidia 19726336 1672 nvidia_uvm,nvidia_modeset
drm_kms_helper 229376 1 nvidia_drm
drm 544768 20 drm_kms_helper,nvidia_drm

Removed
/etc/dracut.conf.d/50-nvidia-default.conf could be explained by having suse-prime package installed (Optimus systems with Intel/NVIDIA GPU combo), but then you would have a /etc/dracut.conf.d/90-nvidia-dracut-G05.conf or /usr/lib/dracut/dracut.conf.d/90-nvidia-dracut-G05.conf installed with the same content and you still shouldn't have any nvidia modules in your initrd.

I don't have an explanation for this right now.

(In reply to Stefan Dirsch from comment #92)
> Removed /etc/dracut.conf.d/50-nvidia-default.conf could be explained by
> having suse-prime package installed (Optimus systems with Intel/NVIDIA GPU
> combo), but then you would have a
> /etc/dracut.conf.d/90-nvidia-dracut-G05.conf or
> /usr/lib/dracut/dracut.conf.d/90-nvidia-dracut-G05.conf installed with the
> same content and you still shouldn't have any nvidia modules in your initrd.
>
> I don't have an explanation for this right now.

The files were there under Leap 15.1. I am assuming nvidia-computeG05 provides /etc/modprobe.d/50-nvidia-default.conf, but I have no idea what process leads to the presence of /etc/dracut.conf.d/50-nvidia-default.conf.

> The files were there under Leap 15.1, I am assuming nvidia-computeG05
> provides /etc/modprobe.d/50-nvidia-default.conf

No, that's part of the nvidia-gfxG05-kmp-default package.

> but I have no idea what
> process leads to the presence of /etc/dracut.conf.d/50-nvidia-default.conf

That was the temporary workaround. This won't be needed with the next driver package update. With that, nvidia-gfxG05-kmp-default will include /etc/dracut.conf.d/60-nvidia-default.conf with the same content, i.e. nvidia modules will no longer be added to the initrd.

(In reply to Stefan Dirsch from comment #94)
An updated Nvidia driver was just installed from the repository and CUDA apps are working as expected now.

Indeed. Repos have been updated yesterday! :-)

Was this an NVIDIA issue then, rather than OpenSUSE?
Asking out of curiosity Well, if one doesn't appreciate that openSUSE takes care about security and therefore doesn't make nvidia-modprobe suid root, you can call it an openSUSE issue ... SUSE-FU-2023:1919-1: An update that contains two features and has three feature fixes can now be installed. Category: feature (moderate) Bug References: 1173733, 1207495, 1207520 Jira References: PED-2658, SLE-24579 Sources used: Basesystem Module 15-SP4 (src): nvidia-open-driver-G06-signed-525.105.17-150400.9.5.3 Public Cloud Module 15-SP4 (src): nvidia-open-driver-G06-signed-525.105.17-150400.9.5.3 NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
|