Bug 1173733

Summary: Compute capabilities of NVIDIA drivers cannot be initialised by non-root users
Product: [openSUSE] openSUSE Distribution Reporter: Matthias Bach <marix>
Component: X11 3rd Party Driver Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED FIXED QA Contact: Stefan Dirsch <sndirsch>
Severity: Normal    
Priority: P2 - High CC: bwiedemann, cornelis, ddadap, engineering, kaykaykay123, marix, thechode
Version: Leap 15.2   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: NVIDIA Modprobe file
Result of nvidia-bug-report.sh
nvidia-uvm-tools.tar
sample files from two machines
Zypper output for force-reinstall of NVIDIA driver
Systemd Unit providing a workaround

Description Matthias Bach 2020-07-05 16:07:56 UTC
After an upgrade from openSUSE 15.1 to openSUSE 15.2, the compute capabilities of the proprietary NVIDIA drivers can no longer be used by normal users unless root has used them before.

This can be easily verified by running an arbitrary application that invokes `clGetPlatformIDs`. That function will return an error code of -1001. It will also show in Boinc, which reports: No usable GPUs found.

Running the same code as root will succeed, and afterwards non-privileged users can also fully utilise the GPU.

What I was able to debug is that until root used compute capabilities of the GPU, the `nvidia-uvm` kernel module is not loaded and the device files `/dev/nvidia-uvm` and `/dev/nvidia-uvm-tools` are missing.
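
The missing-device observation above can be checked with a small helper. This is only a sketch: the function name `missing_char_devices` is made up for illustration, and it checks nothing beyond whether each path is a character device.

```sh
# Print every path from the argument list that is not a character device.
# (Hypothetical helper name; not part of any NVIDIA or openSUSE tooling.)
missing_char_devices() {
    for node in "$@"; do
        [ -c "$node" ] || printf '%s\n' "$node"
    done
}

# Example call for the two nodes named in this report:
#   missing_char_devices /dev/nvidia-uvm /dev/nvidia-uvm-tools
```

An empty result would mean both nodes exist; on the affected system both paths would be expected in the output until root has used the GPU once.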

Loading the `nvidia-uvm` kernel module on its own is not sufficient. Running a simple GPU-utilising application shows that the application (or rather one of the driver components it invokes), before returning the error, attempts to run `nvidia-modprobe`. 

According to its help, `nvidia-modprobe` is a "setuid program is used to create, in a Linux distribution-independent way, NVIDIA Linux device files and load the NVIDIA kernel module, on behalf of NVIDIA Linux driver components". However, this application seems not to be installed as a setuid application in Leap 15.2.

Manually making `nvidia-modprobe` setuid will actually fix the issue, that is, normal users like myself or the boinc user can afterwards successfully utilise the GPU without root having run anything on it.
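
A minimal sketch for checking the setuid state described above. The helper names `has_setuid` and `report_setuid` are hypothetical, and the binary is looked up on PATH rather than at a fixed location, since the report does not state where it is installed.

```sh
# Succeed if the given file has the setuid bit set.
has_setuid() {
    [ -u "$1" ]
}

# Report the setuid state of a binary found on PATH (hypothetical helper).
report_setuid() {
    path=$(command -v "$1") || { echo "$1: not found"; return 1; }
    if has_setuid "$path"; then
        echo "$path: setuid"
    else
        echo "$path: not setuid"
    fi
}

# Example: report_setuid nvidia-modprobe
```

On the affected Leap 15.2 system this would be expected to print "not setuid"; the manual fix mentioned above corresponds to running `chmod u+s` on that path as root.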
Comment 1 Stefan Dirsch 2020-07-05 18:18:12 UTC
Hmm. nvidia-uvm should be loaded automatically once the nvidia module gets loaded. What does /etc/modprobe.d/50-nvidia-default.conf look like? Please also attach the result of an nvidia-bug-report.sh run.
Comment 2 Matthias Bach 2020-07-05 18:22:55 UTC
Created attachment 839360 [details]
NVIDIA Modprobe file
Comment 3 Matthias Bach 2020-07-05 18:28:55 UTC
Created attachment 839361 [details]
Result of nvidia-bug-report.sh
Comment 4 Matthias Bach 2020-07-05 18:30:19 UTC
I attached the files you asked for. Please be aware that only loading nvidia-uvm is not sufficient. The proper creation of the corresponding device files is also essential.
Comment 5 Stefan Dirsch 2020-07-05 18:34:56 UTC
Looks good. I'm wondering which displaymanager you're using? gdm, sddm, lightdm, ... ? Which desktop? GNOME, KDE, xfce, ... ?
Comment 6 Stefan Dirsch 2020-07-05 18:48:18 UTC
(In reply to Matthias Bach from comment #4)
> I attached the files you asked for. Please be aware that only loading
> nvidia-uvm is not sufficient. The proper creation of the corresponding
> device files is also essential.

/usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf

makes sure that device nodes are created during boot and permissions are set when user logs in. Things are complicated. (boo#1000625)
Comment 7 Matthias Bach 2020-07-05 19:28:10 UTC
I am using sddm as the display manager.

Desktop is KDE. But that should be irrelevant as the compute capabilities should be, if everything works correctly, also accessible without any desktop running.
Comment 8 Stefan Dirsch 2020-07-06 06:49:36 UTC
Thanks. Please read comments #36 - #43 of boo#1000625 for the background of creating devices and the permissions.
Comment 9 Stefan Dirsch 2020-07-06 10:32:59 UTC
So I did a fresh Leap 15.2 installation on my NVIDIA laptop. Worked fine for me. nvidia modules are loaded during boot including nvidia-uvm, permissions are set for the logged in user.

# lsmod | grep nvidia
nvidia_drm             53248  0
nvidia_modeset       1118208  1 nvidia_drm
nvidia_uvm           1069056  0
nvidia              20721664  5 nvidia_uvm,nvidia_modeset
ipmi_msghandler        69632  2 ipmi_devintf,nvidia
drm_kms_helper        229376  2 nvidia_drm,i915
drm                   544768  8 drm_kms_helper,nvidia_drm,i915

# getfacl /dev/nvidia*
getfacl: Removing leading '/' from absolute path names
# file: dev/nvidia0
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidiactl
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-modeset
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-uvm
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

(check the user:tux:rw- lines)
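
The per-user ACL entries in the getfacl output above can also be checked mechanically. A small sketch, assuming the output format shown; `has_user_acl` is a hypothetical name, and it only matches an exact `user:NAME:rw-` line.

```sh
# Read getfacl output on stdin; succeed if it contains an exact
# "user:NAME:rw-" entry for the given user name.
has_user_acl() {
    grep -qxF "user:$1:rw-"
}

# Example: getfacl /dev/nvidia-uvm 2>/dev/null | has_user_acl tux
```

Because the match is exact, an entry that getfacl annotates with reduced effective rights would not count as a hit.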

I noticed that the NVIDIA repository is not yet in the community repos. I will report this.

Also I noticed that the hardware supplements in G04/G05 are wrong, so G04 was autoselected instead of G05. Fixed this now. The fix will be available with the next NVIDIA RPM/repo update. It's embarrassing, since this has been broken for Leap since 15.1. :-(
Comment 10 Stefan Dirsch 2020-07-07 08:40:32 UTC
Got another report, which convinced me to reopen this one. ;-)

https://www.reddit.com/r/openSUSE/comments/hm9dan/cuda_issues_in_leap_152_with_workaround/?utm_medium=android_app&utm_source=share
Comment 11 Stefan Dirsch 2020-07-07 08:56:09 UTC
> crw-rw-rw-  1 root root  238,   1 Jul  7 14:35 /dev/nvidia-uvm-tools

Indeed, we're not creating this one, and it's actually the first time I've heard that it exists and is required for using the uvm module.
Comment 12 Stefan Dirsch 2020-07-07 09:01:39 UTC
I will fix this in our modprobe file.
Comment 13 Stefan Dirsch 2020-07-07 09:08:31 UTC
(In reply to Stefan Dirsch from comment #12)
> I will fix this in our modprobe file.

And also in %post and %trigger, so permissions are set accordingly by udev/logind when user logs in.
Comment 14 Stefan Dirsch 2020-07-07 10:54:23 UTC
Done. Now I'm getting in addition to /dev/nvidia-uvm

# ls -l /dev/nvidia*
[..]
crw-rw----+ 1 root video 238,   1 Jul  7 12:16 /dev/nvidia-uvm-tools

# getfacl /dev/nvidia*
[...]
# file: dev/nvidia-uvm-tools
# owner: root
# group: video
user::rw-
user:tux:rw-
group::rw-
mask::rw-
other::---

Hope this fixes the issue now. Unfortunately I don't have the skills to install the tools (CUDA and/or others) which need this device.
Comment 15 Matthias Bach 2020-07-07 18:58:58 UTC
Thanks for picking up on this again!

You don't actually need any special installations to test this. The CUDA runtime is part of the driver. Only compiling applications yourself would require installing the SDK (which the user in the thread you linked did).

You can easily test this via the `clinfo` application which ships with openSUSE. On a properly set up system the output should look like the following:

Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 10.2.185
  Platform Profile                                FULL_PROFILE

While for me until I ran something as root it will look like this:

Number of platforms                               0
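
For scripted checks, the platform count can be pulled out of the clinfo output shown above. This is a sketch that assumes the "Number of platforms" line format from this comment; `platform_count` is a hypothetical helper name.

```sh
# Read clinfo output on stdin and print the value from the first
# "Number of platforms" line.
platform_count() {
    awk '/^Number of platforms/ { print $NF; exit }'
}

# Example: clinfo | platform_count
```

A result of 0 for a normal user, followed by 1 after root has run something on the GPU, would match the behaviour described in this report.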
Comment 16 Stefan Dirsch 2020-07-07 19:17:24 UTC
Thanks. Looks like clinfo works for me. I will attach a tarball, that you can test.
Comment 17 Stefan Dirsch 2020-07-07 19:22:07 UTC
Created attachment 839469 [details]
nvidia-uvm-tools.tar

Tarball to be extracted in /. Needs a reboot afterwards.
Comment 18 Stefan Dirsch 2020-07-07 19:24:02 UTC
I'm assuming this fixes the issue. If not, please don't hesitate to reopen! Fixed packages should be available by the end of this week.
Comment 19 Matthias Bach 2020-07-07 21:12:03 UTC
Thanks, sadly it seems the nvidia-uvm-tools.tar tarball makes things worse for me instead of better. With that applied SDDM will only give me a black screen with a mouse pointer. Yet, I still get:

> ls /dev/nv*
/dev/nvidia0
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvram

> lsmod | grep nvidia
nvidia_drm             53248  2
nvidia_modeset       1118208  3 nvidia_drm
nvidia              20721664  66 nvidia_modeset
ipmi_msghandler        69632  1 nvidia
drm_kms_helper        229376  1 nvidia_drm
drm                   544768  5 drm_kms_helper,nvidia_drm

And to be honest, I find this really confusing, as the scripts obviously should load the modules and create the files…
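
The lsmod listing above can be checked for a specific module like this. A sketch only: `module_listed` is a hypothetical name, and it relies on lsmod printing module names with underscores in the first column.

```sh
# Read lsmod output on stdin; succeed if the named module is listed.
# lsmod prints underscores, so nvidia-uvm is looked up as nvidia_uvm.
module_listed() {
    name=$(printf '%s' "$1" | tr '-' '_')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }'
}

# Example: lsmod | module_listed nvidia-uvm || echo "nvidia_uvm not loaded"
```

The first-column comparison is exact, so a loaded `nvidia_drm` does not count as a match for `nvidia`.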
Comment 20 Matthias Bach 2020-07-07 21:44:31 UTC
Shower thought I just had: It looks to me like the modprobe config file isn't executed on my system. SDDM launches as root, so in that case the Nvidia drivers utilise `nvidia-modprobe` to create the files and load the modules. At least that could explain why I do have the modules and files required for graphics but not those required for compute. If that's true, the obvious question would be why the modprobe config file is ignored.
Comment 21 Stefan Dirsch 2020-07-07 23:18:22 UTC
Oh. I forgot, that this is also needed (will be included in %post script of updated package)

mkdir -p /run/udev/static_node-tags/uaccess
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl 
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset

But this doesn't explain, why nvidia-uvm isn't loaded and the device not being created. It should be done via the modprobe scriptlet.
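
The symlink commands above follow one pattern per device name, which can be written as a loop. This is a sketch, not the packaged %post code: `link_uaccess` is a hypothetical helper, and the tag directory is a parameter so it can be tried on a scratch directory first.

```sh
# Create one uaccess tag symlink per device name inside tagdir.
# Each link points at /dev/<name>, as in the ln -snf commands above.
link_uaccess() {
    tagdir=$1; shift
    mkdir -p "$tagdir" || return 1
    for name in "$@"; do
        ln -snf "/dev/$name" "$tagdir/$name" || return 1
    done
}

# Example (as root):
#   link_uaccess /run/udev/static_node-tags/uaccess \
#       nvidiactl nvidia-uvm nvidia-uvm-tools nvidia-modeset
```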
Comment 22 Stefan Dirsch 2020-07-07 23:24:08 UTC
*** Bug 1173862 has been marked as a duplicate of this bug. ***
Comment 23 Matthias Bach 2020-07-08 06:33:36 UTC
I just re-verified. Sadly, even with the links created as the %postin script would, SDDM black-screens when the tarball is applied.
Comment 24 Stefan Dirsch 2020-07-08 08:29:29 UTC
Hmm. You rebooted your machine afterwards, right? You could try running the script code in modprobe file manually for testing. I'm running out of ideas why things are not working for you.
Comment 25 Mister Pend 2020-07-08 09:36:09 UTC
Created attachment 839480 [details]
sample files from two machines

sample files from two machines, one working one not
Comment 26 Mister Pend 2020-07-08 09:39:45 UTC
I'm the user from the reddit thread that has seen this issue with CUDA 10.1 SDK. My experience of it seems different to others in this chat so far.

My test machine was Leap 15.1, and I did a dup upgrade to 15.2. Some differences on the hardware (NVIDIA GTX instead of an RTX), but otherwise same environment. But this machine works perfectly fine with CUDA.

My primary desktop (with the RTX card) I clean installed 15.2 on, and encounter this issue.

So I have two near identical machines in the same condition. Both were built using ansible playbooks (except for the base install, I haven't dove into AutoYAST yet), so there shouldn't be major differences in build. The only main difference that I can theorise is that the test machine would have gone through several driver version upgrades for NVIDIA G05.

I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:

L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0

Also, my test machine (working) is lacking /dev/nvidia-uvm-tools, but CUDA still functions fine:

getfacl /dev/nvidia*
getfacl: Removing leading '/' from absolute path names
# file: dev/nvidia0
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidiactl
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-modeset
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-uvm
# owner: root
# group: video
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---

I've attached files from both machines above, including an additional file present on the test machine - /usr/lib/tmpfiles.d/nvidia-login-acl-trick.conf (appears to be all duplicate, but included nonetheless)


In the meantime, my dirty workaround has been adding this to the root user crontab:
@reboot nvidia-modprobe -u -c=0
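
A slightly more guarded variant of the crontab workaround above, as a sketch. It only calls `nvidia-modprobe -u -c=0` (the invocation from the crontab line) when the UVM node is absent. The function name `ensure_uvm_node` and the `NVIDIA_MODPROBE` override variable are hypothetical and exist only to allow dry runs.

```sh
# Call nvidia-modprobe only when the UVM device node is absent.
# $1 (node path) and NVIDIA_MODPROBE are hooks for dry runs; both are
# assumptions of this sketch, not options of the real tool.
ensure_uvm_node() {
    node=${1:-/dev/nvidia-uvm}
    [ -c "$node" ] && return 0
    "${NVIDIA_MODPROBE:-nvidia-modprobe}" -u -c=0
}

# Example (as root): ensure_uvm_node
```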

And my relatively simple ansible tasklist for installation, if you wanted to recreate my environments:

    - name: remove opensource nvidia driver
      zypper:
        name: xf86-video-nouveau
        state: absent
    - name: blacklist opensource nvidia driver from install
      command: zypper addlock xf86-video-nouveau
      args:
        warn: false
    - name: NVIDIA - add repository
      zypper_repository:
        name: NVIDIA
        repo: 'https://download.nvidia.com/opensuse/leap/15.2'
        auto_import_keys: yes
        state: present
        runrefresh: yes
    - name: NVIDIA - install driver
      zypper:
        name: 'x11-video-nvidiaG05'
        state: present
    - name: Transfer CUDA package
      copy:
        src: ~/ansible/packages/cuda-repo-opensuse15-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm
        dest: /tmp/
    - name: add NVIDIA CUDA signing key repository key
      rpm_key:
        state: present
        key: https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/7fa2af80.pub
    - name: install CUDA rpm
      zypper:
        name: /tmp/cuda-repo-opensuse15-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm
        state: present
    - name: install CUDA toolkit
      zypper:
        name: cuda-toolkit-10-1
        state: present
Comment 27 Stefan Dirsch 2020-07-08 10:08:34 UTC
> I haven't applied the tarball suggested (the reports above gave me pause), but did notice once discrepancy with your tarball: you seem > to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
> 
> L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0

Thanks! Good catch! This is definitely needed and may explain the black screen Matthias sees now with SDDM.

> Also, my test machine (working) is lacking /dev/nvidia-uvm-tools, but CUDA still functions fine:

Looks like whether you need this device depends on which functionality you require. I figured out that it has existed since driver version 364 (March 2016). It's weird that there were no reports earlier, and now with Leap 15.2 several arrive on about the same day!
Comment 28 Stefan Dirsch 2020-07-08 10:15:28 UTC
(In reply to Mister Pend from comment #25)
> Created attachment 839480 [details]
> sample files from two machines
> 
> sample files from two machines, one working one not

You're using different NVreg_DeviceFileGID on your machines (not using our default of 33). Also on the broken machine /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.

# diff -u -r prodmachine\ \(broken\)/ testmachine\ \(works\)/
diff -u -r "prodmachine (broken)/etc/modprobe.d/50-nvidia-default.conf" "testmachine (works)/etc/modprobe.d/50-nvidia-default.conf"
--- "prodmachine (broken)/etc/modprobe.d/50-nvidia-default.conf"        2020-07-08 11:22:44.000000000 +0200
+++ "testmachine (works)/etc/modprobe.d/50-nvidia-default.conf" 2020-07-08 11:23:42.000000000 +0200
@@ -1,2 +1,2 @@
-options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=483 NVreg_DeviceFileMode=0660
+options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=484 NVreg_DeviceFileMode=0660
 install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then   if /sbin/modprobe nvidia_uvm; then     if [ ! -c /dev/nvidia-uvm ]; then       mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" == "nvidia-uvm" ]; then echo $major; break; fi ; done) 0;        chown :video /dev/nvidia-uvm;     fi;   fi;   if [ ! -c /dev/nvidiactl ]; then     mknod -m 660 /dev/nvidiactl c 195 255;     chown :video /dev/nvidiactl;   fi;   devid=-1;   for dev in $(ls -d /sys/bus/pci/devices/*); do      vendorid=$(cat $dev/vendor);     if [ "$vendorid" == "0x10de" ]; then       class=$(cat $dev/class);       classid=${class%%00};       if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then          devid=$((devid+1));         if [ ! -c /dev/nvidia${devid} ]; then            mknod -m 660 /dev/nvidia${devid} c 195 ${devid};            chown :video /dev/nvidia${devid};         fi;       fi;     fi;   done;   /sbin/modprobe nvidia_drm;   if [ ! -c /dev/nvidia-modeset ]; then     mknod -m 660 /dev/nvidia-modeset c 195 254;     chown :video /dev/nvidia-modeset;   fi; fi 
\ No newline at end of file
Only in testmachine (works)/usr/lib/tmpfiles.d: nvidia-logind-acl-trick.conf
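
The install rule in the diff above finds the nvidia-uvm major number by scanning /proc/devices in a read loop. The same lookup, pulled out as a sketch: `device_major` is a hypothetical name, and the optional second argument exists only so the lookup can be tried against a sample file.

```sh
# Print the major number registered for a device name.
# The second argument defaults to /proc/devices.
device_major() {
    awk -v dev="$1" '$2 == dev { print $1; exit }' "${2:-/proc/devices}"
}

# Example: device_major nvidia-uvm
```

An empty result means the name is not registered, which is the case as long as the nvidia_uvm module is not loaded.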
Comment 29 Stefan Dirsch 2020-07-08 10:25:44 UTC
> getfacl /dev/nvidia*
> [...]
@Mister Pend Seems only gdm has access to your nvidia devices? It should look different once a regular user has logged into the session.
Comment 30 Mister Pend 2020-07-08 12:02:44 UTC
(In reply to Stefan Dirsch from comment #28)
> You're using different NVreg_DeviceFileGID on your machines (not using our
> default of 33). Also on the broken machine
> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.

I'm not sure how, I'm not doing anything out of the ordinary here - drivers were installed from NVIDIA repository as per my ansible tasklist (NVIDIA repository added 'https://download.nvidia.com/opensuse/leap/15.2', package 'x11-video-nvidiaG05' installed via zypper). And yes, that file is missing completely on a clean installed machine - I suspect its presence on the working machine is due to driver package variances during its life before I tested the upgrade on it.
Comment 31 Mister Pend 2020-07-08 12:04:20 UTC
(In reply to Stefan Dirsch from comment #29)
> > getfacl /dev/nvidia*
> > [...]
> @Mister Pend Seeems only gdm has access to your nvidia devices? It should
> look different once a regular user has logged into the session.


Correct, I had SSH'ed into the test machine cause I was too lazy to set up VNC or walk across to the other end of my workshop :P    once a regular user has logged on, they show as having access as well
Comment 32 Stefan Dirsch 2020-07-08 14:20:03 UTC
(In reply to Mister Pend from comment #31)
> Correct, I had SSH'ed into the test machine cause I was too lazy to set up
> VNC or walk across to the other end of my workshop :P    once a regular user
> has logged on, they show as having access as well

That's fine then! :-)
Comment 33 Stefan Dirsch 2020-07-08 14:30:04 UTC
(In reply to Mister Pend from comment #30)
> (In reply to Stefan Dirsch from comment #28)
> > You're using different NVreg_DeviceFileGID on your machines (not using our
> > default of 33). Also on the broken machine
> > /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.
> 
> I'm not sure how, I'm not doing anything out of the ordinary here - drivers
> were installed from NVIDIA repository as per my ansible tasklist (NVIDIA
> repository added 'https://download.nvidia.com/opensuse/leap/15.2', package
> 'x11-video-nvidiaG05' installed via zypper). And yes, that file is missing
> completely on a clean installed machine - I suspect it's presence on the
> working machine is due to driver package variances during it's life before I
> tested the upgrade on it.

Then there is something fishy. 

You must have manually edited /etc/modprobe.d/50-nvidia-default.conf in order to have a different NVreg_DeviceFileGID there. 33 is the group ID of the video group; that's why we use it here. It probably no longer matters, since permissions are now set via udev/logind (ACLs). So I guess you can ignore this.

/usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf is created in %post of nvidia-gfxG05-kmp-default  and only removed in %postun when uninstalled, not during an update.
Comment 34 Stefan Dirsch 2020-07-08 14:34:05 UTC
JFYI, permission handling (done in %post of KMP)

# Create symlinks for udev so these devices will get user ACLs by logind later (bnc#1000625)
mkdir -p /run/udev/static_node-tags/uaccess
mkdir -p /usr/lib/tmpfiles.d
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl 
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset
cat >  /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf << EOF
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
EOF
devid=-1
for dev in $(ls -d /sys/bus/pci/devices/*); do 
  vendorid=$(cat $dev/vendor)
  if [ "$vendorid" == "0x10de" ]; then 
    class=$(cat $dev/class)
    classid=${class%00}
    if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then 
      devid=$((devid+1))
      ln -snf /dev/nvidia${devid} /run/udev/static_node-tags/uaccess/nvidia${devid}
      echo "L /run/udev/static_node-tags/uaccess/nvidia${devid} - - - - /dev/nvidia${devid}" >> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf
    fi
  fi
done
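
The loop above counts a PCI device when its vendor is 0x10de and its class, with a trailing "00" stripped, is 0x0300 or 0x0302. That condition can be isolated as a predicate for testing. This is a sketch and not part of the packaged script; `is_nvidia_gpu` is a hypothetical name.

```sh
# Succeed if a PCI vendor/class pair matches what the %post loop counts:
# vendor 0x10de and class 0x030000 (VGA controller) or 0x030200 (3D controller).
is_nvidia_gpu() {
    vendor=$1
    classid=${2%00}   # strip one trailing "00", as in the script above
    [ "$vendor" = "0x10de" ] || return 1
    [ "$classid" = "0x0300" ] || [ "$classid" = "0x0302" ]
}

# Example:
#   is_nvidia_gpu "$(cat "$dev/vendor")" "$(cat "$dev/class")"
```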
Comment 35 Nikolai Nikolaevskii 2020-07-08 14:42:25 UTC
So is this issue solved or not?

My wild shot: GDM starts with root privileges, SDDM starts with ordinary user privileges. 
Maybe I am wrong.
Comment 36 Stefan Dirsch 2020-07-08 15:39:20 UTC
(In reply to Nikolai Nikolaevskii from comment #35)
> So is this issue solved or not?

Honestly I can't say.

> My wild shot: GDM starts with root privileges, SDDM starts with ordinary
> user privileges. 
> Maybe I am wrong.

I'm afraid you are. AFAIK the sddm chooser runs as root, but then gets replaced by the user Xsession running as a regular user, so with autologin enabled it may look like X is not working from the beginning when permissions to /dev/nvidia0 are not available.

It's similar with gdm, whose chooser runs as the gdm user and then starts a second Xserver running the Xsession as the regular user.
Comment 37 Matthias Bach 2020-07-08 16:38:01 UTC
(In reply to Stefan Dirsch from comment #27)
> > I haven't applied the tarball suggested (the reports above gave me pause), but did notice once discrepancy with your tarball: you seem > to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
> > 
> > L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
> 
> Thanks! Good catch! This is definitely needed and may explain the black
> screen Matthias sees now with SDDM.

I can confirm that adding this line into the file will resolve the black-screen issue. Though I am still without compute capabilities.


(In reply to Stefan Dirsch from comment #24)
> Hmm. You rebooted your machine afterwards, right? You could try running the
> script code in modprobe file manually for testing. I'm running out of ideas
> why things are not working for you.

Yes, I rebooted my machine.

I have extracted the code from the install section of the modprobe file and running this via /bin/sh (which on my system means Bash) will create the missing files.

root@eddie:~ # ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 254 Jul  8 18:26 /dev/nvidia-modeset
crw-rw----  1 root video 239,   0 Jul  8 18:29 /dev/nvidia-uvm
crw-rw----  1 root video 239,   1 Jul  8 18:29 /dev/nvidia-uvm-tools
crw-rw----+ 1 root video 195,   0 Jul  8 18:26 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 Jul  8 18:26 /dev/nvidiactl

In consequence, it seems like the modprobe file for some reason is not properly applied by my machine despite being present. Could this be an issue of initialisation order? Could it be that some trigger condition for the modprobe is not being matched? Although it would surprise me if those had changed since 15.1.
Comment 38 Matthias Bach 2020-07-08 16:41:58 UTC
(In reply to Stefan Dirsch from comment #29)
> > getfacl /dev/nvidia*
> > [...]
> @Mister Pend Seeems only gdm has access to your nvidia devices? It should
> look different once a regular user has logged into the session.

Just to make this explicit, it's completely valid to utilise the compute capabilities (nvenc, CUDA, OpenCL) without a running X session, e.g. for a dedicated machine-learning host that for noise reasons you don't want to have right next to your desk. It's one of the big advantages of the NVIDIA cards over AMD that you can have a truly headless system with them. The first generation of Tesla cards didn't even have graphics outlets, and in consequence wouldn't work on Windows (which I assume is the only reason why a lot of supercomputers now could run giant display farms).
Comment 39 Cor Blom 2020-07-08 21:42:20 UTC
I have built rev 71 of X11:Drivers:Video, i.e. before the version update to 450.57, which, if I am correct, contains all corrections mentioned in this report.

I installed the packages. The bug is not solved for me. I observe the following (I use this with ffmpeg and nvenc):

As a regular user I get the error: [h264_nvenc @ 0x55de3a62d5c0] Cannot init CUDA

When I execute the same command with sudo, it works.

After that it also works as regular user. It seems that executing the ffmpeg command as root creates the nvidia-uvm device. Before the sudo it did not exist; I had only /dev/nvidia0  /dev/nvidiactl  /dev/nvidia-modeset

After executing the ffmpeg with the nvenc option as root two additional devices are added:
/dev/nvidia-uvm  /dev/nvidia-uvm-tools

I have checked this once after a reboot.
Comment 40 Cor Blom 2020-07-08 21:48:21 UTC
To be more complete:

After starting I get the following:

ls -l /dev/nvidia*
crw-rw----+ 1 root video 195,   0  8 jul 23:43 /dev/nvidia0
crw-rw----+ 1 root video 195, 255  8 jul 23:43 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254  8 jul 23:43 /dev/nvidia-modeset

After sudo ffmpeg ... -c:v h264_nvenc ...  I have:

ls -l /dev/nvidia*
crw-rw----+ 1 root video 195,   0  8 jul 23:43 /dev/nvidia0
crw-rw----+ 1 root video 195, 255  8 jul 23:43 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254  8 jul 23:43 /dev/nvidia-modeset
crw-rw-rw-  1 root root  240,   0  8 jul 23:44 /dev/nvidia-uvm
crw-rw-rw-  1 root root  240,   1  8 jul 23:44 /dev/nvidia-uvm-tools

I don't know if the group "root" vs "video" matters.
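
To compare owner, group and mode across the nodes without reading the ls columns by eye, something like the following could be used. A sketch, assuming GNU stat (the `-c` format option); `owner_group_mode` is a hypothetical name.

```sh
# Print "owner:group mode path" for each argument, using GNU stat's -c option.
owner_group_mode() {
    stat -c '%U:%G %a %n' "$@"
}

# Example: owner_group_mode /dev/nvidia*
```

For the listing above this would show root:video with mode 660 on the three nodes present at boot, and root:root with mode 666 on the two nodes created after the sudo run.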
Comment 41 Stefan Dirsch 2020-07-08 21:56:29 UTC
(In reply to Matthias Bach from comment #37)
> (In reply to Stefan Dirsch from comment #27)
> > > I haven't applied the tarball suggested (the reports above gave me pause), but did notice once discrepancy with your tarball: you seem > to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
> > > 
> > > L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
> > 
> > Thanks! Good catch! This is definitely needed and may explain the black
> > screen Matthias sees now with SDDM.
> 
> I can confirm that adding this line into the file will resolve the
> black-screen issue. Though I am still without compute capablities.

Thanks. At least this we could fix again.

> I have extracted the code from the install section of the modprobe file and
> running this via /bin/sh (which on my system means Bash) will create the
> missing files.
> 
> root@eddie:~ # ls -l /dev/nvidia*
> crw-rw----+ 1 root video 195, 254 Jul  8 18:26 /dev/nvidia-modeset
> crw-rw----  1 root video 239,   0 Jul  8 18:29 /dev/nvidia-uvm
> crw-rw----  1 root video 239,   1 Jul  8 18:29 /dev/nvidia-uvm-tools
> crw-rw----+ 1 root video 195,   0 Jul  8 18:26 /dev/nvidia0
> crw-rw----+ 1 root video 195, 255 Jul  8 18:26 /dev/nvidiactl

Ah. Thanks for checking this!

> In consequence, it seems like the modprobe file for some reason is not
> properly applied by machine despite being present. Could this be an issue of
> initalisation order? Could it be that some trigger condition for the
> modprobe is not being matched. Although it wound wonder me if those changed
> since 15.1.

That could be a good catch! The modprobe file is marked as %config, so possibly the old one has been backed up as .rpmsave, but preferred over the new one nevertheless. This would explain the behaviour at least. Could you check this and remove the old modprobe file? And test again?
Comment 42 Stefan Dirsch 2020-07-08 21:59:49 UTC
(In reply to Matthias Bach from comment #38)
> (In reply to Stefan Dirsch from comment #29)
> > > getfacl /dev/nvidia*
> > > [...]
> > @Mister Pend Seeems only gdm has access to your nvidia devices? It should
> > look different once a regular user has logged into the session.
> 
> Just to make this explicit, it's completely valid to utilise the compute
> capabilities (nvenc, CUDA, OpenCL) without a running X session, i.e. for a
> dedicated machine-learning host that for noise reasons you don't want to
> have right next to your desk. It's one of the big advantages of the NVIDIA
> cards over AMD that you can have a truly headless system with them. The
> first generation of Tesla cards didn't even have graphics outlets, and in
> consequence wouldn't work on Windows (which I assume is the only reason why
> a lot of supercomputers now could run giant display farms).

Sure, but we don't cover this use case. We can only set permissions when a user logs into an Xsession.
Comment 43 Stefan Dirsch 2020-07-08 22:07:02 UTC
> I have build rev 71 of X11:Drivers:Video, i.e. before the version update to 45o.57, which, if I am correct contains all corrections 
> mentioned in this report.

Yes, that's perfect! Thanks for doing this! Unfortunately I cannot provide RPMs for testing for legal reasons here.

Could you check if you have two modprobe files like

/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/50-nvidia-default.conf.rpmsave

See my comment#41. If yes, please remove the older file and try again (reboot is the easiest).
Comment 44 Matthias Bach 2020-07-08 22:27:51 UTC
(In reply to Stefan Dirsch from comment #43)
> Could you check if you have two modprobe files like
> 
> /etc/modprobe.d/50-nvidia-default.conf
> /etc/modprobe.d/50-nvidia-default.conf.rpmsave
> 
> See my comment#41. If yes, please remove the older file and try again
> (reboot is the easiest).

I only have the following:

# ls /etc/modprobe.d/*nvidia*
/etc/modprobe.d/50-nvidia-default.conf  /etc/modprobe.d/nvidia-default.conf

I do remember removing some rpmsave file at some point during my debugging but my last rounds of tests definitely already were performed without that file present.
Comment 45 Stefan Dirsch 2020-07-08 22:53:49 UTC
Damn. That would have been a good explanation ...
Comment 46 Stefan Dirsch 2020-07-08 23:13:54 UTC
Ok. According to the modprobe.d manual page only .conf files below /etc/modprobe.d are considered ... indeed on my system I also found a backup file ... but my system is working anyway and loaded the nvidia-uvm module and created the /dev/nvidia-uvm-tools device from the beginning once I adjusted the code.
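
Following the point above that only .conf files are considered, this sketch lists the .conf files in a directory that carry an `install` rule for a given module, so that backup copies such as .rpmsave are left out. `install_rules` is a hypothetical helper, and the module name is inserted into a grep pattern unescaped, which is fine for a plain name like nvidia.

```sh
# List the *.conf files in a modprobe.d-style directory that contain an
# "install <module> ..." rule. Files with other suffixes are skipped.
install_rules() {
    dir=$1
    module=$2
    for f in "$dir"/*.conf; do
        [ -f "$f" ] || continue
        if grep -q "^[[:space:]]*install[[:space:]][[:space:]]*$module[[:space:]]" "$f"; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example: install_rules /etc/modprobe.d nvidia
```

If more than one file is printed, it may be worth comparing them, since several files would then be defining an install rule for the same module.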
Comment 47 Mister Pend 2020-07-09 04:05:02 UTC
(In reply to Stefan Dirsch from comment #33)
> (In reply to Mister Pend from comment #30)
> Then there is something fishy. 
> 
> You must have edited manually /etc/modprobe.d/50-nvidia-default.conf in
> order to have a different
> NVreg_DeviceFileGID there. 33 is the group ID of video group. That's why we
> use it here. Probably it no longer matters since permissions are meanwhile
> set via udev/logind (ACLs). So I guess you can ignore this.

Just out of curiosity, I clean-installed my test machine (formerly working), and I am now seeing the same issues on that machine. Looking at /etc/modprobe.d/50-nvidia-default.conf, the NVreg_DeviceFileGID is still 483. The driver was installed from NVIDIA's repository. I can promise you 100% that I haven't manually edited this file. Resulting nvidia/cuda packages:

rpm -qa | grep -i nvidia
nvidia-gfxG05-kmp-default-440.100_k5.3.18_lp152.19-lp152.26.1.x86_64
nvidia-computeG05-440.100-lp152.26.1.x86_64
x11-video-nvidiaG05-440.100-lp152.26.1.x86_64

rpm -qa | grep -i cuda
cuda-nvprune-10-1-10.1.105-1.x86_64
cuda-curand-10-1-10.1.105-1.x86_64
cuda-visual-tools-10-1-10.1.105-1.x86_64
cuda-sanitizer-api-10-1-10.1.105-1.x86_64
cuda-driver-dev-10-1-10.1.105-1.x86_64
cuda-gdb-10-1-10.1.105-1.x86_64
cuda-compiler-10-1-10.1.105-1.x86_64
cuda-nsight-systems-10-1-10.1.105-1.x86_64
cuda-nvjpeg-dev-10-1-10.1.105-1.x86_64
cuda-tools-10-1-10.1.105-1.x86_64
cuda-nvjpeg-10-1-10.1.105-1.x86_64
cuda-cudart-10-1-10.1.105-1.x86_64
cuda-command-line-tools-10-1-10.1.105-1.x86_64
cuda-nvgraph-10-1-10.1.105-1.x86_64
cuda-gpu-library-advisor-10-1-10.1.105-1.x86_64
cuda-curand-dev-10-1-10.1.105-1.x86_64
cuda-nvcc-10-1-10.1.105-1.x86_64
cuda-nvprof-10-1-10.1.105-1.x86_64
cuda-npp-10-1-10.1.105-1.x86_64
cuda-cuobjdump-10-1-10.1.105-1.x86_64
cuda-npp-dev-10-1-10.1.105-1.x86_64
cuda-libraries-dev-10-1-10.1.105-1.x86_64
cuda-toolkit-10-1-10.1.105-1.x86_64
cuda-repo-opensuse15-10-1-local-10.1.105-418.39-1.0-1.x86_64
cuda-nvml-dev-10-1-10.1.105-1.x86_64
cuda-misc-headers-10-1-10.1.105-1.x86_64
cuda-cufft-10-1-10.1.105-1.x86_64
cuda-cusparse-dev-10-1-10.1.105-1.x86_64
cuda-cupti-10-1-10.1.105-1.x86_64
cuda-nvrtc-10-1-10.1.105-1.x86_64
cuda-nsight-compute-10-1-10.1.105-1.x86_64
cuda-cusolver-10-1-10.1.105-1.x86_64
cuda-nvgraph-dev-10-1-10.1.105-1.x86_64
cuda-cudart-dev-10-1-10.1.105-1.x86_64
cuda-samples-10-1-10.1.105-1.x86_64
cuda-nsight-10-1-10.1.105-1.x86_64
cuda-nvvp-10-1-10.1.105-1.x86_64
cuda-documentation-10-1-10.1.105-1.x86_64
cuda-nvdisasm-10-1-10.1.105-1.x86_64
cuda-nvrtc-dev-10-1-10.1.105-1.x86_64
cuda-nvtx-10-1-10.1.105-1.x86_64
cuda-cusparse-10-1-10.1.105-1.x86_64
cuda-cufft-dev-10-1-10.1.105-1.x86_64
cuda-license-10-1-10.1.105-1.x86_64
cuda-memcheck-10-1-10.1.105-1.x86_64
cuda-cusolver-dev-10-1-10.1.105-1.x86_64

If there are further tests needed, I'm happy to help. My test machine I can rebuild in under 30 minutes, so happy for potentially destructive tests too.
Comment 48 Cor Blom 2020-07-09 07:05:52 UTC
(In reply to Stefan Dirsch from comment #43)
> 
> Could you check if you have two modprobe files like
> 
> /etc/modprobe.d/50-nvidia-default.conf
> /etc/modprobe.d/50-nvidia-default.conf.rpmsave
> 
> See my comment#41. If yes, please remove the older file and try again
> (reboot is the easiest).

Yes, I had both. Removed the .rpmsave one and rebooted. It made no difference.

I have backed up both files and can provide them, if necessary.
Comment 49 Stefan Dirsch 2020-07-09 10:10:21 UTC
(In reply to Mister Pend from comment #47)
> Just out of curiosity, I clean installed my test machine (formerly working),
> and now seeing the same issues on that machine. And looking at
> /etc/modprobe.d/50-nvidia-default.conf, the NVreg_DeviceFileGID is still
> 483. Installed from NVIDIA's repository. I can promise you 100% I haven't
> manually edited this file. Resulting nvidia/cuda packages:

Thanks for double-checking! I have no explanation for this. :-( But as I said, it should not matter as long as you don't want to add all users who should have access to the nvidia devices to the group with this GID.
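For anyone who wants to compare the configured GID with a local group, a rough sketch follows. It assumes the option appears as NVreg_DeviceFileGID=<number> on a non-comment line of the modprobe file; the helper names device_file_gid and gid_matches_group are hypothetical, and the group lookup relies on getent being available.

```sh
# Print the value of NVreg_DeviceFileGID from a modprobe config file,
# skipping commented-out lines. Prints nothing if the option is absent.
# device_file_gid is a hypothetical helper name.
device_file_gid() {
    sed -n '/^[[:space:]]*#/d; s/.*NVreg_DeviceFileGID=\([0-9][0-9]*\).*/\1/p' "$1"
}

# Succeed if the configured GID equals the GID of the given group.
# gid_matches_group is a hypothetical helper name; it needs getent.
gid_matches_group() {
    conf_gid=$(device_file_gid "$1")
    grp_gid=$(getent group "$2" | cut -d: -f3)
    [ -n "$conf_gid" ] && [ "$conf_gid" = "$grp_gid" ]
}
```

For example, `gid_matches_group /etc/modprobe.d/50-nvidia-default.conf video && echo match` would print "match" only if the file's GID is the video group's GID.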

> If there are further tests needed, I'm happy to help. My test machine I can
> rebuild in under 30 minutes, so happy for potentially destructive tests too.

Thanks a lot for your cooperation! Very much appreciated. At the moment I don't have anything for further testing.
Comment 50 Stefan Dirsch 2020-07-09 10:11:44 UTC
(In reply to Cor Blom from comment #48)

> > Could you check if you have two modprobe files like
> > 
> > /etc/modprobe.d/50-nvidia-default.conf
> > /etc/modprobe.d/50-nvidia-default.conf.rpmsave
> > 
> > See my comment#41. If yes, please remove the older file and try again
> > (reboot is the easiest).
> 
> Yes, I had both. Removed the .rpmsave one and rebooted. It made no
> difference.
> 
> I have backed up both files and can provide them, if necessary.

Thanks for verification. No, it's not necessary.
Comment 51 Matthias Bach 2020-07-10 18:25:55 UTC
I have run another test that gave me an, at least in my eyes, interesting result: if I unload the nvidia module, obviously with the display manager and other services stopped, a subsequent explicit `/sbin/modprobe nvidia` works as expected. That is, `nvidia-uvm` is loaded along with it and all device files are created properly.
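The outcome of such a test can be verified with two small checks. This is a sketch under assumptions: the helper names module_loaded and missing_uvm_devices are hypothetical, and the optional path arguments exist only so the checks can be pointed at sample data instead of the live system.

```sh
# Succeed if a module is listed in a /proc/modules-style file.
# The kernel lists module names with underscores, so nvidia-uvm
# shows up as nvidia_uvm. module_loaded is a hypothetical helper name.
module_loaded() {
    name=$(printf '%s\n' "$1" | sed 's/-/_/g')
    grep -q "^${name} " "${2:-/proc/modules}"
}

# Print each expected UVM device node that is missing under a
# directory (default: /dev). Prints nothing if both nodes exist.
# missing_uvm_devices is a hypothetical helper name.
missing_uvm_devices() {
    devdir=${1:-/dev}
    for node in nvidia-uvm nvidia-uvm-tools; do
        [ -e "$devdir/$node" ] || echo "$devdir/$node"
    done
}
```

After the explicit modprobe, `module_loaded nvidia-uvm && missing_uvm_devices` should print nothing on a system in the working state.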
Comment 52 Matthias Bach 2020-07-10 18:37:15 UTC
Based on the previous test I have been able to solve the problem for myself. The solution is as obvious as hidden in plain sight. All I had to do was execute the following:

/sbin/mkinitrd

Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this.

Maybe Mister Pend or Cor Blom can confirm this behaviour.
Comment 53 Stefan Dirsch 2020-07-10 19:03:16 UTC
Thanks for the investigation. But that would mean that the nvidia module would also be added to the initrd, which wasn't the case at least on my system. Check with

  sudo lsinitrd /boot/initrd | grep nvidia

> Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this.

This would have happened if I could provide a real KMP package to you ... on the other hand, Cor Blom tested a real KMP package.
Comment 54 Matthias Bach 2020-07-10 19:08:51 UTC
It seems the nvidia module is indeed added to the initrd. 

✗  sudo lsinitrd /boot/initrd | grep nvidia
[sudo] password for root: 
-rw-r--r--   1 root     root         1483 Jul  7 11:46 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r--   1 root     root           18 Jun 25 17:23 etc/modprobe.d/nvidia-default.conf
drwxr-xr-x  12 root     root            0 Jul 10 20:26 lib/firmware/nvidia
drwxr-xr-x   4 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm200
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm200/acr
-rw-r--r--   1 root     root          832 Mar  1 08:55 lib/firmware/nvidia/gm200/acr/bl.bin
-rw-r--r--   1 root     root        10144 Mar  1 08:55 lib/firmware/nvidia/gm200/acr/ucode_load.bin
-rw-r--r--   1 root     root         1440 Mar  1 08:55 lib/firmware/nvidia/gm200/acr/ucode_unload.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm200/gr
-rw-r--r--   1 root     root          576 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         1968 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/fecs_data.bin
-rw-r--r--   1 root     root        16271 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/fecs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/fecs_sig.bin
-rw-r--r--   1 root     root          576 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2056 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_data.bin
-rw-r--r--   1 root     root         9768 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/gpccs_sig.bin
-rw-r--r--   1 root     root         7616 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/sw_bundle_init.bin
-rw-r--r--   1 root     root         5592 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/sw_ctx.bin
-rw-r--r--   1 root     root        10800 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/sw_method_init.bin
-rw-r--r--   1 root     root         1440 Mar  1 08:55 lib/firmware/nvidia/gm200/gr/sw_nonctx.bin
drwxr-xr-x   4 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm204
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm204/acr
lrwxrwxrwx   1 root     root           22 Jul 10 20:26 lib/firmware/nvidia/gm204/acr/bl.bin -> ../../gm200/acr/bl.bin
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gm204/acr/ucode_load.bin -> ../../gm200/acr/ucode_load.bin
lrwxrwxrwx   1 root     root           32 Jul 10 20:26 lib/firmware/nvidia/gm204/acr/ucode_unload.bin -> ../../gm200/acr/ucode_unload.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm204/gr
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         1968 Mar  1 08:55 lib/firmware/nvidia/gm204/gr/fecs_data.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/fecs_inst.bin -> ../../gm200/gr/fecs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gm204/gr/fecs_sig.bin
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2056 Mar  1 08:55 lib/firmware/nvidia/gm204/gr/gpccs_data.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/gpccs_inst.bin -> ../../gm200/gr/gpccs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gm204/gr/gpccs_sig.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_bundle_init.bin -> ../../gm200/gr/sw_bundle_init.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_ctx.bin -> ../../gm200/gr/sw_ctx.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_method_init.bin -> ../../gm200/gr/sw_method_init.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gm204/gr/sw_nonctx.bin -> ../../gm200/gr/sw_nonctx.bin
drwxr-xr-x   4 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm206
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm206/acr
lrwxrwxrwx   1 root     root           22 Jul 10 20:26 lib/firmware/nvidia/gm206/acr/bl.bin -> ../../gm200/acr/bl.bin
-rw-r--r--   1 root     root        10144 Mar  1 08:55 lib/firmware/nvidia/gm206/acr/ucode_load.bin
-rw-r--r--   1 root     root         1440 Mar  1 08:55 lib/firmware/nvidia/gm206/acr/ucode_unload.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gm206/gr
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         1968 Mar  1 08:55 lib/firmware/nvidia/gm206/gr/fecs_data.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/fecs_inst.bin -> ../../gm200/gr/fecs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gm206/gr/fecs_sig.bin
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2056 Mar  1 08:55 lib/firmware/nvidia/gm206/gr/gpccs_data.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/gpccs_inst.bin -> ../../gm200/gr/gpccs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gm206/gr/gpccs_sig.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_bundle_init.bin -> ../../gm200/gr/sw_bundle_init.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_ctx.bin -> ../../gm200/gr/sw_ctx.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_method_init.bin -> ../../gm200/gr/sw_method_init.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gm206/gr/sw_nonctx.bin -> ../../gm200/gr/sw_nonctx.bin
drwxr-xr-x   4 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp100
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp100/acr
-rw-r--r--   1 root     root          832 Mar  1 08:55 lib/firmware/nvidia/gp100/acr/bl.bin
-rw-r--r--   1 root     root         9632 Mar  1 08:55 lib/firmware/nvidia/gp100/acr/ucode_load.bin
-rw-r--r--   1 root     root         1440 Mar  1 08:55 lib/firmware/nvidia/gp100/acr/ucode_unload.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp100/gr
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp100/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         2028 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/fecs_data.bin
-rw-r--r--   1 root     root        20955 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/fecs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/fecs_sig.bin
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp100/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2080 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/gpccs_data.bin
-rw-r--r--   1 root     root        12458 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/gpccs_inst.bin
-rw-r--r--   1 root     root           76 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/gpccs_sig.bin
-rw-r--r--   1 root     root         7664 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/sw_bundle_init.bin
-rw-r--r--   1 root     root         6240 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/sw_ctx.bin
-rw-r--r--   1 root     root        11928 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/sw_method_init.bin
-rw-r--r--   1 root     root         2248 Mar  1 08:55 lib/firmware/nvidia/gp100/gr/sw_nonctx.bin
drwxr-xr-x   6 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp102
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp102/acr
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp102/acr/bl.bin
-rw-r--r--   1 root     root        17152 Mar  1 08:55 lib/firmware/nvidia/gp102/acr/ucode_load.bin
-rw-r--r--   1 root     root         3328 Mar  1 08:55 lib/firmware/nvidia/gp102/acr/ucode_unload.bin
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp102/acr/unload_bl.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp102/gr
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp102/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         2256 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/fecs_data.bin
-rw-r--r--   1 root     root        20927 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/fecs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/fecs_sig.bin
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp102/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         1832 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/gpccs_data.bin
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/gpccs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/gpccs_sig.bin
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/sw_bundle_init.bin
-rw-r--r--   1 root     root         6216 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/sw_ctx.bin
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/sw_method_init.bin
-rw-r--r--   1 root     root         2496 Mar  1 08:55 lib/firmware/nvidia/gp102/gr/sw_nonctx.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp102/nvdec
-rw-r--r--   1 root     root         3840 Mar  1 08:55 lib/firmware/nvidia/gp102/nvdec/scrubber.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp102/sec2
-rw-r--r--   1 root     root          656 Mar  1 08:55 lib/firmware/nvidia/gp102/sec2/desc-1.bin
-rw-r--r--   1 root     root          656 Mar  1 08:55 lib/firmware/nvidia/gp102/sec2/desc.bin
-rw-r--r--   1 root     root       109568 Mar  1 08:55 lib/firmware/nvidia/gp102/sec2/image-1.bin
-rw-r--r--   1 root     root        99072 Mar  1 08:55 lib/firmware/nvidia/gp102/sec2/image.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp102/sec2/sig-1.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp102/sec2/sig.bin
drwxr-xr-x   6 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp104
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp104/acr
lrwxrwxrwx   1 root     root           22 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/bl.bin -> ../../gp102/acr/bl.bin
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin
lrwxrwxrwx   1 root     root           32 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gp104/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp104/gr
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         2576 Mar  1 08:55 lib/firmware/nvidia/gp104/gr/fecs_data.bin
-rw-r--r--   1 root     root        22760 Mar  1 08:55 lib/firmware/nvidia/gp104/gr/fecs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp104/gr/fecs_sig.bin
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         1832 Mar  1 08:55 lib/firmware/nvidia/gp104/gr/gpccs_data.bin
-rw-r--r--   2 root     root        13307 Mar  1 08:55 lib/firmware/nvidia/gp104/gr/gpccs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp104/gr/gpccs_sig.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_bundle_init.bin -> ../../gp102/gr/sw_bundle_init.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_ctx.bin -> ../../gp102/gr/sw_ctx.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_method_init.bin -> ../../gp102/gr/sw_method_init.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp104/gr/sw_nonctx.bin -> ../../gp102/gr/sw_nonctx.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp104/nvdec
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp104/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/desc-1.bin -> ../../gp102/sec2/desc-1.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/desc.bin -> ../../gp102/sec2/desc.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/image-1.bin -> ../../gp102/sec2/image-1.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/image.bin -> ../../gp102/sec2/image.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/sig-1.bin -> ../../gp102/sec2/sig-1.bin
lrwxrwxrwx   1 root     root           24 Jul 10 20:26 lib/firmware/nvidia/gp104/sec2/sig.bin -> ../../gp102/sec2/sig.bin
drwxr-xr-x   6 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp106
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp106/acr
lrwxrwxrwx   1 root     root           22 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/bl.bin -> ../../gp102/acr/bl.bin
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin
lrwxrwxrwx   1 root     root           32 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gp106/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp106/gr
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/fecs_bl.bin -> ../../gm200/gr/fecs_bl.bin
-rw-r--r--   1 root     root         2256 Mar  1 08:55 lib/firmware/nvidia/gp106/gr/fecs_data.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/fecs_inst.bin -> ../../gp102/gr/fecs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp106/gr/fecs_sig.bin
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/gpccs_bl.bin -> ../../gm200/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         1832 Mar  1 08:55 lib/firmware/nvidia/gp106/gr/gpccs_data.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/gpccs_inst.bin -> ../../gp102/gr/gpccs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp106/gr/gpccs_sig.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_bundle_init.bin -> ../../gp102/gr/sw_bundle_init.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_ctx.bin -> ../../gp102/gr/sw_ctx.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_method_init.bin -> ../../gp102/gr/sw_method_init.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp106/gr/sw_nonctx.bin -> ../../gp102/gr/sw_nonctx.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp106/nvdec
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp106/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/desc-1.bin -> ../../gp102/sec2/desc-1.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/desc.bin -> ../../gp102/sec2/desc.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/image-1.bin -> ../../gp102/sec2/image-1.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/image.bin -> ../../gp102/sec2/image.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/sig-1.bin -> ../../gp102/sec2/sig-1.bin
lrwxrwxrwx   1 root     root           24 Jul 10 20:26 lib/firmware/nvidia/gp106/sec2/sig.bin -> ../../gp102/sec2/sig.bin
drwxr-xr-x   6 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp107
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp107/acr
lrwxrwxrwx   1 root     root           22 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/bl.bin -> ../../gp102/acr/bl.bin
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin
lrwxrwxrwx   1 root     root           32 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gp107/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp107/gr
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/fecs_bl.bin
-rw-r--r--   1 root     root         2756 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/fecs_data.bin
-rw-r--r--   1 root     root        22879 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/fecs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/fecs_sig.bin
-rw-r--r--   3 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2100 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_data.bin
-rw-r--r--   1 root     root        12587 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/gpccs_sig.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gp107/gr/sw_bundle_init.bin -> ../../gp102/gr/sw_bundle_init.bin
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/sw_ctx.bin
lrwxrwxrwx   1 root     root           33 Jul 10 20:26 lib/firmware/nvidia/gp107/gr/sw_method_init.bin -> ../../gp102/gr/sw_method_init.bin
-rw-r--r--   2 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp107/gr/sw_nonctx.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp107/nvdec
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp107/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/desc-1.bin -> ../../gp102/sec2/desc-1.bin
lrwxrwxrwx   1 root     root           25 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/desc.bin -> ../../gp102/sec2/desc.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/image-1.bin -> ../../gp102/sec2/image-1.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/image.bin -> ../../gp102/sec2/image.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/sig-1.bin -> ../../gp102/sec2/sig-1.bin
lrwxrwxrwx   1 root     root           24 Jul 10 20:26 lib/firmware/nvidia/gp107/sec2/sig.bin -> ../../gp102/sec2/sig.bin
drwxr-xr-x   6 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp108
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp108/acr
lrwxrwxrwx   1 root     root           22 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/bl.bin -> ../../gp102/acr/bl.bin
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/ucode_load.bin -> ../../gp102/acr/ucode_load.bin
lrwxrwxrwx   1 root     root           32 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/ucode_unload.bin -> ../../gp102/acr/ucode_unload.bin
lrwxrwxrwx   1 root     root           29 Jul 10 20:26 lib/firmware/nvidia/gp108/acr/unload_bl.bin -> ../../gp102/acr/unload_bl.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp108/gr
-rw-r--r--   2 root     root          576 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/fecs_bl.bin
-rw-r--r--   1 root     root         2248 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/fecs_data.bin
-rw-r--r--   1 root     root        21161 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/fecs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/fecs_sig.bin
-rw-r--r--   3 root     root            0 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2092 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_data.bin
-rw-r--r--   1 root     root        13095 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/gpccs_sig.bin
-rw-r--r--   2 root     root         7680 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/sw_bundle_init.bin
-rw-r--r--   2 root     root         6000 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/sw_ctx.bin
-rw-r--r--   2 root     root        12288 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/sw_method_init.bin
-rw-r--r--   2 root     root         2496 Mar  1 08:55 lib/firmware/nvidia/gp108/gr/sw_nonctx.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp108/nvdec
lrwxrwxrwx   1 root     root           30 Jul 10 20:26 lib/firmware/nvidia/gp108/nvdec/scrubber.bin -> ../../gp102/nvdec/scrubber.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2
lrwxrwxrwx   1 root     root           27 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2/desc.bin -> ../../gp102/sec2/desc-1.bin
lrwxrwxrwx   1 root     root           28 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2/image.bin -> ../../gp102/sec2/image-1.bin
lrwxrwxrwx   1 root     root           26 Jul 10 20:26 lib/firmware/nvidia/gp108/sec2/sig.bin -> ../../gp102/sec2/sig-1.bin
drwxr-xr-x   6 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gv100
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gv100/acr
-rw-r--r--   2 root     root         1280 Mar  1 08:55 lib/firmware/nvidia/gv100/acr/bl.bin
-rw-r--r--   1 root     root        18688 Mar  1 08:55 lib/firmware/nvidia/gv100/acr/ucode_load.bin
-rw-r--r--   1 root     root         6400 Mar  1 08:55 lib/firmware/nvidia/gv100/acr/ucode_unload.bin
-rw-r--r--   2 root     root         1280 Mar  1 08:55 lib/firmware/nvidia/gv100/acr/unload_bl.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gv100/gr
-rw-r--r--   1 root     root          576 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/fecs_bl.bin
-rw-r--r--   1 root     root         4788 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/fecs_data.bin
-rw-r--r--   1 root     root        25632 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/fecs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/fecs_sig.bin
-rw-r--r--   3 root     root          576 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_bl.bin
-rw-r--r--   1 root     root         2128 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_data.bin
-rw-r--r--   1 root     root        12643 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_inst.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/gpccs_sig.bin
-rw-r--r--   1 root     root         7664 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/sw_bundle_init.bin
-rw-r--r--   1 root     root         9756 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/sw_ctx.bin
-rw-r--r--   1 root     root        12296 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/sw_method_init.bin
-rw-r--r--   1 root     root         2728 Mar  1 08:55 lib/firmware/nvidia/gv100/gr/sw_nonctx.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gv100/nvdec
-rw-r--r--   1 root     root         4352 Mar  1 08:55 lib/firmware/nvidia/gv100/nvdec/scrubber.bin
drwxr-xr-x   2 root     root            0 Jul 10 20:26 lib/firmware/nvidia/gv100/sec2
-rw-r--r--   1 root     root          656 Mar  1 08:55 lib/firmware/nvidia/gv100/sec2/desc.bin
-rw-r--r--   1 root     root        91136 Mar  1 08:55 lib/firmware/nvidia/gv100/sec2/image.bin
-rw-r--r--   1 root     root          192 Mar  1 08:55 lib/firmware/nvidia/gv100/sec2/sig.bin
-rw-r--r--   1 root     root       108296 Jul  8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia-drm.ko
-rw-r--r--   1 root     root     27961744 Jul  8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia.ko
-rw-r--r--   1 root     root      1486256 Jul  8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia-modeset.ko
-rw-r--r--   1 root     root      1879016 Jul  8 08:27 lib/modules/5.3.18-lp152.20.7-default/updates/nvidia-uvm.ko
Comment 55 Stefan Dirsch 2020-07-10 19:16:39 UTC
Yeah. That explains it. The old modprobe config was still being used when nvidia modules were already loaded in initrd ...
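Since the listing above is dominated by firmware files, a filter can narrow it down to the entries that matter here: the kernel modules and the modprobe configuration. This is a minimal sketch; the function name initrd_nvidia_essentials is hypothetical, and it assumes module files end in .ko, optionally followed by a compression suffix.

```sh
# Read an lsinitrd listing on stdin and keep only NVIDIA kernel
# modules and modprobe.d entries, dropping firmware files,
# directories and symlinks.
# initrd_nvidia_essentials is a hypothetical helper name.
initrd_nvidia_essentials() {
    grep nvidia | grep -E '(\.ko(\.[a-z0-9]+)?|etc/modprobe\.d/[^ ]+)$'
}
```

It would be used as `sudo lsinitrd /boot/initrd | initrd_nvidia_essentials`. The timestamps of the modprobe.d entries in the result show which version of the configuration was packed into the initrd.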
Comment 56 Cor Blom 2020-07-10 19:41:18 UTC
Building my own package is not that difficult. Stefan has provided excellent instructions in the X11:Drivers:Video project.

mkinitrd does not change anything for me.

Based on what Matthias said, I experimented a bit and found the following: when I unload all nvidia modules (rmmod nvidia-drm, rmmod nvidia-modeset and rmmod nvidia) and then run "modprobe nvidia", I do get "/dev/nvidia-uvm(-tools)", as well as the others.

This is way beyond what I understand, but maybe it helps.
Comment 57 Stefan Dirsch 2020-07-10 19:52:57 UTC
Thanks. Unfortunately I don't get this either. :-(
Comment 58 Matthias Bach 2020-07-11 15:10:25 UTC
Created attachment 839606 [details]
Zypper output for force-reinstall of NVIDIA driver

I was honestly confused: I thought I had always seen the driver installation perform an initrd build, yet the problem looks a lot like it didn't. So I ran a force reinstall of the driver packages. As you can see from the attached zypper output (I shortened irrelevant parts like the EULA), it does invoke an initrd build in the %posttrans script.
Comment 59 Mister Pend 2020-07-13 03:51:11 UTC
(In reply to Matthias Bach from comment #52)
> Based on the previous test I have been able to solve the problem for myself.
> The solution is as obvious as hidden in plain sight. All I had to do was
> execute the following:
> 
> /sbin/mkinitrd
> 
> Still, I don't fully understand _why_ this solves the problem. I always
> assumed the driver installation to trigger this.
> 
> Maybe Mister Pend or Cor Blom can confirm this behaviour.

Sorry, I've tried on separate systems and can't confirm this. Executing mkinitrd doesn't error, but doesn't solve the issue. Even after a further reboot.
Comment 60 Matthias Bach 2020-07-16 18:23:52 UTC
(In reply to Mister Pend from comment #59)
> (In reply to Matthias Bach from comment #52)
> > Based on the previous test I have been able to solve the problem for myself.
> > The solution is as obvious as hidden in plain sight. All I had to do was
> > execute the following:
> > 
> > /sbin/mkinitrd
> > 
> > Still, I don't fully understand _why_ this solves the problem. I always
> > assumed the driver installation to trigger this.
> > 
> > Maybe Mister Pend or Cor Blom can confirm this behaviour.
> 
> Sorry, I've tried on separate systems and can't confirm this. Executing
> mkinitrd doesn't error, but doesn't solve the issue. Even after a further
> reboot.

After the last driver update, and having worked around bug 1174204, I now also have the issue again. Running /sbin/mkinitrd didn't help this time, so it must have been something else that fixed it for me back then.

Still, I have that weird effect that unloading the modules after boot will actually make the system work as expected afterwards, until the next reboot.
Comment 61 Stefan Dirsch 2020-07-16 18:43:24 UTC
Guys. I'm sorry but at some point I need to give up. I just can't reproduce. It simply works for me with all package versions I tried.
I need to close as worksforme. I'm afraid you need to live with your workaround for now. :-(
Comment 62 Stefan Dirsch 2020-07-16 18:45:57 UTC
Actually I fixed this issue by creating the nvidia-uvm-tools device node now with the latest packages, so closing as fixed.
Comment 63 Cor Blom 2020-07-16 19:17:24 UTC
I fully understand you close this as fixed. Thanks for the work.

Now let me tell you something strange. I removed the workaround, rebooted, and it worked:

ls -l /dev/nvidia*
crw-rw----+ 1 root video 195,   0 16 jul  2020 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 16 jul  2020 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 16 jul  2020 /dev/nvidia-modeset
crw-rw----+ 1 root video 241,   0 16 jul  2020 /dev/nvidia-uvm
crw-rw----+ 1 root video 241,   1 16 jul  2020 /dev/nvidia-uvm-tools

Then I did the kernel update of today (to 26.2) and it stopped working. Then I rebooted to the older kernel (20.7) and it worked again.
Comment 64 Matthias Bach 2020-07-17 08:45:21 UTC
Created attachment 839797 [details]
Systemd Unit providing a workaround

Thanks for all the effort you put into this, Stefan! Given that you cannot reproduce it, I fully understand and support your decision. In fact, if my understanding is correct that your system does not have the NVIDIA driver in the initrd while those running into the issue do, then I suspect this is more related to initrd building than to the driver itself anyhow.

For anybody still running into this: I am now using the nvidia-modprobe.service files to implement an automated workaround. Just put this in /etc/systemd/system and enable it. I included dependencies for all services in openSUSE I know to be using the NVIDIA GPU. If you have further services requiring it, add them to the WantedBy and Before lines.
Comment 65 Stefan Dirsch 2020-07-17 09:53:40 UTC
BTW, I just received and accepted a pull request against suse-prime, which might be related. OTOH I think it's only needed if you already use TW (or systemd of TW).

https://github.com/openSUSE/SUSEPrime/pull/56

But it might make a difference for you ...
Comment 66 Bernhard Wiedemann 2020-07-20 00:29:15 UTC
One user reported that forcing dracut to include nvidia-uvm in the initrd made it work for him:

https://www.reddit.com/r/openSUSE/comments/hszckh/no_cuda_support_with_repo_drivers/fym023h
Comment 67 Stefan Dirsch 2020-07-20 08:38:24 UTC
Ok. That would mean that dracut adds other nvidia modules to the initrd, which doesn't happen for me. But maybe that's related to the fact that I'm testing here on an Optimus system with Intel being the primary GPU, so automatic module selection could look different.

Check which nvidia modules are in your initrd by running

# lsinitrd | grep nvidia | grep -v firmware

If nvidia modules are included but nvidia-uvm is not, you could

1) either try adding all of them to initrd

# cat  > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
install_items+=" /usr/bin/chmod /usr/bin/mknod /usr/bin/cat /usr/bin/echo /usr/bin/chown "
add_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF

2) or make sure they are not added at all

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
Comment 68 Stefan Dirsch 2020-07-20 08:41:05 UTC
Oh. You need to run 'mkinitrd' after adding a dracut config file so the changes get active and reboot the machine in order to see the results.
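
The check from comment #67 can be scripted so that it names the modules that are absent instead of leaving that to the eye. This is only a minimal sketch: it assumes the listing comes from lsinitrd and that module files end in ".ko" as in the listings in this bug, and the function name missing_nvidia_modules is made up for illustration.

```sh
#!/bin/sh
# Report which NVIDIA kernel modules are absent from an initrd listing.
# Reads lsinitrd-style output on stdin and prints one missing module per line.
# (Hypothetical helper, not part of any package.)
missing_nvidia_modules() {
    listing=$(cat)
    for mod in nvidia nvidia-drm nvidia-modeset nvidia-uvm; do
        # Match the exact file name, e.g. ".../updates/nvidia-uvm.ko";
        # "/nvidia\.ko" does not match "/nvidia-drm.ko".
        if ! printf '%s\n' "$listing" | grep -q -- "/${mod}\.ko"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Intended use (as root):
#   lsinitrd | missing_nvidia_modules
```

Empty output would mean all four modules are in the initrd, and all four names would mean none are. The case discussed in this bug is the mixed one, where only nvidia-uvm is printed.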
Comment 69 Matthias Bach 2020-07-20 20:05:43 UTC
Thanks. That workaround works great and is much better than the one I've been using so far.

It seems like, at least on my system, nvidia-uvm is even included but the symlink for the weak updates is missing:

        ➜ sudo lsinitrd | grep nvidia | grep -v firmware
        [sudo] Passwort für root:
        -rw-r--r--   1 root     root         1483 Jul 17 09:31 etc/modprobe.d/50-nvidia-default.conf
        -rw-r--r--   1 root     root           18 Jul 16 21:43 etc/modprobe.d/nvidia-default.conf
        -rw-r--r--   1 root     root      5335848 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
        -rw-r--r--   1 root     root     38338488 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
        -rw-r--r--   1 root     root      2183784 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
        -rw-r--r--   1 root     root     42426104 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
        lrwxrwxrwx   1 root     root           54 Jul 19 19:44 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
        lrwxrwxrwx   1 root     root           50 Jul 19 19:44 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
        lrwxrwxrwx   1 root     root           58 Jul 19 19:44 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko

vs.

        ➜ find /lib/modules -name 'nvidia*.ko'
        /lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
        /lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
        /lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
        /lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
        /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia-drm.ko
        /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia-uvm.ko
        /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia-modeset.ko
        /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia.ko
        /lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko
        /lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-uvm.ko
        /lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko
        /lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko

After explicitly listing the modules to be included, the symlink is finally in the initrd:

        ➜  ~ sudo lsinitrd | grep nvidia | grep -v firmware
        -rw-r--r--   1 root     root         1483 Jul 17 09:31 etc/modprobe.d/50-nvidia-default.conf
        -rw-r--r--   1 root     root           18 Jul 16 21:43 etc/modprobe.d/nvidia-default.conf
        -rw-r--r--   1 root     root      5335848 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
        -rw-r--r--   1 root     root     38338488 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
        -rw-r--r--   1 root     root      2183784 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
        -rw-r--r--   1 root     root     42426104 Jul 17 09:31 lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
        lrwxrwxrwx   1 root     root           54 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
        lrwxrwxrwx   1 root     root           50 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
        lrwxrwxrwx   1 root     root           58 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko
        lrwxrwxrwx   1 root     root           54 Jul 20 21:58 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-uvm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-uvm.ko
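
The comparison above can also be automated. The following is a rough sketch, assuming the same listing format as shown here (paths of the form lib/modules/<kernel>/weak-updates/updates/<module>.ko); the function name kernels_missing_uvm_link is hypothetical. It prints every kernel version for which the listing contains weak-updates links for some nvidia module but none for nvidia-uvm.

```sh
#!/bin/sh
# Read an lsinitrd (or find) listing on stdin and print the kernel versions
# that have nvidia weak-updates entries but lack one for nvidia-uvm.ko.
# (Hypothetical helper for illustration only.)
kernels_missing_uvm_link() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            # Only the path field matches; the symlink target ("../../..") does not.
            if ($i ~ "^/?lib/modules/[^/]+/weak-updates/updates/nvidia[^/]*\\.ko$") {
                path = $i
                sub("^/", "", path)        # tolerate absolute paths from find
                split(path, parts, "/")    # parts[3] = kernel, parts[6] = file
                if (parts[6] == "nvidia-uvm.ko")
                    has_uvm[parts[3]] = 1
                else
                    seen[parts[3]] = 1
            }
        }
    }
    END {
        for (k in seen)
            if (!(k in has_uvm))
                print k
    }' | sort
}

# Intended use (as root):
#   lsinitrd | kernels_missing_uvm_link
```

For the first listing in this comment it should report 5.3.18-lp152.26-default, and nothing for the second one.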
Comment 70 Stefan Dirsch 2020-07-21 03:24:49 UTC
(In reply to Stefan Dirsch from comment #67)
> 2) or make sure they are not added at all
> 
> # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
> omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
> EOF

Sorry, that was wrong. It should have been

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF

Could you please try this as well? I believe this should give the same result that Cor Blom achieved in comment #56.
Comment 71 steve edmonds 2020-07-21 04:29:34 UTC
Sorry to jump in here; please correct me if I'm wrong. Am I correct in assuming this bug is what prevents my applications requiring CUDA (DaVinci Resolve, Blender) from running after a DUP from Leap 15.1 to 15.2?
I have tried the "Systemd Unit providing a workaround" suggestion, but that failed to work.
Comment 72 Mister Pend 2020-07-21 04:35:46 UTC
(In reply to Stefan Dirsch from comment #67)
> Check which nvidia modules are in your initrd by running

Looks like no nvidia-uvm on a clean installed system:

-rw-r--r--   1 root     root         1484 Jul 17 22:49 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r--   1 root     root           18 Jul 17 05:43 etc/modprobe.d/nvidia-default.conf
-rw-r--r--   1 root     root       119664 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
-rw-r--r--   1 root     root     27465704 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
-rw-r--r--   1 root     root      1574168 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
lrwxrwxrwx   1 root     root           54 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
lrwxrwxrwx   1 root     root           50 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
lrwxrwxrwx   1 root     root           58 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko


> 2) or make sure they are not added at all

Ran your command in comment #70, followed by a mkinitrd and a reboot. And at this point, blank screen, X doesn't seem to load :(
Comment 73 Mister Pend 2020-07-21 04:46:44 UTC
(In reply to steve edmonds from comment #71)
> Sorry to jump in here, please correct me if wrong, am I correct in assuming
> this bug is preventing my applications requiring CUDA (Davinci Resolve,
> Blender) failing to run after a DUP from Leap 15.1 to 15.2.
> I have tried the Systemd Unit providing a workaround suggestion but that
> failed to work.

It would seem likely. A workaround (not a solution, but a workaround) that works for me was adding the following to the root crontab:

@reboot nvidia-modprobe -u -c=0

(or running "nvidia-modprobe -u -c=0" as an elevated user once every boot). After this it may start working for you.
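
To see whether the workaround took effect, one can check for the two device files named in the original report. Here is a small sketch under the assumption that a plain existence test is enough; the function name uvm_nodes_missing and its optional directory argument are made up for illustration and mainly make the check easy to try against a scratch directory.

```sh
#!/bin/sh
# Print the UVM device files that are absent from a device directory.
# The directory defaults to /dev. (Hypothetical helper for illustration.)
uvm_nodes_missing() {
    dev_dir=${1:-/dev}
    for node in nvidia-uvm nvidia-uvm-tools; do
        # -e is true for any file type, including character devices.
        [ -e "$dev_dir/$node" ] || printf '%s\n' "$node"
    done
}

# Intended use as a normal user after boot:
#   uvm_nodes_missing
```

No output would mean both /dev/nvidia-uvm and /dev/nvidia-uvm-tools exist. If either name is printed, the cron entry (or the manual nvidia-modprobe call) has probably not run yet.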
Comment 74 steve edmonds 2020-07-21 05:06:59 UTC
(In reply to Mister Pend from comment #73)
> (In reply to steve edmonds from comment #71)
> > Sorry to jump in here, please correct me if wrong, am I correct in assuming
> > this bug is preventing my applications requiring CUDA (Davinci Resolve,
> > Blender) failing to run after a DUP from Leap 15.1 to 15.2.
> > I have tried the Systemd Unit providing a workaround suggestion but that
> > failed to work.
> 
> It would seek likely. A workaround (not a solution, but a workaround) that
> works for me was adding the following to the root crontab:
> 
> @reboot nvidia-modprobe -u -c=0
> 
> (or running "nvidia-modprobe -u -c=0" as an elevated user once every boot).
> After this it may start working for you.

Unfortunately that has not worked for me either. I have not tried modifying the initrd, as I am not quite sure which action to take.
sudo lsinitrd | grep nvidia | grep -v firmware 
gives only 
-rw-r--r--   1 root     root         1483 Jul 20 15:04 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r--   1 root     root           18 Jul 17 07:43 etc/modprobe.d/nvidia-default.conf

Whereas on my functioning Leap 15.1 I have
-rw-r--r--   1 root     root         1483 Jul 17 21:10 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r--   1 root     root           18 Jul 17 07:42 etc/modprobe.d/nvidia-default.conf
-rw-r--r--   1 root     root       116160 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-drm.ko
-rw-r--r--   1 root     root     27452392 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia.ko
-rw-r--r--   1 root     root      1570992 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-modeset.ko
-rw-r--r--   1 root     root      1934696 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-uvm.ko
lrwxrwxrwx   1 root     root           55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-drm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-drm.ko
lrwxrwxrwx   1 root     root           51 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia.ko
lrwxrwxrwx   1 root     root           59 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-modeset.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-modeset.ko
lrwxrwxrwx   1 root     root           55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-uvm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-uvm.ko
Comment 75 Matthias Bach 2020-07-21 06:23:24 UTC
(In reply to Stefan Dirsch from comment #70)
> (In reply to Stefan Dirsch from comment #67)
> > 2) or make sure they are not added at all
> > 
> > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
> > omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
> > EOF
> 
> Sorry, that was wrong. It should have been
> 
> # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
> omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
> EOF
> 
> Could you please try this as well? I believe this should result in what
> achieved Cor Blom by comment #56.

I can confirm that this also fixes the issue for me. I like this even better than force-including them into the initrd as this gives a nice reduction in initrd size from 37 MiB to 14 MiB.
Comment 76 Matthias Bach 2020-07-21 06:27:28 UTC
(In reply to steve edmonds from comment #74)
> (In reply to Mister Pend from comment #73)
> > (In reply to steve edmonds from comment #71)
> > > Sorry to jump in here, please correct me if wrong, am I correct in assuming
> > > this bug is preventing my applications requiring CUDA (Davinci Resolve,
> > > Blender) failing to run after a DUP from Leap 15.1 to 15.2.
> > > I have tried the Systemd Unit providing a workaround suggestion but that
> > > failed to work.
> > 
> > It would seek likely. A workaround (not a solution, but a workaround) that
> > works for me was adding the following to the root crontab:
> > 
> > @reboot nvidia-modprobe -u -c=0
> > 
> > (or running "nvidia-modprobe -u -c=0" as an elevated user once every boot).
> > After this it may start working for you.
> 
> Unfortunately that has not worked for me either.I have not tried modifying
> the initrd, I am not quite sure which action to take.
> sudo lsinitrd | grep nvidia | grep -v firmware 
> gives only 
>
> […]

That initrd looks correct to my non-expert eye. Some other things that might be interesting:

1) Output of `lsmod | grep nvidia`
2) Output of `ls -lh /dev/nvidia*`
3) Is your user a member of the group `video`?
4) Output of `clinfo | head -n 5`
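
Points 1 and 3 can be turned into yes/no checks. The following is only a sketch: the helper names uvm_loaded and has_group are hypothetical, and it assumes the usual lsmod format with the module name in the first column (where nvidia-uvm shows up as nvidia_uvm).

```sh
#!/bin/sh
# Succeed if lsmod-style output on stdin lists the nvidia_uvm module.
# (Hypothetical helper for illustration.)
uvm_loaded() {
    awk '$1 == "nvidia_uvm" { found = 1 } END { exit !found }'
}

# Succeed if the first argument appears among the remaining arguments.
# Meant to be fed the group names printed by `id -nG`.
has_group() {
    wanted=$1
    shift
    for g in "$@"; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# Intended use:
#   lsmod | uvm_loaded && echo "nvidia_uvm is loaded"
#   has_group video $(id -nG) && echo "user is in group video"
```

Both checks are read-only, so they can be run as the unprivileged user that sees the problem.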
Comment 77 steve edmonds 2020-07-21 08:21:40 UTC
(In reply to Matthias Bach from comment #76)

> 
> That initrd looks correct to my non-expert eye. Some other things that might
> be interesting:
> 
> 1) Output of `lsmod | grep nvidia`
> 2) Output of `ls -lh /dev/nvidia*`
> 3) Is you user a member of the group `video`?
> 4) Output of `clinfo | head -n 5`

1.>lsmod | grep nvidia
(nothing)
2.>ls -lh /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory
3. Yes
4. clinfo | head -n 5
Number of platforms                               0

The same video card and G05 driver (450.57) working in Leap 15.1 give quite different responses.
Comment 78 Stefan Dirsch 2020-07-21 08:46:09 UTC
(In reply to steve edmonds from comment #71)
> Sorry to jump in here, please correct me if wrong, am I correct in assuming
> this bug is preventing my applications requiring CUDA (Davinci Resolve,
> Blender) failing to run after a DUP from Leap 15.1 to 15.2.

Yes, this sounds reasonable!

(In reply to steve edmonds from comment #74)
> Unfortunately that has not worked for me either.I have not tried modifying
> the initrd, I am not quite sure which action to take.
> sudo lsinitrd | grep nvidia | grep -v firmware 
> gives only 
> -rw-r--r--   1 root     root         1483 Jul 20 15:04
> etc/modprobe.d/50-nvidia-default.conf
> -rw-r--r--   1 root     root           18 Jul 17 07:43
> etc/modprobe.d/nvidia-default.conf

That does not need to be an issue. I have the same behaviour on my working system.

> Where as on my functioning Leap 15.1 I have
>
> -rw-r--r--   1 root     root         1483 Jul 17 21:10
> etc/modprobe.d/50-nvidia-default.conf
> -rw-r--r--   1 root     root           18 Jul 17 07:42
> etc/modprobe.d/nvidia-default.conf
> -rw-r--r--   1 root     root       116160 Jul 17 21:11
> lib/modules/4.12.14-lp151.27-default/updates/nvidia-drm.ko
> -rw-r--r--   1 root     root     27452392 Jul 17 21:11
> lib/modules/4.12.14-lp151.27-default/updates/nvidia.ko
> -rw-r--r--   1 root     root      1570992 Jul 17 21:11
> lib/modules/4.12.14-lp151.27-default/updates/nvidia-modeset.ko
> -rw-r--r--   1 root     root      1934696 Jul 17 21:11
> lib/modules/4.12.14-lp151.27-default/updates/nvidia-uvm.ko
> lrwxrwxrwx   1 root     root           55 Jul 17 21:11
> lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-drm.ko
> -> ../../../4.12.14-lp151.27-default/updates/nvidia-drm.ko
> lrwxrwxrwx   1 root     root           51 Jul 17 21:11
> lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia.ko ->
> ../../../4.12.14-lp151.27-default/updates/nvidia.ko
> lrwxrwxrwx   1 root     root           59 Jul 17 21:11
> lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-modeset.
> ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-modeset.ko
> lrwxrwxrwx   1 root     root           55 Jul 17 21:11
> lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-uvm.ko
> -> ../../../4.12.14-lp151.27-default/updates/nvidia-uvm.ko

Yes, this looks consistent.

(In reply to steve edmonds from comment #77)
> 1.>lsmod | grep nvidia
> (nothing)
> 2.>ls -lh /dev/nvidia*
> ls: cannot access '/dev/nvidia*': No such file or directory
> 3. Yes
> 4. clinfo | head -n 5
> Number of platforms                               0
> 
> The same video card and GO5 driver (450.57) working in Leap 15.1 gives quite
> different responses.

OMG. I'm wondering whether you really have nvidia-gfxG05-kmp-default package installed. If yes, what does 

  modprobe nvidia

trigger? Check also dmesg output. Also please make sure you have the latest G05 packages installed from our Leap 15.2 repos.
Comment 79 Stefan Dirsch 2020-07-21 08:50:00 UTC
(In reply to Mister Pend from comment #72)
> (In reply to Stefan Dirsch from comment #67)
> > Check which nvidia modules are in your initrd by running
> 
> Looks like no nvidia-uvm on a clean installed system:
> 
> -rw-r--r--   1 root     root         1484 Jul 17 22:49
> etc/modprobe.d/50-nvidia-default.conf
> -rw-r--r--   1 root     root           18 Jul 17 05:43
> etc/modprobe.d/nvidia-default.conf
> -rw-r--r--   1 root     root       119664 Jul 17 22:49
> lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
> -rw-r--r--   1 root     root     27465704 Jul 17 22:49
> lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
> -rw-r--r--   1 root     root      1574168 Jul 17 22:49
> lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
> lrwxrwxrwx   1 root     root           54 Jul 17 22:49
> lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko ->
> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
> lrwxrwxrwx   1 root     root           50 Jul 17 22:49
> lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko ->
> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
> lrwxrwxrwx   1 root     root           58 Jul 17 22:49
> lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko
> -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko

Yes, exactly the same issue as Matthias Bach had.

> > 2) or make sure they are not added at all
> 
> Ran your command in comment #70, followed by a mkinitrd and a reboot. And at
> this point, blank screen, X doesn't seem to load :(

My fault, the advice was wrong. Please try instead - as already corrected in my comment #70

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
# mkinitrd

According to Matthias Bach this should work.
Comment 80 steve edmonds 2020-07-21 09:15:41 UTC
(In reply to Stefan Dirsch from comment #78)

> 
> OMG. I'm wondering whether you really have nvidia-gfxG05-kmp-default package
> installed. If yes, what does 
> 
>   modprobe nvidia
> 
> trigger? Check also dmesg output. Also please make sure you have the latest
> G05 packages installed from our Leap 15.2 repos.

> sudo modprobe nvidia (done via ssh as not in front of the 15.2 PC but with a screen locked X11 session running on it)
modprobe: ERROR: could not find module by name='nvidia'
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)

From Yast
i  │nvidia-computeG05        │NVIDIA driver for computing with GPGPU
i  │nvidia-gfxG05-kmp-default│NVIDIA graphics driver kernel module for GeForce 600 series and newer
i  │nvidia-glG05             │NVIDIA OpenGL libraries for OpenGL acceleration
i  │x11-video-nvidiaG05      │NVIDIA graphics driver for GeForce 600 series and newer

Also, my CAD software balks if I do not have the above drivers loaded.

Only reference I found in dmesg is
[    4.599729] audit: type=1400 audit(1595298800.894:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=482 comm="apparmor_parser"
[    4.599732] audit: type=1400 audit(1595298800.894:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=482 comm="apparmor_parser"
Comment 81 Stefan Dirsch 2020-07-21 10:33:50 UTC
(In reply to Matthias Bach from comment #75)
> (In reply to Stefan Dirsch from comment #70)
> > (In reply to Stefan Dirsch from comment #67)
> > > 2) or make sure they are not added at all
> > > 
> > > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
> > > omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
> > > EOF
> > 
> > Sorry, that was wrong. It should have been
> > 
> > # cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
> > omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
> > EOF
> > 
> > Could you please try this as well? I believe this should result in what
> > achieved Cor Blom by comment #56.
> 
> I can confirm that this also fixes the issue for me. I like this even better
> than force-including them into the initrd as this gives a nice reduction in
> initrd size from 37 MiB to 14 MiB.

Thanks for feedback! I've implemented this now in our packages. Anyone building the packages themselves from obs://X11:Drivers:Video can test this right now. Hello @Cor Blom, glad to know that at least one person is making use of this service! :-)
Comment 82 Stefan Dirsch 2020-07-21 10:36:08 UTC
Closing as fixed.

(In reply to steve edmonds from comment #80)
> > sudo modprobe nvidia (done via ssh as not in front of the 15.2 PC but with a screen locked X11 session running on it)
> modprobe: ERROR: could not find module by name='nvidia'
> modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or
> unknown parameter (see dmesg)
> 
> From Yast
> i  │nvidia-computeG05        │NVIDIA driver for computing with GPGPU        
> i  │nvidia-gfxG05-kmp-default│NVIDIA graphics driver kernel module for
> GeForce 600 series and newer
> i  │nvidia-glG05             │NVIDIA OpenGL libraries for OpenGL
> acceleration                       i  │x11-video-nvidiaG05      │NVIDIA
> graphics driver for GeForce 600 series and newer
> 
> Also, my CAD software balks if I do not have the above drivers loaded.
> 
> Only reference I found in dmesg is
> [    4.599729] audit: type=1400 audit(1595298800.894:4): apparmor="STATUS"
> operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=482
> comm="apparmor_parser"
> [    4.599732] audit: type=1400 audit(1595298800.894:5): apparmor="STATUS"
> operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod"
> pid=482 comm="apparmor_parser"

It seems you're failing on a completely different level: no nvidia modules are installed at all. I suggest reinstalling the nvidia-gfxG05-kmp-default package. If this doesn't help, open a separate bug. It's really unrelated to this one ...
Comment 83 Cor Blom 2020-07-21 11:25:07 UTC
I have built the latest version and can confirm it fixes this bug. Thanks.
Comment 84 Stefan Dirsch 2020-07-21 12:00:23 UTC
Thanks a lot for positive feedback, @Cor Blom! :-)
Comment 85 steve edmonds 2020-07-21 18:27:27 UTC
(In reply to Stefan Dirsch from comment #82)
> 
> Seems you're failing on a complete different level. No nvidia modules
> installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package.
> If this doesn't help, open a separate bug. It's really unrelated to this one
> ...

Do you think it could be related to this release note;
4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed

openSUSE Leap 15.2 now enables a kernel module signature check for third-party drivers (CONFIG_MODULE_SIG=y). This is an important security measure to avoid untrusted code running in the kernel.

This may prevent third-party kernel modules from being loaded if UEFI Secure Boot is enabled. Importantly, this affects NVIDIA......

Although I can't see why my CAD complains of no openGL if I remove the installed Nvidia packages.
Comment 86 Matthias Bach 2020-07-21 18:33:08 UTC
(In reply to steve edmonds from comment #85)
> (In reply to Stefan Dirsch from comment #82)
> > 
> > Seems you're failing on a complete different level. No nvidia modules
> > installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package.
> > If this doesn't help, open a separate bug. It's really unrelated to this one
> > ...
> 
> Do you think it could be related to this release note;
> 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed

Yes, if you are using secure boot, your issues are most likely caused by this. But that too should be solved with the latest package version. You'll have to manually import the package signing key once, though. See bug 1173682 for details.
Comment 87 steve edmonds 2020-07-21 19:32:18 UTC
(In reply to Matthias Bach from comment #86)
> (In reply to steve edmonds from comment #85)
> > (In reply to Stefan Dirsch from comment #82)
> > > 
> > > Seems you're failing on a complete different level. No nvidia modules
> > > installed at all. I suggest to reinstall nvidia-gfxG05-kmp-default package.
> > > If this doesn't help, open a separate bug. It's really unrelated to this one
> > > ...
> > 
> > Do you think it could be related to this release note;
> > 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed
> 
> Yes, if you are using secure boot your issues are most likely be caused by
> this. But that too should be solved with the latest package version. You'll
> have to manually import the package signing key once, though. See bug
> 1173682 for details.

Maybe secure boot is not my issue: I am booting with GRUB2 without EFI, and "enable trusted boot support" is off.
Comment 88 steve edmonds 2020-07-22 03:36:28 UTC
Thanks to all.
Back in front of the problem machine today, I tried the proposed workaround:

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
# mkinitrd

This worked!

Interestingly though, when running mkinitrd I get the output below. Is the first part, relating to kernel 4.4.76-1, supposed to be here, or is it a hangover from Leap 15.1?

Creating initrd: /boot/initrd-4.4.76-1-default
dracut: Executing: /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force --force-drivers "xennet xenblk" /boot/initrd-4.4.76-1-default 4.4.76-1-default
.
.
dracut-install: Failed to find module 'des'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ySNXfs/initramfs -H -N i2o_scsi|nvidia|nvidia_drm|nvidia-modeset|nvidia-uvm --kerneldir /lib/modules/4.4.76-1-default/ -m des ecb md4 md5 hmac arc4 nls_utf8
dracut: *** Including modules done ***
dracut-install: Failed to find module 'xennet'
dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ySNXfs/initramfs -N i2o_scsi|nvidia|nvidia_drm|nvidia-modeset|nvidia-uvm --kerneldir /lib/modules/4.4.76-1-default/ -m xennet xenblk
dracut: *** Installing kernel module dependencies ***
depmod: WARNING: could not open modules.order at /var/tmp/dracut.ySNXfs/initramfs/lib/modules/4.4.76-1-default: No such file or directory
depmod: WARNING: could not open modules.builtin at /var/tmp/dracut.ySNXfs/initramfs/lib/modules/4.4.76-1-default: No such file or directory
.
.
before
Creating initrd: /boot/initrd-5.3.18-lp152.20.7-default
Comment 89 steve edmonds 2020-07-22 05:38:00 UTC
The errors in mkinitrd are not related; they come from another bug that prevents the purge-kernels service from running.
Comment 90 Bernhard Wiedemann 2020-07-22 08:58:48 UTC
(In reply to steve edmonds from comment #88)
> Interestingly though when running mkinitrd I have output as below. Is the
> first part relating to the kernel 4.4.76-1 supposed to be here or is it a
> hangover from Leap 15.1.
> 
> Creating initrd: /boot/initrd-4.4.76-1-default

This is expected.
We always leave some older kernels around so if the latest one is bad, you can boot into an older kernel.
Comment 91 steve edmonds 2020-07-23 04:27:51 UTC
I thought I had this all under control, so I moved on to DUP the next Leap 15.1 machine to 15.2.

After the upgrade, the NVIDIA driver loads except for nvidia-uvm, so I have OpenGL support but no CUDA support. If I run clinfo | head -n 5 as a normal user, nvidia_uvm doesn't load, but if I run the command elevated, it does load and stays loaded (output below).

The previous suggestions of cat > /etc/dracut.conf.d/50-nvidia-default.conf ... don't work here, because after the dup I have only one file there:
> ls /etc/dracut.conf.d
99-debug.conf

I do have 
/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/nvidia-default.conf

I suspect the @reboot nvidia-modprobe -u -c=0 solution may be the way to go for me.

> lsmod | grep nvidia
nvidia_drm             61440  16
nvidia_modeset       1187840  34 nvidia_drm
nvidia              19726336  1586 nvidia_modeset
drm_kms_helper        229376  1 nvidia_drm
drm                   544768  19 drm_kms_helper,nvidia_drm
> clinfo | head -n 5
Number of platforms                               0
steve@linux-qw83:~> lsmod | grep nvidia
nvidia_drm             61440  17
nvidia_modeset       1187840  36 nvidia_drm
nvidia              19726336  1671 nvidia_modeset
drm_kms_helper        229376  1 nvidia_drm
drm                   544768  20 drm_kms_helper,nvidia_drm
> sudo clinfo | head -n 5
[sudo] password for root: 
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 11.0.210
  Platform Profile                                FULL_PROFILE
> lsmod | grep nvidia
nvidia_uvm           1110016  0
nvidia_drm             61440  17
nvidia_modeset       1187840  36 nvidia_drm
nvidia              19726336  1672 nvidia_uvm,nvidia_modeset
drm_kms_helper        229376  1 nvidia_drm
drm                   544768  20 drm_kms_helper,nvidia_drm
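The manual lsmod checks above could be wrapped in two small helpers, for example to run once as a normal user before and after the elevated clinfo call. This is only a sketch: the function names are invented for illustration, and the device paths are the ones named in the bug description (/dev/nvidia-uvm and /dev/nvidia-uvm-tools).

```shell
# uvm_loaded [MODULES_FILE]   (hypothetical helper)
# Succeeds if nvidia_uvm is listed as a module NAME in MODULES_FILE.
# Default is /proc/modules, the same data that lsmod formats.
# Only the first column is compared, so a line such as
# "nvidia ... nvidia_uvm,nvidia_modeset" does not count as a match.
uvm_loaded() {
    awk '$1 == "nvidia_uvm" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# uvm_devices_present [DEV_DIR]   (hypothetical helper)
# Succeeds if both UVM device files exist under DEV_DIR (default: /dev).
uvm_devices_present() {
    dev="${1:-/dev}"
    [ -e "$dev/nvidia-uvm" ] && [ -e "$dev/nvidia-uvm-tools" ]
}

# Possible usage:
#   uvm_loaded && uvm_devices_present \
#       && echo "UVM ready" || echo "UVM not initialised"
```

On the machine above, one would expect "UVM not initialised" before sudo clinfo and "UVM ready" afterwards, assuming the device files are created together with the module load.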
Comment 92 Stefan Dirsch 2020-07-23 09:06:28 UTC
The removal of /etc/dracut.conf.d/50-nvidia-default.conf could be explained by having the suse-prime package installed (Optimus systems with an Intel/NVIDIA GPU combo). But then you would have /etc/dracut.conf.d/90-nvidia-dracut-G05.conf or /usr/lib/dracut/dracut.conf.d/90-nvidia-dracut-G05.conf installed with the same content, and you still shouldn't have any nvidia modules in your initrd.

I don't have an explanation for this right now.
Comment 93 steve edmonds 2020-07-23 10:01:46 UTC
(In reply to Stefan Dirsch from comment #92)
> Removed /etc/dracut.conf.d/50-nvidia-default.conf could be explained by
> having suse-prime package installed (Optimus systems with Intel/NVIDIA GPU
> combo), but then you would have a 
> /etc/dracut.conf.d/90-nvidia-dracut-G05.conf or
> /usr/lib/dracut/dracut.conf.d/90-nvidia-dracut-G05.conf installed with the
> same content and you still shouldn't have any nvidia modules in your initrd.
> 
> I don't have an explanation for this right now.

The files were there under Leap 15.1. I am assuming nvidia-computeG05 provides /etc/modprobe.d/50-nvidia-default.conf, but I have no idea what process leads to the presence of /etc/dracut.conf.d/50-nvidia-default.conf.
Comment 94 Stefan Dirsch 2020-07-23 10:30:29 UTC
> The files were there under Leap 15.1, I am assuming nvidia-computeG05
> provides /etc/modprobe.d/50-nvidia-default.conf 

No, that's part of the nvidia-gfxG05-kmp-default packages.

> but I have no idea what
> process leads to the presence of /etc/dracut.conf.d/50-nvidia-default.conf

That was the temporary workaround. It won't be needed with the next driver package update. With that, nvidia-gfxG05-kmp-default will include /etc/dracut.conf.d/60-nvidia-default.conf with the same content, i.e. nvidia modules will no longer be added to the initrd.
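Since the omit_drivers setting can come from several differently named files (the manual 50-nvidia-default.conf, the suse-prime 90-nvidia-dracut-G05.conf, or the packaged 60-nvidia-default.conf), a quick way to see which one is actually present might look like the sketch below. The function name is hypothetical, and the match is a simple text search, not a full parse of the dracut configuration.

```shell
# nvidia_omit_confs DIR...   (hypothetical helper)
# Prints the path of every *.conf file in the given directories that
# contains an omit_drivers assignment mentioning nvidia.
# Missing directories are skipped silently.
nvidia_omit_confs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -lE '^[[:space:]]*omit_drivers\+?=.*nvidia' "$dir"/*.conf 2>/dev/null
    done
    return 0
}

# Possible usage, with the two directories mentioned in this thread:
#   nvidia_omit_confs /etc/dracut.conf.d /usr/lib/dracut/dracut.conf.d
```

An empty result would suggest that nothing tells dracut to leave the nvidia modules out of the initrd.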
Comment 95 steve edmonds 2020-07-25 08:52:53 UTC
(In reply to Stefan Dirsch from comment #94)
An updated NVIDIA driver was just installed from the repository, and CUDA apps are now working as expected.
Comment 96 Stefan Dirsch 2020-07-25 12:32:04 UTC
Indeed. Repos were updated yesterday! :-)
Comment 97 Mister Pend 2020-07-25 12:34:29 UTC
Was this an NVIDIA issue then, rather than an openSUSE one? Asking out of curiosity.
Comment 98 Stefan Dirsch 2020-07-25 14:48:40 UTC
Well, if one doesn't appreciate that openSUSE takes care of security and therefore doesn't make nvidia-modprobe suid root, one can call it an openSUSE issue ...
Comment 106 Maintenance Automation 2023-04-19 16:30:02 UTC
SUSE-FU-2023:1919-1: An update that contains two features and has three feature fixes can now be installed.

Category: feature (moderate)
Bug References: 1173733, 1207495, 1207520
Jira References: PED-2658, SLE-24579
Sources used:
Basesystem Module 15-SP4 (src): nvidia-open-driver-G06-signed-525.105.17-150400.9.5.3
Public Cloud Module 15-SP4 (src): nvidia-open-driver-G06-signed-525.105.17-150400.9.5.3

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.