|
Bugzilla – Full Text Bug Listing |
| Summary: | kernel 4.4.71 doesn't allow user to install Nvidia blob with nvidia-drm feature enabled | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Roman Bysh <rb03884> |
| Component: | X11 3rd Party Driver | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED FIXED | QA Contact: | Stefan Dirsch <sndirsch> |
| Severity: | Major | ||
| Priority: | P2 - High | CC: | bruno, itaranto7, lnussel, mmarek, patrik.jakobsson, rb03884, tiwai |
| Version: | Leap 42.3 | ||
| Target Milestone: | Leap 42.3 | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 42.2 | ||
| Whiteboard: | |||
| Found By: | Community User | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
nvidia-installer.log with problem where Nvidia installation failed
Metadata for Project X11:Drivers:Video |
||
|
Description
Roman Bysh
2017-06-17 23:45:08 UTC
Created attachment 729301 [details]
nvidia-installer.log with problem where Nvidia installation failed
Was this change because enabling KMS causes GDM and GNOME to default to Wayland, which currently suffers from very poor performance? Found on Arch wiki.

Is drm-kmp-default installed? If yes, try to uninstall and re-install nvidia again.

(In reply to Takashi Iwai from comment #3)
> Is drm-kmp-default installed? If yes, try to uninstall and re-install nvidia again.

Yes. It's installed. I've uninstalled and reinstalled the Nvidia driver several times and it doesn't help.

Did you use a different version of gcc when compiling the sources for kernels 4.4.10x and 4.4.11x?

(In reply to Roman Bysh from comment #4)
> (In reply to Takashi Iwai from comment #3)
> > Is drm-kmp-default installed? If yes, try to uninstall and re-install nvidia again.
>
> Yes. It's installed. I've uninstalled and reinstalled the Nvidia driver several times and it doesn't help.

Double-check it. Uninstall drm-kmp, and uninstall nvidia driver, then reboot into a clean state. Make sure that there is no leftover from the old nvidia stuff. Then install nvidia again.

> Did you use a different version of gcc when compiling the sources for kernels 4.4.10x and 4.4.11x?

Yes, openSUSE Leap 42.2/42.3 is gcc 4.8 while the TW is gcc 6.x or 7.x. But it shouldn't matter. If the problem isn't about the drm-kmp, the problem is in nvidia driver.

Also, make sure that the version and the release number of all kernel packages do match, i.e. kernel-default, kernel-default-devel, kernel-devel, kernel-macros, kernel-syms, kernel-source, etc.

(In reply to Takashi Iwai from comment #5)
> Double-check it. Uninstall drm-kmp, and uninstall nvidia driver, then reboot as a clean state.

One more thing: if initrd wasn't recreated after uninstalling drm-kmp, rebuild initrd manually. The rebuild of initrd should be triggered automatically when necessary, but just to be sure...

Something got backported from kernel 4.11 that may be causing this problem.
Ref: http://bit.ly/2rKD6i9

Please read https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d557d1b58b3546bab2c5bc2d624c5709840e6b10

Or this kernel patch from kernel 4.11.1 should fix the drm problem. http://bit.ly/2rKH4az

No, irrelevant.

Okay. I'll get back to you later.

I first uninstalled the Nvidia driver and removed the drm-kmp-default package. I rebooted and ran mkinitrd. The Nvidia driver 381.22 installed without a problem. I guess this can be marked as resolved. This should be put in the release notes. Thank you for the directions.

The problem was with the "drm-kmp-default" package, not the Nvidia driver.

So, it's a problem of drm core kABI incompatibility as I expected. We need to address it, at least for Nvidia.

The best option would be to backport the changes regarding the drm core stuff up to 4.9.x, but it'd be too big and risky at this moment, as it'll result in a few hundred patches.

Another option would be to make a minimalistic patch (or patches) to add / modify the structs in include/drm and include/uapi/drm just to satisfy Nvidia kABI.

Yet another option would be to add Conflicts tag with nvidia package. But it won't help if user installs the nvidia stuff manually.

Also, another concern is the conflict with AMDGPU-pro package...

A big problem right now is that I'm going to freeze kABI this week, likely today. We can adapt the further change as an exception until RC2, though. In any case, we need a very quick resolution.

Patrik, Michal, what do you think?
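
The advice given earlier in this thread, that the version and release number of all kernel packages should match, can be checked mechanically. The following is only a minimal sketch under stated assumptions: the helper names (`parse_rpm_output`, `mismatched`, `query_installed`) are hypothetical, and it assumes at most one installed version per package (with several kernels installed side by side, later lines overwrite earlier ones). Only the package names and the `rpm -q --qf` query are taken as given.

```python
import subprocess

# Package names as listed in the advice in this thread.
KERNEL_PACKAGES = [
    "kernel-default", "kernel-default-devel", "kernel-devel",
    "kernel-macros", "kernel-syms", "kernel-source",
]


def parse_rpm_output(text):
    """Parse lines of the form '<name> <version>-<release>' into a dict.

    Lines such as 'package foo is not installed' are skipped because
    they do not consist of exactly two fields.
    """
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def mismatched(versions):
    """Return the packages whose version-release differs from the most common one."""
    if not versions:
        return {}
    values = list(versions.values())
    reference = max(set(values), key=values.count)
    return {name: v for name, v in versions.items() if v != reference}


def query_installed(packages=KERNEL_PACKAGES):
    """Query rpm for each package (hypothetical helper; needs an RPM-based system)."""
    result = subprocess.run(
        ["rpm", "-q", "--qf", "%{NAME} %{VERSION}-%{RELEASE}\\n", *packages],
        capture_output=True, text=True, check=False,
    )
    return parse_rpm_output(result.stdout)
```

On an affected machine, `mismatched(query_installed())` would list any kernel package that is out of step with the rest. An empty result only shows that the kernel packages agree with each other; it says nothing about drm-kmp-default, which turned out to be the actual cause in this bug.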
(In reply to Takashi Iwai from comment #15)
> Another option would be to make a minimalistic patch (or patches) to add / modify the structs in include/drm and include/uapi/drm just to satisfy Nvidia kABI.

This doesn't look easy, either, unfortunately. There are way too many referred structs that have been changed intrusively.

As a quick "fix" for nvidia, we can change the dependencies on Leap to match SLE and only install drm-kmp on Intel & AMD. This won't help though with AMD's Pro driver.

However, I do not quite understand what you mean by backporting / changing structures in include/drm and include/uapi/drm. The problem is with the drm-kmp, not the in-kernel code, isn't it?

(In reply to Michal Marek from comment #17)
> As a quick "fix" for nvidia, we can change the dependencies on Leap to match SLE and only install drm-kmp on Intel & AMD. This won't help though with AMD's Pro driver.

Yes, it at least reduces the probability of hitting it. But the problem is still present and it appears on a machine with SKL + NVidia GPU, for example.

> However, I do not quite understand what you mean by backporting / changing structures in include/drm and include/uapi/drm. The problem is with the drm-kmp, not the in-kernel code, isn't it?

The problem happens since nvidia KMP builds itself based on kernel-default-devel, thus it refers to the drm stack of kernel-default. When we have drm core stack already updated to be aligned with drm-kmp, we can install both drm-kmp and nvidia gracefully. At least, it worked on SP2 (the drm core stack was already aligned for 4.6), but at this time, the difference is larger, thus it's much tougher.

> Yet another option would be to add Conflicts tag with nvidia package.

I can add Provides/Obsoletes to the NVIDIA KMP, so this won't trigger any needed user input during installation. Of course if the customer decides to uninstall the NVIDIA RPMs again, drm-kmp-default won't be reinstalled automatically.
> But it won't help if user installs the nvidia stuff manually.

We can add this to the documentation. Release notes and https://en.opensuse.org/SDB:NVIDIA_the_hard_way

> Also, another concern is the conflict with AMDGPU-pro package...

Well, I'm not aware of any customers yet, who tried to install these packages. ;-)

> As a quick "fix" for nvidia, we can change the dependencies on Leap to match SLE and only install drm-kmp on Intel & AMD. This won't help though with AMD's Pro driver.

I also thought about removing the NVIDIA Device IDs from Supplements of drm-kmp-default, but it would be a regression for many people not knowing to install the drm-kmp-default package manually. And there are also hybrid machines (Intel+NVIDIA), so for these this won't help anyway.

Reassigning to myself.

(In reply to Stefan Dirsch from comment #19)
> > Yet another option would be to add Conflicts tag with nvidia package.
>
> I can add Provides/Obsoletes to the NVIDIA KMP, so this won't trigger any needed user input during installation. Of course if the customer decides to uninstall the NVIDIA RPMs again, drm-kmp-default won't be reinstalled automatically.

Yes, that'd be good in any case, no matter whether we reduce the targeted devices of supplements or not.

But, one thing to be noted is that you'll still need to reboot the system to allow nvidia driver to be loaded.

> > But it won't help if user installs the nvidia stuff manually.
>
> We can add this to the documentation. Release notes and https://en.opensuse.org/SDB:NVIDIA_the_hard_way
>
> > Also, another concern is the conflict with AMDGPU-pro package...
>
> Well, I'm not aware of any customers yet, who tried to install these packages. ;-)
>
> > As a quick "fix" for nvidia, we can change the dependencies on Leap to match SLE and only install drm-kmp on Intel & AMD. This won't help though with AMD's Pro driver.
>
> I also thought about removing the NVIDIA Device IDs from Supplements of drm-kmp-default, but it would be a regression for many people not knowing to install the drm-kmp-default package manually.

Strictly speaking, it's no "regression". Then user will get the very same driver as Leap 42.2. But 4.9 drivers in general contain more fixes than 4.4 drivers, especially for recent hardware, so it's rather "user doesn't receive a fix".

> And there are also hybrid machines (Intel+NVIDIA), so for these this won't help anyway.

Right.

*** Bug 1044955 has been marked as a duplicate of this bug. ***

(In reply to Takashi Iwai from comment #21)
> But, one thing to be noted is that you'll still need to reboot the system to allow nvidia driver to be loaded.

Which is always required if nouveau has already been loaded/active. So no new problem.

(In reply to Stefan Dirsch from comment #23)
> (In reply to Takashi Iwai from comment #21)
> > But, one thing to be noted is that you'll still need to reboot the system to allow nvidia driver to be loaded.
>
> Which is always required if nouveau has already been loaded/active. So no new problem.

Ah, of course! :)

Great. en.opensuse.org is no longer responding, so I cannot submit my changes any longer. :-(

https://en.opensuse.org/SDB:NVIDIA_the_hard_way

[...]
NVIDIA proprietary driver works flawlessly on Leap 42.2. On Leap 42.3 you need to uninstall the drm-kmp-default package first (boo#1044816).

{{Shell|$ zypper rm drm-kmp-default }}

For Tumbleweed [...]

Just not to lose them completely ...

Nvidia packages are adjusted. So the issue will be fixed with the next update.

(In reply to Stefan Dirsch from comment #25)
> Great. en.opensuse.org is no longer responding, so I cannot submit my changes any longer. :-(
>
> https://en.opensuse.org/SDB:NVIDIA_the_hard_way
>
> [...]
> NVIDIA proprietary driver works flawlessly on Leap 42.2. On Leap 42.3 you need to uninstall the drm-kmp-default package first (boo#1044816).
>
> {{Shell|$ zypper rm drm-kmp-default }}
>
> For Tumbleweed [...]
>
> Just not to lose them completely ...

Ah. Now it worked. :-)

Release notes? No idea, who can take care of these?

The easiest way is to just file a pull request at https://github.com/openSUSE/release-notes-openSUSE/

Or file a bug with the suggested text to the release notes component.

(In reply to Ludwig Nussel from comment #29)
> The easiest way is to just file a pull request at https://github.com/openSUSE/release-notes-openSUSE/
>
> Or file a bug with the suggested text to the release notes component.

Thanks. Opened bsc#1045124 for this now. Closing this one. ;-)

Created attachment 729721 [details]
Metadata for Project X11:Drivers:Video

Hi Stefan, would you mind also setting up the target for repository https://build.opensuse.org/project/show/X11:Drivers:Video

I know it doesn't build, but having the right target allows us to build packages locally (which is what I'm doing for my TW), and the need for Leap 42.3 is now important due to this bug.

I've prepared a ready-to-use XML which disables build and publish (whatever future or old version comes). Perhaps you want to improve the useforbuild list.

Ok. Added new targets like Leap 42.3, removed no longer supported targets.

The issue is still present with NVIDIA RPMS...

After a fresh install of Leap 42.3, I installed the following packages:

x11-video-nvidiaG04
nvidia-gfxG04-kmp-default
nvidia-computeG04
nvidia-glG04

which automatically marked drm-kmp-default for removal. The result was a black screen after rebooting (OK, almost. I did see the mouse pointer but SDDM crashed).

I resolved this problem by uninstalling drm-kmp-default first, running mkinitrd and then rebooting. After this I installed the nvidia driver, and after another reboot, everything was fine.

So, it seems that in the first scenario mkinitrd was not run after drm-kmp-default removal (just a guess).

Should I reopen this issue or submit a new one?
Environment: Intel Core i5-2310 + NVIDIA GeForce GTX 650

Intel Core i5-2310
NVIDIA GeForce GTX 650

Ignacio, please let's investigate that issue in boo#1053934.
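
For reference, the manual workaround order reported in this thread (remove drm-kmp-default, run mkinitrd, reboot, install the NVIDIA driver, reboot) can be written down as a small planning helper. This is only an illustrative sketch, not an official procedure: the function name `plan_workaround` is hypothetical, the NVIDIA install step is left as a plain description because the thread covers both the RPM and the manual installer route, and the adjusted NVIDIA packages mentioned above are meant to make the manual removal unnecessary.

```python
def plan_workaround(installed):
    """Return the ordered manual steps reported in this thread for Leap 42.3.

    `installed` is a set of installed package names. The steps are plain
    strings; nothing is executed here.
    """
    steps = []
    if "drm-kmp-default" in installed:
        # Remove the conflicting KMP first, then rebuild the initrd so the
        # old drm modules are not carried over into the next boot.
        steps.append("zypper rm drm-kmp-default")
        steps.append("mkinitrd")
        steps.append("reboot")
    steps.append("install the NVIDIA driver")
    steps.append("reboot")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_workaround({"drm-kmp-default"}), start=1):
        print(f"{number}. {step}")
```

Keeping mkinitrd as an explicit step reflects the guess made above that the initrd was not rebuilt when drm-kmp-default was removed automatically.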