Bug 1053115

Summary: gdm fails to start up with nvidia blob RPM but works with installer
Product: [openSUSE] openSUSE Tumbleweed Reporter: Robert Munteanu <rombert>
Component: X11 3rd Party Driver Assignee: E-mail List <xorg-maintainer-bugs>
Status: RESOLVED FIXED QA Contact: Stefan Dirsch <sndirsch>
Severity: Major    
Priority: P5 - None CC: igarcia, kkirill, mati865, nico.kruber, rombert
Version: Current   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: nvidia-bug-report gathered when using the RPMs
nvidia-bug-report gathered when using the .run installer
journalctl -b output after trying to login using the RPM drivers
50-nvidia.conf
nvidia-bug-report gathered with the RPM installer after modifying 50-nvidia.conf
journalctl -b output when using the RPMs with the modified 50-nvidia.conf file

Description Robert Munteanu 2017-08-09 21:08:43 UTC
Created attachment 735973
nvidia-bug-report gathered when using the RPMs

(note that this is different from #995924, where gdm was concerned)

Using xdm for a display manager, I am unable to start Xorg with the nvidia drivers installed from the TW RPM repo. With the drivers installed with the .run installer, all is fine.

What happens is that xdm starts up fine, I am able to input the username and the password, but after submitting I just get back to xdm in 3-5 seconds.

I also noted an error when installing the RPMS:

  touch: cannot touch '/run/regenerate-initrd/all': No such file or directory

but that's probably not related.
Comment 1 Robert Munteanu 2017-08-09 21:09:36 UTC
Created attachment 735974
nvidia-bug-report gathered when using the .run installer
Comment 2 Robert Munteanu 2017-08-09 21:10:45 UTC
Created attachment 735975
journalctl -b output after trying to login using the RPM drivers
Comment 3 Robert Munteanu 2017-08-09 21:13:49 UTC
Sorry, please ignore the journalctl output - it was taken when still using gdm.
Comment 4 Stefan Dirsch 2017-08-10 09:41:40 UTC
(In reply to Robert Munteanu from comment #0)
> I also noted an error when installing the RPMS:
> 
>   touch: cannot touch '/run/regenerate-initrd/all': No such file or directory
> 
> but that's probably not related.

Thanks. You found a bug. Unfortunately it doesn't seem to be the culprit here. At least on my system kernel modules are still added to the initrd with this bug in place. Nevertheless I have fixed it now.
Comment 5 Robert Munteanu 2017-08-10 09:46:10 UTC
(In reply to Stefan Dirsch from comment #4)
> (In reply to Robert Munteanu from comment #0)
> > I also noted an error when installing the RPMS:
> > 
> >   touch: cannot touch '/run/regenerate-initrd/all': No such file or directory
> > 
> > but that's probably not related.
> 
> Thanks. You found a bug. Unfortunately it doesn't seem to be the culprit
> here. At least on my system kernel modules are still added to the initrd
> with this bug in place. Nevertheless I fixed this now.

I think the modules are added for me as well

[   92.441921] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.59  Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
[   92.448745] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 242
[   92.449710] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  384.59  Wed Jul 19 23:46:42 PDT 2017

If there's any other info I can provide please let me know.
Comment 6 Stefan Dirsch 2017-08-10 09:51:17 UTC
Which desktop are you trying to run?
Comment 7 Robert Munteanu 2017-08-10 10:04:47 UTC
I usually ran Gnome through GDM - I suppose the same is valid for xdm.
Comment 8 Stefan Dirsch 2017-08-10 12:40:18 UTC
Yeah. I run GNOME. Works via xdm/sddm. But not via gdm.
Comment 9 Robert Munteanu 2017-08-10 13:45:06 UTC
I tried the following:

- uninstalled the driver ( nvidia-installer --uninstall )
- reinstalled libglvnd and Mesa packages
- installed the driver again

Still no luck
Comment 10 Robert Munteanu 2017-08-10 13:55:12 UTC
I also tried sddm - all I get is a black screen with a KDE-styled cursor.

sddm-greeter crashes with a segmentation fault. The backtrace is below; apparently there is a problem creating an OpenGL context, which would also be consistent with Gnome not starting up due to missing OpenGL capabilities.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc7f7c171e0 in __lll_unlock_elision () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x7fc7fc735dc0 (LWP 5746))]
(gdb) bt
#0  0x00007fc7f7c171e0 in __lll_unlock_elision () from /lib64/libpthread.so.0
#1  0x00007fc7f400f492 in _XError () from /usr/lib64/libX11.so.6
#2  0x00007fc7f6e9ed7b in glXMakeCurrentReadSGI () from /usr/lib64/libGL.so.1
#3  0x00007fc7ed849785 in ?? () from /usr/lib64/qt5/plugins/xcbglintegrations/libqxcb-glx-integration.so
#4  0x00007fc7ed847967 in ?? () from /usr/lib64/qt5/plugins/xcbglintegrations/libqxcb-glx-integration.so
#5  0x00007fc7f1095591 in QXcbIntegration::createPlatformOpenGLContext(QOpenGLContext*) const ()
   from /usr/lib64/libQt5XcbQpa.so.5
#6  0x00007fc7fa8c308f in QOpenGLContext::create() () from /usr/lib64/libQt5Gui.so.5
#7  0x00007fc7ed84a2f7 in ?? () from /usr/lib64/qt5/plugins/xcbglintegrations/libqxcb-glx-integration.so
#8  0x00007fc7ed84a8f1 in ?? () from /usr/lib64/qt5/plugins/xcbglintegrations/libqxcb-glx-integration.so
#9  0x00007fc7fc0bb635 in QSGRenderLoop::instance() () from /usr/lib64/libQt5Quick.so.5
#10 0x00007fc7fc1232a9 in QQuickWindowPrivate::init(QQuickWindow*, QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#11 0x00007fc7fc1ac67d in QQuickView::QQuickView(QWindow*) () from /usr/lib64/libQt5Quick.so.5
#12 0x00005605af07f1d7 in SDDM::GreeterApp::addViewForScreen(QScreen*) ()
#13 0x00005605af080773 in SDDM::GreeterApp::GreeterApp(int&, char**) ()
#14 0x00005605af0641a9 in main ()
Comment 11 Stefan Dirsch 2017-08-10 16:19:26 UTC
Ok. I've made two changes.

1.
nvidia-gfxG04.changes
-------------------------------------------------------------------
Thu Aug 10 09:39:21 UTC 2017 - sndirsch@suse.com

- %triggerin: create /run/regenerate-initrd directory, if needed
  (boo#1053115)

2.
x11-video-nvidiaG04.changes
-------------------------------------------------------------------
Thu Aug 10 14:07:01 UTC 2017 - sndirsch@suse.com

- modprobe.d/50-nvidia.conf: add also /dev/nvidia-modeset, since 
  with Tumbleweed gdm and X are no longer running as root and
  therefore cannot create/access it (boo#1053115)

Change 1 was probably not needed. For some reason modules have been generated
into initrd nevertheless.
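
For reference, the error from comment #0 appears because touch does not create
missing parent directories. Below is a minimal Python sketch of the "create the
directory if needed" idea. It is not the actual %triggerin scriptlet; the helper
name request_initrd_regeneration and its run_dir parameter are hypothetical and
only serve the illustration.

```python
import os
import tempfile


def request_initrd_regeneration(run_dir="/run/regenerate-initrd"):
    """Create run_dir if needed, then touch the 'all' marker inside it.

    Hypothetical helper: it only mirrors the idea of the packaging fix.
    """
    os.makedirs(run_dir, exist_ok=True)  # same effect as `mkdir -p`
    marker = os.path.join(run_dir, "all")
    # Append mode creates the file if it is missing and never truncates it;
    # os.utime then bumps the timestamps, much like `touch` would.
    with open(marker, "a"):
        os.utime(marker, None)
    return marker


if __name__ == "__main__":
    # Demonstrate against a scratch directory rather than the real /run.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "regenerate-initrd")
        print(request_initrd_regeneration(target))
```

Without the makedirs call, opening the marker fails with "No such file or
directory", which is the same failure mode as the touch error quoted above.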

Change 2 fixes my gdm issue. It is needed because gdm no longer runs as root
on Tumbleweed, and we deliberately do not give everyone access to the nvidia
devices (first line in /etc/modprobe.d/50-nvidia.conf). This is also why the
installer works for everybody but the RPMs do not: when using the installer,
the nvidia device files are readable and writable for everyone.
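
A quick way to check whether a given user is affected is to test read/write
access to the device nodes. The Python sketch below assumes the usual node
names; only /dev/nvidia-modeset is named in this bug, while /dev/nvidiactl and
/dev/nvidia0 are assumptions about a typical single-GPU system. The function
name inaccessible_devices is hypothetical.

```python
import os

# /dev/nvidia-modeset is named in the changelog above; the other two are
# the usual NVIDIA device nodes and are listed here as an assumption.
NVIDIA_DEVICES = ("/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-modeset")


def inaccessible_devices(paths=NVIDIA_DEVICES):
    """Return the paths the current user cannot open for reading and writing.

    A node that does not exist at all is reported as inaccessible too.
    """
    return [p for p in paths if not os.access(p, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    for path in inaccessible_devices():
        print("no read/write access:", path)
```

The result depends on who runs it, so it is only meaningful when executed as
the user the display manager runs as. Run as root it reports every existing
node as accessible.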


I'm attaching the latest /etc/modprobe.d/50-nvidia.conf. Please have a try.
You need a reboot after replacing the existing one.

But I'm afraid it won't fix the sddm issue, which I do not see on my system.
Comment 12 Stefan Dirsch 2017-08-10 16:20:38 UTC
Created attachment 736113
50-nvidia.conf
Comment 13 Kirill Kirillov 2017-08-10 17:07:40 UTC
(In reply to Stefan Dirsch from comment #12)
> I'm attaching the latest /etc/modprobe.d/50-nvidia.conf. Please have a try.
> You need a reboot after replacing the existing one.

gdm is working now after replacing the file. Well done!
Comment 14 Robert Munteanu 2017-08-10 21:15:24 UTC
Well, it definitely changed something for me :-)

I no longer get the frequent crashes/flickering due to gdm restarting. Now I get a black screen with gdm. Situation remains the same with xdm and sddm. I'll attach the nvidia-bug-report.log.gz file and journalctl -b output.

Something that stands out in the journal:

Aug 11 00:05:15 mars systemd[2170]: Started D-Bus User Message Bus.
Aug 11 00:05:16 mars gnome-session[2200]: libGL error: No matching fbConfigs or visuals found
Aug 11 00:05:16 mars gnome-session[2200]: libGL error: failed to load driver: swrast
Aug 11 00:05:16 mars dbus-daemon[2198]: Activating via systemd: service name='org.a11y.Bus' unit='at-spi-dbus-bus.service'
Aug 11 00:05:16 mars systemd[2170]: Starting Accessibility services bus...
Aug 11 00:05:16 mars dbus-daemon[2198]: Successfully activated service 'org.a11y.Bus'
Aug 11 00:05:16 mars systemd[2170]: Started Accessibility services bus.
Aug 11 00:05:16 mars at-spi-bus-launcher[2208]: Activating service name='org.a11y.atspi.Registry'
Aug 11 00:05:16 mars at-spi-bus-launcher[2208]: Successfully activated service 'org.a11y.atspi.Registry'
Aug 11 00:05:16 mars org.a11y.atspi.Registry[2213]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
Aug 11 00:05:16 mars gnome-session[2200]: libGL error: No matching fbConfigs or visuals found
Aug 11 00:05:16 mars gnome-session[2200]: libGL error: failed to load driver: swrast
Aug 11 00:05:16 mars gnome-session[2200]: X Error of failed request:  BadValue (integer parameter out of range for operation)
Aug 11 00:05:16 mars gnome-session[2200]:   Major opcode of failed request:  153 (GLX)
Aug 11 00:05:16 mars gnome-session[2200]:   Minor opcode of failed request:  3 (X_GLXCreateContext)
Aug 11 00:05:16 mars gnome-session[2200]:   Value in failed request:  0x0
Aug 11 00:05:16 mars gnome-session[2200]:   Serial number of failed request:  31
Aug 11 00:05:16 mars gnome-session[2200]:   Current serial number in output stream:  34
Aug 11 00:05:16 mars gnome-session[2200]: gnome-session-check-accelerated: GL Helper exited with code 256
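
To pull just the libGL complaints out of a long journal, something like the
following Python sketch can help. It assumes journalctl-style lines of the form
"unit[pid]: libGL error: message" as in the excerpt above; the function name
libgl_errors is hypothetical.

```python
import re

# Matches "<unit>[pid]: libGL error: <message>" in journalctl-style lines.
LIBGL_ERROR = re.compile(r"(?P<unit>\S+)\[\d+\]: libGL error: (?P<message>.*)$")


def libgl_errors(lines):
    """Return the distinct (unit, message) pairs, in first-seen order."""
    seen = []
    for line in lines:
        match = LIBGL_ERROR.search(line.rstrip("\n"))
        if match:
            pair = (match.group("unit"), match.group("message"))
            if pair not in seen:
                seen.append(pair)
    return seen


if __name__ == "__main__":
    import sys

    # Reads journal text from standard input, e.g. piped from journalctl -b.
    for unit, message in libgl_errors(sys.stdin):
        print(f"{unit}: {message}")
```

Any hit that mentions swrast means a process tried Mesa's software rasterizer,
which should not happen when the NVIDIA libraries are picked up correctly.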
Comment 15 Robert Munteanu 2017-08-10 21:17:20 UTC
Created attachment 736169
nvidia-bug-report gathered with the RPM installer after modifying 50-nvidia.conf
Comment 16 Robert Munteanu 2017-08-10 21:18:05 UTC
Created attachment 736170
journalctl -b output when using the RPMs with the modified 50-nvidia.conf file
Comment 17 Stefan Dirsch 2017-08-11 00:27:19 UTC
> libGL error: failed to load driver: swrast

Mesa kicks in here obviously. It should not.

It could be the libGL.so.* issue. Unfortunately I no longer remember why
this couldn't be fixed differently. It was a one-time workaround that had to be
applied manually on TW when we switched to libglvnd.

And yes, the issue does not occur with the NVIDIA installer, because it moves
such libraries out of the way. I guess it detects the system as not libglvnd-compatible.

So please try this. Remove /usr/lib64/libGL.so.* and /usr/lib/libGL.so.*. Then reinstall libglvnd and libglvnd-32bit.


Date: Thu, 10 Aug 2017 17:49:04 +0200
From: Stefan Dirsch <sndirsch@suse.de>
To: Konstantin Voinov <kv@kott.no-ip.biz>
Cc: opensuse-factory@opensuse.org
Subject: Re: [opensuse-factory] NVIDIA gfx driver RPMs available for Tumbleweed ...

On Fri, Aug 11, 2017 at 01:11:43AM +1000, Konstantin Voinov wrote:
> and one more little note
> zypper in --force libglvnd libglvn-32bit
> did not rewrite wrong *.so, I had to erase them manually before
> maybe it helps someone

Thanks for the hint. This issue sounds familiar to me. ;-)
Comment 18 Robert Munteanu 2017-08-11 06:08:33 UTC
(In reply to Stefan Dirsch from comment #17)
> > libGL error: failed to load driver: swrast
> 
> Mesa kicks in here obviously. It should not.
> 
> It could be this issue (libGL.so.*). Unfortunately I no longer remember, why
> this couldn't be fixed differently. It was a one-time workaround needed to be
> done on TW manually when we've switched to libglvnd.

That worked, thank you! Is this a bug to report somewhere else ( NVidia? ) or should I just update the 'NVidia the hard way' wiki page to warn about this problem? Feel free to resolve as fixed BTW.

(long description below)

There were indeed some libGL.so.* files not cleaned up after uninstalling the .run driver:

$ rpm -qf $(find /usr/lib64/ /usr/lib/ -name 'libGL.so.*')
file /usr/lib64/libGL.so.1.2 is not owned by any package
file /usr/lib64/libGL.so.1.2.0 is not owned by any package
libglvnd-0.1.2~20170620~d850cdd-1.2.x86_64
libglvnd-0.1.2~20170620~d850cdd-1.2.x86_64
file /usr/lib/libGL.so.1.2 is not owned by any package
file /usr/lib/libGL.so.1.2.0 is not owned by any package
libglvnd-32bit-0.1.2~20170620~d850cdd-1.2.x86_64
libglvnd-32bit-0.1.2~20170620~d850cdd-1.2.x86_64

So I've removed them and reinstalled libglvnd

$ sudo rm /usr/lib64/libGL.so.1.2 /usr/lib64/libGL.so.1.2.0 /usr/lib/libGL.so.1.2 /usr/lib/libGL.so.1.2.0
$ sudo zypper in -f libglvnd libglvnd-32bit

Now gdm/sddm work and I can also log in to Gnome.
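
For anyone who wants to script the check, here is a small Python sketch that
extracts the unowned paths from rpm -qf output. It assumes the English message
format shown above ("file ... is not owned by any package"); other locales will
print something different. The function name unowned_files is hypothetical.

```python
import sys

UNOWNED_SUFFIX = " is not owned by any package"


def unowned_files(rpm_qf_output):
    """Extract paths that `rpm -qf` reported as not owned by any package."""
    paths = []
    for line in rpm_qf_output.splitlines():
        line = line.strip()
        if line.startswith("file ") and line.endswith(UNOWNED_SUFFIX):
            # Strip the leading "file " and the trailing message.
            paths.append(line[len("file "):-len(UNOWNED_SUFFIX)])
    return paths


if __name__ == "__main__":
    # Expects the rpm -qf output on standard input and only prints the
    # candidates; it deliberately does not delete anything.
    for path in unowned_files(sys.stdin.read()):
        print(path)
```

The list should still be reviewed by hand before removing anything, since a
file can be unowned for legitimate reasons.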
Comment 19 Stefan Dirsch 2017-08-11 07:44:02 UTC
Thanks! Now I remember what happened. You once updated TW from Mesa to Mesa+libglvnd while the nvidia driver was installed manually. When you uninstalled it, NVIDIA's installer restored these libGL.so.1.2 files, which are no longer part of Mesa. But they are preferred over the ones from libglvnd (libGL.so.1.0), so unfortunately they need to be removed manually. If you could add this information to the 'NVidia the hard way' wiki page, that would be great!
Comment 20 Robert Munteanu 2017-08-11 08:36:43 UTC
(In reply to Stefan Dirsch from comment #19)
> Thanks! Now I remember what happened. You once have updated TW from Mesa to
> Mesa+libglvnd. At the same time the nvidia driver was installed manually.
> With uninstalling it, NVIDIA's installer restored these libGL.so.1.2 files,
> which now no longer are part of Mesa. But they are preferred over the ones
> from libglvnd (libGL.so.1.0). So they need to be removed manually.
> Unfortunately. If you could add this information to 'NVidia the hard way
> wiki page', this would be great!

The information was mostly there already; I added a reference to this bug and to the driver packages as RPMs.

https://en.opensuse.org/index.php?title=SDB%3ANVIDIA_the_hard_way&type=revision&diff=120938&oldid=119721
Comment 21 Stefan Dirsch 2017-08-11 09:03:10 UTC
(In reply to Robert Munteanu from comment #20)
> (In reply to Stefan Dirsch from comment #19)
> > Thanks! Now I remember what happened. You once have updated TW from Mesa to
> > Mesa+libglvnd. At the same time the nvidia driver was installed manually.
> > With uninstalling it, NVIDIA's installer restored these libGL.so.1.2 files,
> > which now no longer are part of Mesa. But they are preferred over the ones
> > from libglvnd (libGL.so.1.0). So they need to be removed manually.
> > Unfortunately. If you could add this information to 'NVidia the hard way
> > wiki page', this would be great!
> 
> The information was mostly there already, added a reference to this bug and
> the driver packages as an RPM.
> 
> https://en.opensuse.org/index.php?title=SDB%3ANVIDIA_the_hard_way&type=revision&diff=120938&oldid=119721

Indeed. Thanks!
Comment 22 Itxaka serrano 2017-08-12 22:17:02 UTC
I just hit this issue and can confirm that using the attached 50-nvidia.conf file fixes it. Thanks Stefan!
Comment 23 Stefan Dirsch 2017-08-17 07:46:29 UTC
Fixed RPMs have already landed in NVIDIA's repository. Closing.