Bugzilla – Bug 1003402
Plasmashell dumps core with nouveau graphics
Last modified: 2016-10-24 18:01:22 UTC
Using the open source (nouveau) drivers with an Nvidia GeForce 6150LE card. This works fine with GNOME or XFCE. But if I log in to Plasma 5, the desktop never fully comes up. If I kill the session (CTRL-ALT-BACKSPACE) and log in with a different desktop, I can see that several plasmashell processes have dumped core. I have rebooted with "nomodeset", and KDE comes up fine that way. After configuring the desktop that way, I went back to trying the default nouveau drivers again. And, once again, it dumps core.
The same on a GTX 660 with the open (nouveau) nvidia driver on Plasma 5.x: the desktop freezes. After installing the closed nvidia driver it runs OK.
Same for me with a desktop computer with "nVidia GM206 [GeForce GTX 960]": after installing any of Leap 42.2 beta 1/2/3 I get a black screen after booting into a Plasma/KDE session. The only thing that works is launching the 'task manager' window with Ctrl-Esc. This sometimes shows a STOPPED plasmashell process (i.e. it is intercepted by DrKonqi while crashing, though no DrKonqi window pops up).
Also the threads for the three beta releases of Leap 42.2 on forums.opensuse.org contain user reports for this issue. There seem to be problems with nouveau in general, and the following pages indicate that there is at least some progress:
https://nwrickert2.wordpress.com/2016/07/21/opensuse-leap-42-2-alpha3/
https://nwrickert2.wordpress.com/2016/09/02/opensuse-leap-42-2-beta1/
https://nwrickert2.wordpress.com/2016/09/25/opensuse-leap-42-2-beta2/
Still, there seems to be a general problem of plasmashell with nouveau, at least with certain NVIDIA graphics cards. Inspecting my /var/log/Xorg.0.log on a beta installation shows the following error:
(EE) Unknown chipset: NV126
which according to https://bugs.freedesktop.org/show_bug.cgi?id=94728#c2 may mean that kernel 4.6 and Mesa 11.2 are required. And indeed, when running a Tumbleweed live CD (which currently comes with kernel 4.8 and Mesa 12.0) I do *not* have crashes of plasmashell.
Does this mean that the upcoming Leap 42.2 release will not work out of the box with nouveau/plasmashell on newer NVIDIA graphics cards?
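A minimal sketch of the log check described above, assuming the X server writes the error in the form shown in the comment. The helper name `unknown_chipsets` is hypothetical; it only extracts the chipset names and says nothing about which kernel or Mesa version is needed.

```shell
# unknown_chipsets (hypothetical helper): reads an Xorg log on stdin and
# prints each chipset name that the driver reported as unknown, once.
unknown_chipsets() {
    # Matches lines such as "(EE) Unknown chipset: NV126",
    # with or without a leading timestamp.
    sed -n 's/.*(EE) Unknown chipset: *\([A-Za-z0-9]*\).*/\1/p' | sort -u
}

# Usage (log path as given in the comment above):
#   unknown_chipsets < /var/log/Xorg.0.log
```

An empty result only means that this particular message is absent; the driver may still fail in other ways.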
So is this a nouveau bug or plasma not handling errors correctly?
From my uneducated guess, this looks like a nouveau bug. I need to check whether disabling OpenGL with QtQuick is possible in Qt 5.6 and whether the crashes stop when doing that.
Try this:
export LIBGL_ALWAYS_SOFTWARE=1 && plasmashell
And see if the crash persists.
Isn't that a duplicate of the bug Max addressed a while ago by adding the lock/unlock patches to Mesa's nouveau driver, so multithreaded KDE apps may work again? Max?
(In reply to Luca Beltrame from comment #5)
> Try this:
> export LIBGL_ALWAYS_SOFTWARE=1 && plasmashell
I cannot see how this would be possible if my KDE session is just a black screen with no way to open a konsole because the UI shell has crashed. Even Alt-F2 nowadays is not possible w/o plasma. Is there an alternative to adding the export to .bashrc?
(In reply to Stefan Quandt from comment #7)
> (In reply to Luca Beltrame from comment #5)
> > Try this:
> > export LIBGL_ALWAYS_SOFTWARE=1 && plasmashell
> I cannot see how this would be possible if my KDE session is just a black
You can switch to a VT, log in and issue
DISPLAY=:0 konsole &
to get a terminal.
> export LIBGL_ALWAYS_SOFTWARE=1 && plasmashell
Wow! That makes all the difference.
To be clear, I added the environment setting to my shell startup file. I'm a "csh" user, so I added:
setenv LIBGL_ALWAYS_SOFTWARE 1
And now KDE/Plasma 5 starts up without any problems.
I'll note that I already had:
---
OpenGLIsUnsafe=true
---
in the [Compositing] section of ".config/kwinrc", but that did not seem to be enough to resolve the problem. I've configured compositing to use XRender, and that seems to be working.
Host mcp61 TW #1:
# zypper se -si | egrep 'nouveau|x11-serv'
i | libXvMC_nouveau | package | 12.0.3-143.1 | i586 | OSS
i | libdrm_nouveau2 | package | 2.4.71-1.1 | i586 | OSS
i | libvdpau_nouveau | package | 12.0.3-143.1 | i586 | OSS
i | xorg-x11-server | package | 7.6_1.18.4-1.2 | i586 | OSS
# cat /proc/cmdline
root=LABEL=st160os133 ipv6.disable=1 net.ifnames=0 splash=0 noresume vga=791 video=1024x768@60 video=1440x900@60 nouveau.noaccel=1 3
# lspci | grep VGA
00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2)
Plasmashell dumps core and locks up UI if I leave nouveau.noaccel=1 off cmdline, but Plasma from startx in multi-user.target seems perfectly fine with it and Xorg's modeset(0) driver.
Same host also has freshly updated x86_64 42.2. With nouveau.noaccel=1 on cmdline and starting from either startx in multi-user, or from KDM, and using Oxygen wherever possible instead of Breeze, whether using nouveau or modeset driver, plasmashell is not dumping core (to /var/lib/systemd/coredump), but kactivitymanage is.
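Since the behaviour above depends on whether nouveau.noaccel=1 is on the kernel command line, a small check can help when comparing test runs. This is only a sketch; `has_noaccel` is a hypothetical helper name, and it looks at the command line text alone, not at what the driver actually did.

```shell
# has_noaccel (hypothetical helper): succeeds if "nouveau.noaccel=1" appears
# as a separate word in the kernel command line passed as the first argument.
has_noaccel() {
    case " $1 " in
        *" nouveau.noaccel=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_noaccel "$(cat /proc/cmdline)" && echo "nouveau acceleration disabled at boot"
```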
Concerning Comment 6, could you please try the Mesa package from the following OBS project? https://build.opensuse.org/package/show/home:mstaudt:branches:openSUSE:Leap:42.2/Mesa (it's currently under review to go into Leap 42.2) You should only need to update the main Mesa package, not the subpackages. Please let me know whether this fixes your issue.
The bug that Stefan referenced in Comment 6 is boo#997171.
I've commented out the patches by default now. You'll have to change use_broken_nouveau_locking_patches to 1 at the beginning of Mesa.spec, rebuild Mesa and then also install the new package Mesa-dri-nouveau.
What's the effect on the other desktops when nouveau is not installed anymore?
It falls back to Mesa's software rasterizer (swrast_dri.so).
Will that have a performance impact? All beta testing so far was done with nouveau installed. So I'm a bit reluctant to rip it out completely because of KDE. Maybe a solution within KDE could be found to avoid the offending calls with nouveau.
(In reply to Ludwig Nussel from comment #16)
> reluctant to rip it out completely because of KDE. Maybe a solution within
> KDE could be found to avoid the offending calls with nouveau.
The problem is in Qt, which uses OpenGL for QtQuick / QML, which is used for all of Plasma. For Qt 5.7, a new 2D renderer was added which may bypass this problem; however, it is not available for the 5.6.x that Leap ships.
KDE has expressed the intent to avoid blacklists / whitelists because they break surprisingly often.
Possible alternatives:
- inject LIBGL_ALWAYS_SOFTWARE=1 in /usr/bin/startkde if we're running on nouveau (requires patching upstream startkde)
- inject XRender rendering for KWin if we detect we're running nouveau (needs either patching startkde, or something that starts before plasmashell is starting up).
Both solutions require some patching and I have no idea how to tell if we're under nouveau or not. I'd prefer the second solution because the first will switch everything to software rendering, which is possibly not desired.
(In reply to Luca Beltrame from comment #17)
> Both solutions require some patching and I have no idea how to tell if we're
> under nouveau or not.
I guess you can use the "OpenGL renderer string" information, which is also printed by glxinfo. I need to figure out what nouveau and nvidia (proprietary) give us here ...
Ah. There is even an "OpenGL vendor string", which is "nouveau" when running the nouveau DRI driver.
(In reply to Luca Beltrame from comment #17)
> Possible alternatives:
>
> - inject LIBGL_ALWAYS_SOFTWARE=1 in /usr/bin/startkde if we're running on
> nouveau (requires patching upstream startkde)
> - inject XRender rendering for KWin if we detect we're running nouveau (needs
> either patching startkde, or something that starts before plasmashell is
> starting up).
Or #3: patch kwin to use XRender by default on nouveau. That's the best solution IMHO, because kwin already has the information and some code that decides whether to use OpenGL or XRender depending on the graphics driver.
With the nvidia proprietary driver active, the "OpenGL vendor string" is "NVIDIA Corporation".
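The vendor-string check discussed in the last few comments could be sketched as follows. The helper names `gl_vendor` and `is_nouveau` are hypothetical; the sketch assumes glxinfo is installed and prints a line beginning with "OpenGL vendor string:", and it compares against the value "nouveau" reported above.

```shell
# gl_vendor (hypothetical helper): reads glxinfo output on stdin and prints
# the value of the first "OpenGL vendor string" line.
gl_vendor() {
    sed -n 's/^OpenGL vendor string: *//p' | head -n 1
}

# is_nouveau (hypothetical helper): succeeds if the vendor string found on
# stdin is exactly "nouveau".
is_nouveau() {
    [ "$(gl_vendor)" = "nouveau" ]
}

# Usage (needs a running X session):
#   glxinfo | is_nouveau && echo "running on the nouveau DRI driver"
```

Note that glxinfo itself creates an OpenGL context, so on an affected system the check may run into the same driver problem it is trying to detect.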
FTR, the mentioned code is in libkwineffects/kwinglplatform.cpp, in the function GLPlatform::detect() (line 826 ff.):
https://quickgit.kde.org/?p=kwin.git&a=blob&h=49758dd732f96b9e9e81911bd7d433f4f3f64c25&hb=b9efeb6c289c8a2b8f7244428d2a8a08c76d4b3b&f=libkwineffects%2Fkwinglplatform.cpp
Setting m_recommendedCompositor = XRenderCompositing in the case of nouveau should force XRender. I suppose I can come up with a patch, but I won't be able to test it as I don't have an nvidia card.
Alternatively (regarding solution 2), setting the environment variable KWIN_COMPOSE="X" in startkde when using nouveau should work as well and is probably better than modifying kwinrc. See https://community.kde.org/KWin/Environment_Variables#KWIN_COMPOSE
I'm not sure if that will override kwinrc though.
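The environment-variable variant proposed above might look roughly like this in a session start script. It is a sketch only: `compose_env_for_vendor` is a hypothetical helper, the vendor string is assumed to have been obtained elsewhere (for example from glxinfo), and whether forcing XRender actually avoids the crash is not established at this point in the report.

```shell
# compose_env_for_vendor (hypothetical helper): prints the environment
# assignment to apply for a given OpenGL vendor string, or nothing at all.
compose_env_for_vendor() {
    case "$1" in
        nouveau) echo "KWIN_COMPOSE=X" ;;  # "X" = XRender, as proposed in the comment above
        *)       : ;;                      # leave KWin's own detection untouched
    esac
}

# Usage in a start script ($vendor is assumed to hold the vendor string):
#   setting=$(compose_env_for_vendor "$vendor")
#   [ -n "$setting" ] && export "$setting"
```

This only affects KWin; other OpenGL clients such as plasmashell keep using whatever driver Mesa selects.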
Can we have a backtrace for this crash? It occurred to me that we don't have one...
I'm trying to get a test system to test some patches I've done to work around this issue in qtwebengine and kwin5 (not needing any envvar), but if somebody else already has a Leap 42.2 system with nouveau drivers and wants to help with testing, please contact me on IRC (nick antlarr).
Without a backtrace we won't be able to say what's wrong. But the title of this bug report says "Plasmashell dumps core" which means no change to KWin will fix that problem. Sorry, if you look at KWin you look at the wrong place.
(In reply to Martin Gräßlin from comment #25)
> Without a backtrace we won't be able to say what's wrong. But the title of
> this bug report says "Plasmashell dumps core" which means no change to KWin
> will fix that problem. Sorry, if you look at KWin you look at the wrong
> place.
But comment#9 mentions that switching to XRender compositing seems to be working. Of course that would only be a workaround, but better than Plasma crashing.
> But comment#9 mentions that switching to XRender compositing seems to be working.
No, comment #9 doesn't say that. Comment #9 says that exporting LIBGL_ALWAYS_SOFTWARE=1 does the trick. Which makes sense, because that switches the OpenGL driver from nouveau to llvmpipe and thus affects all OpenGL applications, unlike the kwin compose setting, which only adjusts kwin.
(In reply to Martin Gräßlin from comment #27)
> > But comment#9 mentions that switching to XRender compositing seems to be working.
>
> No comment #9 doesn't say that.
It does, in the last sentence:
(In reply to Neil Rickert from comment #9)
> I've configured compositing to use XRender, and that seems to be working.
I seem to have confused people with comment #9. Sorry about that. All I meant was that desktop effects are still working.
In any case, here is the situation with 42.2 RC1.
It is working well, as long as I use LIBGL_ALWAYS_SOFTWARE=1 in the environment (set from a shell startup file).
While it was running well, I switched to XRender. Maybe that doesn't actually do anything with that environment setting. I also edited "kwinrc" to set OpenGLIsUnsafe=true
After that, I logged out. Then I logged in without the environment setting. And plasmashell crashed.
So it is "LIBGL_ALWAYS_SOFTWARE=1" that works for me. Nothing else is helping.
I see a request for a backtrace. In the past, Kcrash has provided those. But in this case it is not showing up. Maybe I can get it with "gdb" and the coredump, but some hints would help.
Looking at "/var/lib/systemd/coredump", I see:
# ls -tr
core.plymouthd.0.7f466745a5054929af5b6c409d4924f3.320.1476781929000000.xz
core.plymouthd.0.52725b8459f64a4083eb94d2f4a34514.320.1476803614000000.xz
core.kactivitymanage.1001.52725b8459f64a4083eb94d2f4a34514.2852.1476804511000000.xz
core.kactivitymanage.1001.52725b8459f64a4083eb94d2f4a34514.3702.1476804800000000.xz
core.kactivitymanage.1001.52725b8459f64a4083eb94d2f4a34514.4527.1476804890000000.xz
core.plasmashell.1001.52725b8459f64a4083eb94d2f4a34514.4733.1476804838000000.xz
core.kactivitymanage.1001.52725b8459f64a4083eb94d2f4a34514.5185.1476805293000000.xz
core.kactivitymanage.1001.52725b8459f64a4083eb94d2f4a34514.5948.1476807629000000.xz
core.gvfsd-network.1001.52725b8459f64a4083eb94d2f4a34514.9269.1476808299000000.xz
core.gvfsd-network.1001.52725b8459f64a4083eb94d2f4a34514.9276.1476808299000000.xz
core.plymouthd.0.d863f0d432dd4e13b51036444908b879.344.1476850955000000.xz
core.org_kde_powerde.1001.d863f0d432dd4e13b51036444908b879.2787.1476850985000000.xz
core.plymouthd.0.6ab3fc1618044cb6926bfa3b113f391e.329.1476881201000000.xz
It seems that plasmashell is not the only thing crashing.
Thanks for clarifying. So to summarize: changing KWin settings won't change anything; what matters is LIBGL_ALWAYS_SOFTWARE=1.
@openSUSE devs: if you want to set this, be careful. We have situations here where OpenGL works for some nouveau users but not for others. Setting that environment variable for all nouveau users will significantly degrade the experience for those it currently works for: it replaces a working OpenGL driver (GPU acceleration) with CPU emulation. Older hardware in particular is well supported by nouveau, and there it would result in CPU emulation on a CPU that is not capable of doing that job. So be extremely careful. Maybe it is better to work with a .driconfig and only change the affected hardware combos.
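To see which of the two cases a given session is in (GPU driver or CPU emulation), the renderer string printed by glxinfo can be inspected. The following is a sketch under the assumption that Mesa's software rasterizers identify themselves with "llvmpipe" or "softpipe" in that string; `uses_software_gl` is a hypothetical helper name.

```shell
# uses_software_gl (hypothetical helper): reads glxinfo output on stdin and
# succeeds if the "OpenGL renderer string" line names a Mesa software
# rasterizer (llvmpipe or softpipe).
uses_software_gl() {
    grep '^OpenGL renderer string:' | grep -q -e llvmpipe -e softpipe
}

# Usage (needs a running X session):
#   LIBGL_ALWAYS_SOFTWARE=1 glxinfo | uses_software_gl && echo "CPU rendering"
#   glxinfo | uses_software_gl || echo "hardware driver in use"
```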
(In reply to Neil Rickert from comment #29)
> So it is "LIBGL_ALWAYS_SOFTWARE=1" that works for me. Nothing else is
> helping.
Ok, thanks for the clarification. In that case we can forget about forcing XRender obviously... And I agree that setting LIBGL_ALWAYS_SOFTWARE=1 on nouveau in general is probably not the best idea either.
> I see a request for a backtrace. In the past, Kcrash has provided those.
> But in this case it is not showing up. Maybe I can get it with "gdb" and
> the coredump, but some hints would help.
You should be able to get a backtrace by running "gdb plasmashell" (even if Plasma crashed, Alt+F2 should still work to open krunner so you should be able to run Konsole or other applications), type "run", wait until it crashes, and then enter "bt". Or you should be able to get a backtrace from the existing coredumps with systemd-coredumpctl. I'm not sure about the correct syntax though.
> It seems that plasmashell is not the only thing crashing.
Well, the kactivitymanagerd crash (which happens on exit/shutdown) is well known, and actually a bug in Qt. We have a (upstream) patch for Qt that fixes it, but that's not in 42.2 RC1 yet. It's harmless though.
For the rest, well, hard to say without further information. But out of scope for this bug report anyway I'd say.
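For the existing dumps, one possible route is to take the PID from the core file name and hand it to coredumpctl. This is a sketch, assuming the file names follow the pattern core.COMM.UID.BOOTID.PID.TIMESTAMP.xz seen in the listing earlier in this report and that the tool is installed as `coredumpctl`; `core_pid` is a hypothetical helper name.

```shell
# core_pid (hypothetical helper): prints the PID encoded in a systemd core
# file name of the form core.COMM.UID.BOOTID.PID.TIMESTAMP.xz
core_pid() {
    name=${1##*/}        # drop any leading directory
    name=${name%.xz}     # drop the compression suffix
    name=${name%.*}      # drop the timestamp
    echo "${name##*.}"   # the PID is now the last dot-separated field
}

# Usage (as root or the crashing user; type "bt" at the gdb prompt):
#   coredumpctl gdb "$(core_pid /var/lib/systemd/coredump/core.plasmashell.<...>.xz)"
# Alternative without the journal, decompressing by hand:
#   xz -dc core.plasmashell.<...>.xz > /tmp/core && gdb /usr/bin/plasmashell /tmp/core
```

Installing the matching debuginfo packages first should make the resulting backtrace considerably more useful.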
I managed to test this issue on a NVIDIA G98 [Quadro NVS 295] with Nouveau drivers on Leap 42.2 (Plasma 5.8.1).
Martin, I'm afraid kwin seems to have a problem here. Probably not the only one, but a default Leap 42.2 installation simply didn't start. I logged in, got the splash screen animated circle and everything seemed to freeze afterwards. Once I got just the openSUSE bubble background (and a mouse I could move), and another time it froze while the animated circle was still there (so the animation froze too, just to be clear). I tried to start konsole and xterm from an ssh session setting the DISPLAY & co. env vars but they didn't do anything.
I then tried to start using an xfce session (which worked fine), then started konsole (which worked fine) and then tried to run kwin_x11 --replace . It froze the whole desktop (no window was getting key/click events, although the mouse was moving as before). I logged in via ssh, killed kwin_x11 and the desktop was responsive again.
Then I installed my patched kwin version, ran kwin_x11 --replace again and it worked fine. My patched kwin version just includes this patch:
https://build.opensuse.org/package/view_file/home:alarrosa:branches:KDE:Frameworks5:LTS/kwin5/use-xrender-with-nouveau-boo-1003402.diff?expand=1
I understand that this slows down performance on nouveau for cards that work fine, but how can we recognize those? Note that compared to disabling Nouveau on Leap 42.2 (which seems to be the current alternative, see https://bugzilla.suse.com/show_bug.cgi?id=1005323), I think this solution, while not perfect, is much better.
Btw, after installing this patched kwin version, I could log into a plasma5 session without problems. So far nothing crashed or froze (not even kmail or kontact, which I was expecting to crash according to https://bugzilla.suse.com/show_bug.cgi?id=1005323#c8)
Created attachment 698178 [details]
transcript of gdb session
This is my attempt to get trace info from the existing coredump (a "script" transcript file from a terminal session). It might not be useful. It seems that the dump is truncated.
I definitely vote for switching to XRender for nouveau driver users by default. Is it still possible to change this setting via the user interface? Is this code change just changing the default?
@Antonia: debug output of KWin please and a backtrace of the hung process. Let's fix the problem and not work around it.
> I definitely vote for switching to Xrender for nouveau driver users by default. Is it still possible to change this setting via user interface still? Is this code change just changing the default?
No the settings won't override it. If we get to that point, we have better ways to fix it. It means we get the OpenGL context up and running. It is much better to fix the issue then.
Overall as you can see: I want to see this properly investigated and understood and then look at the possible solutions. Let's not shoot wildly around in the hope that we hit it, when we haven't understood the problem yet (which we don't as we don't have a backtrace).
(In reply to Martin Gräßlin from comment #36)
> @Antonia:
Antonio, please :)
> debug output of KWin please and a backtrace of the hung process.
> Let's fix the problem and not work around it.
>
Sure, but while it's fixed or not, I wanted to have a workaround.
> > I definitely vote for switching to Xrender for nouveau driver users by default. Is it still possible to change this setting via user interface still? Is this code change just changing the default?
>
> No the settings won't override it. If we get to that point, we have better
> ways to fix it. It means we get the OpenGL context up and running. It is
> much better to fix the issue then.
>
> Overall as you can see: I want to see this properly investigated and
> understood and then look at the possible solutions.
Cool, let's do that then.
Created attachment 698257 [details] kwin_x11 backtrace while frozen
Created attachment 698258 [details] kwin_x11 output
#3 0x00007f72eb2c8d1f in wait_for_reply (c=c@entry=0x1494440, request=request@entry=1319, e=e@entry=0x7ffd661067e8) at xcb_in.c:516
#4 0x00007f72eb2c8e92 in xcb_wait_for_reply64 (c=0x1494440, request=1319, e=0x7ffd661067e8) at xcb_in.c:560
#5 0x00007f72e2dc5030 in _XReply () from /usr/lib64/libX11.so.6
#6 0x00007f72e1b071f3 in DRI2GetBuffersWithFormat (dpy=0x1493160, drawable=29360153, width=width@entry=0x194a8b8, height=height@entry=0x194a8bc, attachments=0x7ffd661069b0, count=1, outCount=outCount@entry=0x7ffd66106990) at dri2.c:491
#7 0x00007f72e1b074fb in dri2GetBuffersWithFormat (driDrawable=<optimized out>, width=0x194a8b8, height=0x194a8bc, attachments=<optimized out>, count=<optimized out>, out_count=0x7ffd66106990, loaderPrivate=0x194a940) at dri2_glx.c:900
#8 0x00007f72c63d34ad in dri2_drawable_get_buffers (count=<synthetic pointer>, atts=0x1917120, drawable=0x1956fc0) at dri2.c:213
#9 dri2_allocate_textures (ctx=0x19401a0, drawable=0x1956fc0, statts=0x1917120, statts_count=<optimized out>) at dri2.c:407
#10 0x00007f72c63cfc98 in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, statts=0x1917120, count=1, out=0x7ffd66106ae0) at dri_drawable.c:83
Mesa is waiting for a buffer and blocks KWin. That is ouch. Not much we can do. There is a big chance that not only kwin is affected by that and switching to xrender won't fix that problem.
Suggestion: try whether LIBGL_ALWAYS_SOFTWARE=1 works in that case. If yes use a driconfig to narrow it down.
Created attachment 698264 [details]
kwin_x11 output with LIBGL_ALWAYS_SOFTWARE
When using LIBGL_ALWAYS_SOFTWARE=1, kwin works fine. I don't know much about driconfig, so let's see if I find some documentation or someone who knows more about it. Thanks Martin.
This is really a dupe of boo#997171. Discussing the solution in boo#1005323.
*** This bug has been marked as a duplicate of bug 997171 ***
This is an autogenerated message for OBS integration:
This bug (1003402) was mentioned in
https://build.opensuse.org/request/show/437189 Factory / kwin5
https://build.opensuse.org/request/show/437190 42.2 / kwin5