Bugzilla – Full Text Bug Listing
| Summary: | Intel Kabylake: Fallback from Wayland to Xorg fails | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Vladimir FROMENT <tutux84> |
| Component: | X.Org | Assignee: | E-mail List <xorg-maintainer-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <xorg-maintainer-bugs> |
| Severity: | Normal | ||
| Priority: | P3 - Medium | CC: | msrb, mstaudt, sndirsch, tiwai, tutux84 |
| Version: | Leap 15.0 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
Journalctl logs
180426-KOTD boot failure journalctl logs |
||
Thank you for your bug report! Comments below:

(In reply to Vladimir FROMENT from comment #0)
> In /etc/gdm/custom.conf, if WaylandEnable is set to false, after rebooting
> and typing login&password into the Gnome login screen, the system hang. I
> suspect a kernel panic but I don't know what log I can provide to prove it
> since it seems impossible to switch to a console TTY : ctrl+alt+F1 to F12
> doesn't respond once the system is frozen.

Anything to be found in the system journal? Maybe the system had time to write an error message to disk - please use journalctl to have a look.

On the other hand, maybe the X server hung, but the system itself is still alive, and VT switching is dead because X and GDM (or maybe earlier: Plymouth and GDM) are fighting for VT_SETMODE and produce a deadlock in the VT subsystem. Been there before.

> For your info, I need to fallback to Xorg because Wayland doesn't detect my
> external screen (I haven't had time to look for troubleshoot tips yet but I
> may open a bug report in a few days).

Sigh. That's something for the desktop team, I guess.

> CPU: Quad core Intel Core i7-7700HQ (-HT-MCP-) cache: 6144 KB

Kaby Lake - that's pretty darn new. I suspect all display outputs are connected to the Intel GPU. Once the Nvidia card is blocked, things should "just work".

> Graphics: Card-1: Intel Device 591b
> Card-2: NVIDIA GP106M [GeForce GTX 1060 Mobile]

Whoops. By default, the nouveau kernel driver is in use, which is known to do funky things with some cards.

Can you please blacklist the nouveau kernel module, rebuild the initramfs (by calling mkinitrd) and then reboot? Maybe that'll fix it... You can use lsinitrd to check that nouveau.ko is not contained in the resulting initramfs.

Thanks!

Ok. Please attach /var/log/gdm/greeter.log and /home/<user>/.local/share/xorg/Xorg.1.log first. You may end up disabling one of your two GPUs in order to get rid of these issues though.

Created attachment 767737 [details]
Journalctl logs
A bit of overview about the timestamps in these journalctl logs:
vlad@linux-5udt:~> grep -i "\-\- Reboot" -B1 Documents/journalctl-Xorg-fallback.txt
avril 19 17:40:40 linux-5udt systemd-journald[423]: Journal stopped
-- Reboot -- ## This reboot happens after disabling Wayland in /etc/gdm/custom.conf
--
avril 19 17:41:48 linux-5udt systemd[1]: Startup finished in 5.623s (kernel) + 5.200s (initrd) + 47.118s (userspace) = 1min 5.494s.
-- Reboot -- ## At about 17:41 I started typing login/pass in GDM, I waited until 17:44 before hard rebooting due to frozen state. So the interesting traces are right before this point
--
avril 19 17:46:39 linux-5udt systemd-journald[461]: Journal stopped
-- Reboot -- ## Right after re-enabling Wayland
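The grep above shows only the single line before each reboot marker. To read more of what was logged just before a freeze, a minimal sketch along these lines may help. It assumes the journal was saved to a plain text file, as in the attachment; the function name `lines_before_reboot` and the default of 20 lines are illustrative choices, not something used in this report.

```shell
#!/bin/sh
# Print the last COUNT lines logged before each "-- Reboot --" marker in a
# journal export saved as plain text. The function name and the default
# COUNT are hypothetical, chosen for illustration only.
lines_before_reboot() {
    file=$1
    count=${2:-20}
    # -B prints COUNT lines of leading context; -e keeps the pattern, which
    # starts with dashes, from being parsed as an option.
    grep -B "$count" -e '^-- Reboot --' -- "$file"
}
```

For example, `lines_before_reboot Documents/journalctl-Xorg-fallback.txt 50` would print the 50 lines preceding each reboot marker in the exported file.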
(In reply to Stefan Dirsch from comment #2)
> Ok. Please attach /var/log/gdm/greeter.log and
> /home/<user>/.local/share/xorg/Xorg.1.log first. You may end up disabling
> one of your two GPUs in order to get rid of these issues though.

/var/log/gdm is empty on my system, and there is no ~/.local/share/xorg folder in my case.

I will try to disable the nouveau module right now.

It appears that nouveau was already blacklisted; I forgot I had installed bumblebee some time ago... So I did various tests to make sure that the Xorg fallback failure wasn't related to bumblebee. It isn't: it keeps failing.

I also tried with nouveau enabled again, but it didn't change anything. FYI, I did run the initrd command after enabling it.

Now the system is in the situation where nouveau is blacklisted again and bumblebee is removed (I reinstalled bbswitch though).

Hmm. Nothing obvious in the X logfile I found in the journalctl. Indeed nouveau is disabled. It's an Intel Kabylake GPU. No idea why it freezes.

Another option would be to disable Intel graphics (in Firmware) - if possible - and then run NVIDIA's proprietary driver. But I'm not sure whether the hardware supports this (for all needed outputs).

(In reply to Stefan Dirsch from comment #7)
> Another option would be to disable Intel graphics (in Firmware) - if
> possible and then run NVIDIA's proprietary driver. But I'm not sure, whether
> the hardware supports this (for all needed outputs).

Do you mean disabling the Intel GPU in the BIOS? It is not possible with this laptop.

Eventually, by following [1] and [2], I could fix the fallback issue by setting the kernel parameter "i915.enable_guc=1". This option apparently enables advanced drivers for recent Intel chipsets. Following the advice in [2], I also added enable_rc6=1, enable_fbc=1, enable_psr=1, disable_power_well=0 and semaphores=1. That seems not to have introduced any regression in my use cases, under either Wayland or Xorg.
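When testing with and without the workaround, it can help to confirm which value was actually passed at boot. The following is a minimal sketch, assuming the parameter is given on the kernel command line as "i915.enable_guc=N"; the function name `guc_from_cmdline` is hypothetical, and taking the last occurrence when the parameter appears more than once is an assumption about precedence.

```shell
#!/bin/sh
# Report the i915.enable_guc value found on a kernel command line.
# Reads /proc/cmdline by default; another file can be passed for testing.
# The function name is hypothetical, not an existing tool.
guc_from_cmdline() {
    cmdline_file=${1:-/proc/cmdline}
    # Split the command line into one parameter per line and keep the last
    # i915.enable_guc=... occurrence (assumed to be the one that wins).
    value=$(tr ' ' '\n' < "$cmdline_file" |
        sed -n 's/^i915\.enable_guc=//p' | tail -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'unset\n'
    fi
}
```

This only reports what was requested at boot; the value the driver ended up with is the one in /sys/module/i915/parameters/enable_guc.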
So the bug report can be considered fixed from my point of view (although my external screen is still not detected, which is odd because Ubuntu 17.10 does it, but that's another story). Unless you need more info/logs from me?

[1] https://wiki.archlinux.org/index.php/intel_graphics
[2] https://gist.github.com/Brainiarc7/aa43570f512906e882ad6cdd835efe57

(In reply to Vladimir FROMENT from comment #8)
> (In reply to Stefan Dirsch from comment #7)
> > Another option would be to disable Intel graphics (in Firmware) - if
> > possible and then run NVIDIA's proprietary driver. But I'm not sure, whether
> > the hardware supports this (for all needed outputs).
>
> Do you mean disabling the Intel GPU in the BIOS ? It is not possible with
> this laptop.

That's why I wrote *if possible*. ;-) Obviously, this is not an option on your system then.

> Eventually, by following [1] and [2], I could fix the fallback issue by
> setting the kernel parameter "i915.enable_guc=1". This option apparently
> enable advanced drivers for recent Intel chipsets. Following [2] advices, I
> also added enable_rc6=1, enable_fbc=1, enable_psr=1, disable_power_well=0
> and semaphores=1. That seems not to have introduced any regression in my use
> cases. Either under Wayland and Xorg.
>
> So the bug report can be considered fixed from my point of view (although my
> external screen is still not detected, which is odd because Ubuntu 17.10
> does it, but that's another story). Unless you need more info/logs from me ?
>
> [1] https://wiki.archlinux.org/index.php/intel_graphics
> [2] https://gist.github.com/Brainiarc7/aa43570f512906e882ad6cdd835efe57

Well, I would call this a workaround, not a fix. Seems option "i915.enable_guc=1" is enough to fix the issue for you, right?

Yes, this sole option was enough. The others I mentioned were added after validating that my fallback issue was solved.

Thanks! Takashi, FYI. Seems like some Kaby Lake chips have funky firmware loading. Do you know about this?
Yes, the i915 driver loads a few different kinds of firmware files (DMC, GuC and HuC in addition to CSR, VBT and GVT-d stuff). Here the firmware in question is the second one, GuC, and I thought this should have been loaded / enabled automatically for CFL. Currently the GuC firmware for CFL is identical to the one for KBL.

What does /sys/module/i915/parameters/enable_guc show if you don't pass the value -1? After loading the driver, it'll be set to either 0, 1 or 2.

> Graphics: Card-1: Intel Device 591b

Takashi, are you sure this is CFL (Coffee Lake)?
#define INTEL_KBL_GT2_IDS(info) \
[...]
INTEL_VGA_DEVICE(0x591B, info), /* Halo GT2 */ \
Coffeelake has different IDs (0x3E??) according to current linux/drm/i915_pciids.h.
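The ID check above can be written down as a small sketch. It encodes only the two facts quoted in this comment (0x591B is listed under INTEL_KBL_GT2_IDS, and Coffee Lake IDs have the form 0x3E??); it is not a complete table from i915_pciids.h, and the function name `classify_intel_gpu` is hypothetical.

```shell
#!/bin/sh
# Classify an Intel GPU PCI device ID using only the two facts quoted in
# this thread. Anything else is reported as unknown; this is deliberately
# not a full copy of the kernel's ID table.
classify_intel_gpu() {
    # Normalise to lower case without the 0x prefix, e.g. 0x591B -> 591b.
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    id=${id#0x}
    case $id in
        591b) echo "KBL" ;;      # Kaby Lake, Halo GT2
        3e??) echo "CFL" ;;      # Coffee Lake range as stated above
        *)    echo "unknown" ;;
    esac
}
```

The device ID itself can be read from the bracketed vendor:device pair that `lspci -nn` prints for the VGA controller.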
(In reply to Takashi Iwai from comment #13)
> What shows /sys/module/i915/parameters/enable_guc if you don't pass the
> value -1? After loading the driver, it'll be set to either 0, 1 or 2.

According to

https://wiki.archlinux.org/index.php/intel_graphics#Enable_GuC_.2F_HuC_firmware_loading

this came with Kernel 4.16.

(In reply to Stefan Dirsch from comment #15)
> (In reply to Takashi Iwai from comment #13)
> > What shows /sys/module/i915/parameters/enable_guc if you don't pass the
> > value -1? After loading the driver, it'll be set to either 0, 1 or 2.
>
> According to
>
> https://wiki.archlinux.org/index.php/intel_graphics#Enable_GuC_.2F_HuC_firmware_loading
>
> this came with Kernel 4.16.

But maybe it's already in sle15/Leap 15 with our backports.

(In reply to Stefan Dirsch from comment #14)
> > Graphics: Card-1: Intel Device 591b
>
> Takashi, sure this is CFL (Coffelake)?

Sorry, I was confused. The chip in question is Kaby Lake (KBL).

But the question still stands. Both KBL and CFL use the same firmware, and guc loading should have been enabled without the extra option.

(In reply to Stefan Dirsch from comment #16)
> (In reply to Stefan Dirsch from comment #15)
> > (In reply to Takashi Iwai from comment #13)
> > > What shows /sys/module/i915/parameters/enable_guc if you don't pass the
> > > value -1? After loading the driver, it'll be set to either 0, 1 or 2.
> >
> > According to
> >
> > https://wiki.archlinux.org/index.php/intel_graphics#Enable_GuC_.2F_HuC_firmware_loading
> >
> > this came with Kernel 4.16.
>
> But maybe it's already in sle15/Leap 15 with our backports.

Yes. SLE15 / openSUSE Leap 15.0 kernel already got tons of backports and i915 driver is almost equivalent with 4.16.

(In reply to Takashi Iwai from comment #13)
> What shows /sys/module/i915/parameters/enable_guc if you don't pass the
> value -1? After loading the driver, it'll be set to either 0, 1 or 2.
For this please test without option

"i915.enable_guc=1"

(and all the other options). If needed, i.e. you're using an /etc/modprobe.d file snippet, recreate initrd afterwards via

mkinitrd

(In reply to Stefan Dirsch from comment #19)
> (In reply to Takashi Iwai from comment #13)
> > What shows /sys/module/i915/parameters/enable_guc if you don't pass the
> > value -1? After loading the driver, it'll be set to either 0, 1 or 2.
>
> For this please test without option
>
> "i915.enable_guc=1"
>
> (and all the other options). If needed, i.e. you're using an /etc/modprobe.d
> file snippet, recreate initrd afterwards via
>
> mkinitrd

So after disabling all above-mentioned options in YaST > Bootloader and rebooting, the value of /sys/module/i915/parameters/enable_guc is 0.

Thanks. I checked the recent code, and indeed the default value is zero. It was changed from -1 to 0 some time ago due to the latency issues and S4 resume problem, according to the git log.

If the enable_guc=1 option alone really helps, I believe it's worth reporting to the upstream devs. It'd be great if you can double-check it.

(In reply to Takashi Iwai from comment #21)
> Thanks. I checked the recent code, and indeed the default value is zero.
> It was changed from -1 to 0 some time ago due to the latency issues and S4
> resume problem, according to the git log.

Ok. Interesting.

> If enable_guc=1 option alone really helps, I believe it's worth to report to
> upstream devs.

Which is supposed to be done by us or the reporter?

> It'd be great if you can double-check it.

That's what the reporter did before (comment #10). So should he really *double* check literally?

(In reply to Stefan Dirsch from comment #22)
> (In reply to Takashi Iwai from comment #21)
> > Thanks. I checked the recent code, and indeed the default value is zero.
> > It was changed from -1 to 0 some time ago due to the latency issues and S4
> > resume problem, according to the git log.
>
> Ok. Interesting.
>
> > If enable_guc=1 option alone really helps, I believe it's worth to report to
> > upstream devs.
>
> Which is supposed to be done by us or the reporter?

Ideally someone who owns the hardware and can test, so the reporter would be the best option. Most likely the upstream devs will ask for testing of the latest development version or some patch, so we should be in Cc, of course.

> > It'd be great if you can double-check it.
>
> That's what the reporter did before (comment #10). So should he really
> *double* check literally?

Yes, we need to test with the latest upstream version before reporting to upstream, at least. The 4.17-rc kernel is found in the OBS Kernel:HEAD repo, and 4.16.x is in the OBS Kernel:stable repo.

I *guess* the problem remains, but if these versions work, there is another hope for a quicker fix.

Thanks. Vladimir, could you please test our KOTD? (currently 4.17-rc)

https://en.opensuse.org/openSUSE:Kernel_of_the_day

(In reply to Stefan Dirsch from comment #24)
> Thanks. Vladimir, could you please test our KOTD? (currently 4.17-rc)
>
> https://en.opensuse.org/openSUSE:Kernel_of_the_day

Installed KOTD with this command:

rpm -i --force http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/kernel-default-4.17.rc2-2.1.g0fad7ab.x86_64.rpm

But the system fails to boot correctly. I get an error message at boot time saying "[FAILED] Failed to start Load Kernel Modules". The system doesn't get to load gdm and ends up in maintenance mode. The journalctl logs will be attached right away.

It seems related to the encrypted /home. I can reinstall the latest Leap Beta with an unencrypted /home, but that would not represent my normal setup.

On the other hand, prior to installing KOTD, I upgraded my Leap Beta via "zypper dup" and the workaround doesn't work anymore, even with i915.enable_guc=1. Wayland loads but the fallback to Xorg doesn't (same symptoms as before). It was beta 206.1 before the upgrade.

Let me know what you would need from me to move forward. I should have some time this weekend to test multiple setups if needed.

Created attachment 768443 [details]
180426-KOTD boot failure journalctl logs
OMG. :-(

Ummm...

avril 26 19:06:32 linux-5udt systemd-cryptsetup[605]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/62d95b82-3e11-4713-85c2-8f7a9bd8b1d4.
avril 26 19:06:34 linux-5udt kernel: device-mapper: table: 254:0: crypt: unknown target type
avril 26 19:06:34 linux-5udt kernel: device-mapper: ioctl: error adding target to table
avril 26 19:06:34 linux-5udt systemd-cryptsetup[605]: Failed to activate: Input/output error
avril 26 19:06:34 linux-5udt systemd[1]: systemd-cryptsetup@cr_sda5.service: Main process exited, code=exited, status=1/FAILURE
avril 26 19:06:34 linux-5udt systemd[1]: Failed to start Cryptography Setup for cr_sda5.
avril 26 19:06:34 linux-5udt systemd[1]: Dependency failed for Encrypted Volumes.

Sounds like your system fails to unlock your encrypted home.

Also, there are no messages regarding the i915 driver in your log, so it seems that the KMS graphics driver isn't even loaded.

Looks like a kernel or base system bug to me. Totally unrelated to your graphics problems.

Hope you installed the KOTD in addition to the existing one and can still boot the old one?

(In reply to Max Staudt from comment #28)
> Looks like a kernel or base system bug to me. Totally unrelated to your
> graphics problems.

Agreed.

(In reply to Stefan Dirsch from comment #29)
> Hope you installed the KOTD in addition to the existing one and can still
> boot the old one?

Yes, I had no problem booting the old kernel. All is working fine. But that kind of messed up the troubleshooting path, hehe.

After a lot of tests, I installed beta 234.2 from scratch. I noticed an option at the GDM login screen which offers to load Gnome over Wayland OR load Gnome over Xorg. I don't know if this option was present in the previous betas, but it does the trick. Gnome loads without the Intel GuC driver (/sys/module/i915/parameters/enable_guc is set to 0, with default bootloader parameters).
I still have some instabilities in one use case on Xorg, which I will report in detail when I have more time in a few days (external screen detected on Xorg, but after reboot, GDM does not show up anymore). But regarding the initial report, I would say the issue is solved or worked around ;)

Hmm. So I guess this is again *without* "WaylandEnable=false" in /etc/gdm/custom.conf, right? Can you confirm this?

So running gdm itself on Wayland and then running the GNOME session on top of X.Org appears to work for you - for whatever reasons. I guess we can close the issue then, since this is what people will try if GNOME on Wayland does not work.

(In reply to Stefan Dirsch from comment #32)
> Hmm. So I guess this is again *without* "WaylandEnable=false" in
> /etc/gdm/custom.conf, right? Can you confirm this?

I confirm.

Ok. Let's close this one then.
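Since the outcome depends on whether "WaylandEnable=false" is active in /etc/gdm/custom.conf, a quick check like the following sketch can remove the guesswork. It assumes the setting sits on its own line and that a leading "#" marks it as commented out; the function name `wayland_disabled` is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given GDM config contains an uncommented
# WaylandEnable=false line. Defaults to /etc/gdm/custom.conf; another path
# can be passed for testing. The function name is illustrative only.
wayland_disabled() {
    conf=${1:-/etc/gdm/custom.conf}
    # A commented line such as "#WaylandEnable=false" does not match,
    # because only whitespace is allowed before the key.
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$conf"
}
```

A possible use is `wayland_disabled && echo "GDM is forced to Xorg"`, run before and after editing the file.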
> I still have some instabilities in one use case on Xorg I will report in detail > when I have more time in a few days (external screen detected on Xorg, but after > reboot, GDM do not show up anymore). But regarding the initial report, I would
> say the issue is solved or workarounded ;)
Please use a separate bug report for this, but you can refer to this bug report there. Thanks!
Considered fixed.
Hi,

In /etc/gdm/custom.conf, if WaylandEnable is set to false, after rebooting and typing login and password into the Gnome login screen, the system hangs. I suspect a kernel panic, but I don't know what log I can provide to prove it, since it seems impossible to switch to a console TTY: ctrl+alt+F1 to F12 doesn't respond once the system is frozen.

The problem was also present in the previous build (197.1 I believe).

Reproducibility: always

For your info, I need to fall back to Xorg because Wayland doesn't detect my external screen (I haven't had time to look for troubleshooting tips yet, but I may open a bug report in a few days).

===========================

Some info about my system follows. In a nutshell: a 2017 Optimus laptop without any driver installed apart from those provided during installation. I also use an encrypted /home.

uname -a:
Linux linux-5udt 4.12.14-lp150.8-default #1 SMP Sat Apr 7 05:12:52 UTC 2018 (8719fc4) x86_64 x86_64 x86_64 GNU/Linux

inxi -F:
System: Host: linux-5udt Kernel: 4.12.14-lp150.8-default x86_64 bits: 64 Desktop: Gnome 3.26.2 Distro: openSUSE Leap 15.0 Beta
Machine: Device: laptop System: GIGABYTE product: P64V7 serial: HH9006711A0002 Mobo: GIGABYTE model: P64V7 serial: N/A UEFI: American Megatrends v: FB09 date: 07/28/2017
Battery BAT1: charge: 78.4 Wh 83.2% condition: 94.2/94.2 Wh (100%)
CPU: Quad core Intel Core i7-7700HQ (-HT-MCP-) cache: 6144 KB clock speeds: max: 3800 MHz 1: 2800 MHz 2: 2800 MHz 3: 2800 MHz 4: 2800 MHz 5: 2800 MHz 6: 2800 MHz 7: 2800 MHz 8: 2800 MHz
Graphics: Card-1: Intel Device 591b Card-2: NVIDIA GP106M [GeForce GTX 1060 Mobile] Display Server: wayland (X.org 1.19.6 ) driver: i915 tty size: 80x24 Advanced Data: N/A for root
Audio: Card Intel CM238 HD Audio Controller driver: snd_hda_intel Sound: ALSA v: k4.12.14-lp150.8-default
Network: Card-1: Intel Wireless 8260 driver: iwlwifi IF: wlan1 state: down mac: 9a:0c:b4:38:1c:b1 Card-2: Realtek RTL8153 Gigabit Ethernet Adapter driver: r8152 IF: eth0 state: N/A speed: N/A duplex: N/A mac: N/A
Drives: HDD Total Size: 525.1GB (1.6% used) ID-1: /dev/sda model: Crucial_CT525MX3 size: 525.1GB
Partition: ID-1: / size: 18G used: 6.0G (35%) fs: btrfs dev: /dev/sda4
ID-2: /var size: 18G used: 6.0G (35%) fs: btrfs dev: /dev/sda4
ID-3: /opt size: 18G used: 6.0G (35%) fs: btrfs dev: /dev/sda4
ID-4: /tmp size: 18G used: 6.0G (35%) fs: btrfs dev: /dev/sda4
ID-5: /home size: 3.0G used: 118M (4%) fs: xfs dev: /dev/dm-0
ID-6: swap-1 size: 2.15GB used: 0.00GB (0%) fs: swap dev: /dev/sda6
Sensors: None detected - is lm-sensors installed and configured?
Info: Processes: 321 Uptime: 0:24 Memory: 1449.3/15918.5MB Init: systemd runlevel: 5 Client: Shell (bash) inxi: 2.3.40