Bugzilla – Bug 1097605
iwlwifi module does crash during normal operation. wlan afterwards is not reacting any more.
Last modified: 2019-04-04 06:54:32 UTC
Created attachment 773992 [details] Output of event wlan stops working dmesg Lenovo X201, Ironlake. WLAN: Centrino Ultimate-N 6300 3x3 AGN. During normal operation the wlan becomes unusable. The connection is lost. The adapter is reported as "not exist, not found". I am attaching the dmesg output.
Created attachment 773993 [details] more detailed output dmesg from the same dmesg, maybe necessary
Judging only from the stack trace, this doesn't look like a new problem. Could you give hwinfo output, as well as the full kernel messages, not only the iwlwifi Oops?
Created attachment 774048 [details] dmesg (with sudo dmesg > file) I hope that is the desired info. How do I get hold of the hwinfo output? In a terminal, the same way as dmesg? BTW, I am currently left without wlan or eth and had to manually delete resolv.conf to get eth back, at least.
Created attachment 774049 [details] hwinfo got it, here it is, hwinfo
Thanks. Ideally, could you get the full kernel messages of the session showing the iwlwifi error (either from journalctl or from the messages archive)? Also, you may try toggling the power management option via iwconfig.
Forgive my ignorance, but could you tell me where the archives are located?
Either the output of "journalctl -k" or /var/log/messages. With journalctl, you can pass "-b -1" or "-b -2" to see the previous session or the one before it.
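A minimal sketch of the journalctl call described above. The wrapper name kernel_log is hypothetical; -k (kernel messages only), -b (select a boot by offset) and --no-pager are standard journalctl options. Earlier boots can only be shown if the journal is stored persistently.

```shell
# Hypothetical helper: print the kernel messages of one boot.
# $1 = boot offset: 0 is the current boot, -1 the previous one,
#      -2 the one before that (defaults to 0).
kernel_log() {
    offset="${1:-0}"
    journalctl -k -b "$offset" --no-pager
}
```

For example, "kernel_log -1 > kernel-previous-boot.txt" would save the full kernel log of the previous session to a file that can be attached here.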
Created attachment 774061 [details] output journactl that "should" be what you asked for. Please tell me if not.
Created attachment 774063 [details] output journactl with -k As this is a new install, -k -b -1 gave: "Specifying boot ID or boot offset has no effect, no persistent journal was found." So hopefully the plain -k output is what you need.
Then the journal isn't saved on your system storage, hence only the current session is available, and no past log is available via journal. Try to check /var/log/messages instead. You can cut the appropriate part from the file.
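One possible way to cut the relevant part out of /var/log/messages, as a sketch only. The function name log_excerpt and the default of 50 context lines are assumptions; -i, -B and -A are ordinary grep options for case-insensitive matching and for context before and after a match.

```shell
# Hypothetical helper: print every line of a log file that mentions
# iwlwifi, plus surrounding context so the lead-up to the error is kept.
# $1 = log file (for example /var/log/messages)
# $2 = number of context lines before and after each match (default 50)
log_excerpt() {
    file="$1"
    context="${2:-50}"
    grep -i -B "$context" -A "$context" 'iwlwifi' "$file"
}
```

Reading /var/log/messages usually needs root, so a call might look like "sudo sh -c '...'" or be run from a root shell, with the output redirected to a file for attaching.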
Created attachment 774293 [details] journalctl_-k_-r_right_after_accident I was going to send you the /var/log/messages part, but at that very moment I had another wlan crash. After a restart you can connect to wlan again (a restart is required), but then wlan does not resolve DNS, or at least it seems so. resolv.conf does not appear different or altered. I am attaching journalctl -k -r, as I have made the journal persistent now. I will also post /var/log/messages for the day you asked about (as I do not know where to cut, I am taking the liberty of sending the whole log). Sorry for answering only now: akonadi decided to hide your mail until an akonadictl fsck, so I saw your reply only today.
Created attachment 774296 [details] var_Log from day 13 As described, the whole log of day 13. Note: I had another wlan crash, so I plugged in eth0. DNS works flawlessly over it. So wlan is shown as connected but does not resolve addresses, whereas after connecting eth0 everything works immediately (no restart of the system or of services).
Could it be a problem in the network stack? Does it recover when you run something like /sbin/rcnetwork restart?
What commands do I have to run exactly? I tried systemctl restart networkmanager.service and it did not work (or at least made no difference). So the command would be sudo /sbin/rcnetwork restart? I will try, but if it is the network stack, why does Ethernet work flawlessly just by plugging it in? When it happens, not only is the wlan lost, but the card also seems to "disappear" and no SSIDs are visible. Ethernet works then. If you log out and log in, sometimes it sees the wlan, tries to connect, fails, then the SSID disappears, then it reappears, then it retries, etc. But sometimes it simply does not see the wlan any more. A full reboot is then necessary. Once done, all is normal again. It often happens after the system was idle for a long time, or in a session that was suspended to disk.
OK, then it is unlikely to help, but it's still worth checking whether it does anything useful. Another thing you may try is adjusting the power-saving setting via iwconfig. But that's another shot in the dark.
Created attachment 774565 [details] that is the log after the advised restart of the network stack No difference, wlan is dead. You need to restart the entire system to get wlan back. Restarting the network stack gives no error but has no effect. What I did notice: when shutting down with sudo shutdown -r now, I get a warning about "dm-crypt not getting io control on the ssd" or similar. I tried to find these messages (which scroll by repeatedly before the restart) in the logs, but I may not know where to search. Again, this event happened after 2 h of idle time.
Created attachment 774802 [details] hardware became unavailable during restart (new trace may be useful) I am attaching a new journalctl output from a recent incident (I have read through it and think it might contain some information about what happens). I do not know if it is related, but a large amount of RAM is in use and the disk is very busy. This may be independent of the wlan issue, though.
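For reference, a sketch of the iwconfig power-management toggle. The function name set_wifi_power and the interface name wlan0 are assumptions (running iwconfig without arguments lists the real interface name); "iwconfig IFACE power on|off" is the wireless-tools syntax. The change is a runtime setting and does not survive a reboot.

```shell
# Hypothetical helper: switch Wi-Fi power management on or off.
# $1 = interface name (wlan0 is only an assumed default)
# $2 = "on" or "off" (defaults to off)
set_wifi_power() {
    iface="${1:-wlan0}"
    mode="${2:-off}"
    case "$mode" in
        on|off) sudo iwconfig "$iface" power "$mode" ;;
        *) echo "usage: set_wifi_power IFACE on|off" >&2; return 1 ;;
    esac
}
```

Running "set_wifi_power wlan0 off" and then plain "iwconfig wlan0" should show "Power Management:off" if the driver accepted the change.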
I have observed this behavior for quite some time now. The consistent points are:
a) The crash seems to appear only when the system was previously suspended to disk or suspended to RAM (long idle time).
b) The crash concerns exclusively wlan functionality. It shows up in two degrees of severity: either the wlan is visible and you can connect to it, but it does not resolve addresses, or the wlan hardware "disappears" right after the incident.
c) When attaching an Ethernet cable, although the wlan remains absent or not functional, you have a flawless connection without any problem. I am writing this in exactly this situation.
d) Sometimes, but not always, the wlan disappearance coincides with a high system load and temperature that follow the event.
Please let me know what I could do to find out the reason for this (it is present only in Leap 15; 42.2/3 were O.K. with wlan).
Could you double-check the latest Leap 42.3 kernel doesn't show this behavior? The leap 42.3 kernel should still work on Leap 15.0 user-space, too. Just install it via zypper in --oldpackage. I'm asking this because Leap 42.3 kernel had already a bunch of iwlwifi backports. So it was far more than the stock 4.4.x kernel. If Leap 42.3 kernel still works and Leap 15.0 doesn't, we can narrow down the regression range, at least.
(In reply to Takashi Iwai from comment #19)
> Could you double-check the latest Leap 42.3 kernel doesn't show this
> behavior?
> The leap 42.3 kernel should still work on Leap 15.0 user-space, too. Just
> install it via zypper in --oldpackage.
>
> I'm asking this because Leap 42.3 kernel had already a bunch of iwlwifi
> backports. So it was far more than the stock 4.4.x kernel.
> If Leap 42.3 kernel still works and Leap 15.0 doesn't, we can narrow down
> the regression range, at least.

Just to be on the safe side (as I have never tried this): I have to register a 42.3 repo (which is probably 42.3 update). After that, the syntax would be: sudo zypper in -from (name of the repo) --oldpackage nameofthepackage.rpm? Is this correct?
It's not safer :) Just download the latest kernel-default.rpm from the latest Leap 42.3, and install that file via zypper in --oldpackage kernel-default*.rpm. The latest Leap 42.3 kernel from git repo is found at http://download.opensuse.org/repositories/Kernel:/openSUSE-42.3/standard/
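A small sketch of the install step, assuming the RPM has already been downloaded from the repository above. The function name install_old_kernel is hypothetical and the exact RPM file name is left to the caller, since it changes with every build; "zypper in --oldpackage" is the command given in this comment.

```shell
# Hypothetical helper: install a locally downloaded kernel RPM,
# allowing zypper to accept a version older than the installed one.
# $1 = path to the downloaded kernel-default RPM
install_old_kernel() {
    rpm_file="$1"
    [ -f "$rpm_file" ] || { echo "no such file: $rpm_file" >&2; return 1; }
    sudo zypper in --oldpackage "$rpm_file"
}
```

After a reboot, "uname -r" shows which kernel is actually running. The boot menu will likely still default to the newer kernel, so the older one probably has to be selected by hand.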
I have now installed it and am running:
uname -a
Linux linux-hk9l.suse 4.4.138-9.g89489c7-default #1 SMP Wed Jun 27 05:04:06 UTC 2018 (89489c7) x86_64 x86_64 x86_64 GNU/Linux

Observations:
a) Compared to the original kernel of Leap 15, the working temperature of the machine is astonishingly lower.
b) wlan does not crash and is functional, also after suspend to disk and after hours of being idle.
c) The memory leak that I am experiencing with akonadi/kontact when running idle (up to 8 GB memory + 3 GB of swap before crashing or emergency shutdown for temperature) is not present when running the old kernel. Memory use will be 2.5 GB with three users logged in and two instances of kontact. With FF it will take (for one user) 3.5 GB. And it is stable when idle, with no new consumption.
d) With the leak, the system gets sluggish or stops reacting. This is also not present with the old kernel. The 42.3 kernel is, for whatever reason, noticeably superior on the Lenovo X201 in terms of stability and power. The wlan is able to reconnect after suspend (unlike with the original kernel). However, the bug with the wallet not opening is present as well, so you need to log out with that user and log in again to get the wallet prompt. After that it reconnects normally, even after suspend to disk. With the new kernel this will not happen.
e) The occasional system freezes that require a hard reboot do not happen with the older kernel.
Created attachment 775891 [details] dmesg - wlan crash the same ssid shown twice dmesg and journalctl -k -r after a crash with the standard kernel. I am attaching this in the hope that it gives some more info. What is new: when the wlan crashes, the wlan SSID is shown as connected with 0 kb/s, but it is also shown a second time (not connected). Since I have only one wlan with that SSID, this might well be related, so maybe the two files give some insight.
Created attachment 775893 [details] journalctl_-k_-r_with ssid shown twice in networkmanager as described with the first attachment.
Created attachment 775924 [details] Not able to update assoc_msk? The situation was: crash of wlan as usual. The wlan indicator blinks fast, and the disk activity LED stays busy all the time. No wlan any more. It is impossible to shut down from the Plasma desktop. It is possible to shut down from a terminal, with the known issue that the kernel complains about not being able to release the io_ctl from /dev/sda2. When I restarted, there was no wlan from the very beginning: the journal was all red, with the wlan complaining about not being able to change assoc_msk. I would also consider a hardware problem, but as the card works without problems with the 4.4 kernel, I think that can be excluded, unless you tell me otherwise.
Today I started up the system and it hung before reaching the desktop. I was able to switch to tty8 (where the boot messages seem to appear), and the lines were:

[ 17.750603] ACPI Warning: SystemIO range 0x0000000000001028-0x000000000000102F conflicts with OpRegion 0x0000000000001000-0x000000000000107F (\_SB.PCI0.LPC.PMIO) (20170303/utaddress-247)
[ 17.750626] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 17.750630] ACPI Warning: SystemIO range 0x00000000000011C0-0x00000000000011CF conflicts with OpRegion 0x0000000000001180-0x00000000000011FF (\_SB.PCI0.LPC.LPIO) (20170303/utaddress-247)
[ 17.750634] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 17.750635] ACPI Warning: SystemIO range 0x00000000000011B0-0x00000000000011BF conflicts with OpRegion 0x0000000000001180-0x00000000000011FF (\_SB.PCI0.LPC.LPIO) (20170303/utaddress-247)
[ 17.750639] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 17.750640] ACPI Warning: SystemIO range 0x0000000000001180-0x00000000000011AF conflicts with OpRegion 0x0000000000001180-0x00000000000011FF (\_SB.PCI0.LPC.LPIO) (20170303/utaddress-247)
[ 17.750644] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 17.750645] lpc_ich: Resource conflict(s) found affecting gpio_ich

While the resource conflict affecting gpio_ich was the last line on screen, the wlan LED was flickering wildly. This lasted for about 20 - 30 seconds. Then the system started up, and as it did, the wlan LED stopped flickering. Can it be that the new kernel does something different with ACPI settings and IO? After boot the wlan worked normally, but of course it will crash after a long idle period.
Any update on this? Although it also happens, less frequently, after several hours of work, it happens much more often and sooner (from 20 minutes to at most 2 hours) after the system was suspended to disk and then woken.
Unfortunately there is not much I can do to help at the moment. This is certainly a regression in the upstream code, but the upstream developer couldn't identify the cause. Someone needs to dig in deeply and run a git bisection to find which change broke things.
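To illustrate what such a bisection involves, here is a rough sketch run inside a kernel git tree. The function name start_bisect is hypothetical, and the good and bad revisions are placeholders, because this report does not pin down exact upstream commits. The git bisect subcommands themselves (start, good, bad, reset) are standard.

```shell
# Hypothetical helper: begin a bisection between two kernel revisions.
# $1 = last revision known to work (placeholder)
# $2 = first revision known to fail (placeholder)
start_bisect() {
    good="$1"
    bad="$2"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# git then checks out a revision in between. Build it, boot it, and
# exercise suspend/resume and idle time, then report the outcome:
#   git bisect good    # wlan kept working
#   git bisect bad     # iwlwifi crashed
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```

Since the crash here can take hours of idle time or a suspend cycle to show up, each step would need a long test before a revision can be marked good.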
The characteristics have changed: I now seldom have wlan crashes with the 4.12 kernel, but I experience shutdowns due to a core temperature (in CPU1) of more than 112°C. This temperature problem is linked to the kernel and does not appear with the 4.4 kernel. And (apart from the temperature warning) I do not see any error message in the logs. With the 4.4 kernel and the same load I have a CPU temperature of 65°C with low fan activity, which corresponds to normal (the Lenovo has somewhat low ventilation under Linux). The only difference AFAIK was an update of the Intel ucode. WLAN still crashes, but only after the system has been suspended, and this has become rare because of the constant emergency shutdowns due to overheating.
The current updated 4.12 kernel, Announcement ID: openSUSE-SU-2018:2119-1, does not solve the issue. The issue occurs as before once the machine has been suspended and woken again. As a comparison I am now using kernel stable, e.g. here: Linux linux-hk9l.suse 4.17.10-3.gf604b8a-default #1 SMP PREEMPT Thu Jul 26 05:30:20 UTC 2018 (f604b8a) x86_64 x86_64 x86_64 GNU/Linux This kernel does not show the issue. WLAN is stable and does not crash even after suspend and wake-up. The overall memory load is lower, and the CPU temperature with the standard kernel is some 30 °C lower than with the distribution product. As a workaround, I am currently sticking to the standard kernel. But this bug persists. Thank you.
I have the same issue on a System 76 Galago Ultra Pro running Tumbleweed v20180917. Device: Intel Corporation Centrino Advanced-N 6235 (rev 24) Kernel: 4.18.7-1-default #1 SMP PREEMPT Sun Sep 9 10:26:20 UTC 2018 (952d850) x86_64 x86_64 x86_64 GNU/Linux Params: BOOT_IMAGE=/boot/vmlinuz-4.18.7-1-default root=UUID=c0af6a57-0eb7-4be7-809c-af53b6334a19 splash=silent resume=/dev/disk/by-label/Swap quiet pcie_aspm=force Symptoms are very similar although the crash can also be spontaneous while working. And nothing else than a full reboot will bring the interface back in working state. After the crash it shows up but it's unable to connect to anything.
(In reply to Vanista Herion from comment #32)
> I have the same issue on a System 76 Galago Ultra Pro running Tumbleweed
> v20180917.
>
> Device: Intel Corporation Centrino Advanced-N 6235 (rev 24)
> Kernel: 4.18.7-1-default #1 SMP PREEMPT Sun Sep 9 10:26:20 UTC 2018
> (952d850) x86_64 x86_64 x86_64 GNU/Linux
> Params: BOOT_IMAGE=/boot/vmlinuz-4.18.7-1-default
> root=UUID=c0af6a57-0eb7-4be7-809c-af53b6334a19 splash=silent
> resume=/dev/disk/by-label/Swap quiet pcie_aspm=force
>
> Symptoms are very similar although the crash can also be spontaneous while
> working. And nothing else than a full reboot will bring the interface back
> in working state. After the crash it shows up but it's unable to connect to
> anything.

I can confirm that spontaneous crashes were also possible. I had to change to kernel stable and the corresponding firmware, and (apart from it being impossible to reconnect after suspend to disk, where you need a full reboot) at least it is not crashing currently. However, with the standard Leap kernel you simply cannot use wlan on this machine. You might try the next TW edition to see if the situation returns to normal. This would be interesting to know, in order to see what changed. My kernel is currently Linux roadrunner.suse 4.18.8-2.gf486469-default #1 SMP PREEMPT Sat Sep 15 14:10:30 UTC 2018 (f486469) x86_64 x86_64 x86_64 GNU/Linux
Created attachment 791562 [details] dmesg output of wlan during recent crashes. Quite verbose. Adding an attachment from recent crashes. Since there are EXT4 problems in 4.19, I am using (or trying to use) the default kernel, and to my dismay it crashes regularly. Attaching the output of dmesg. uname -a Linux roadrunner.suse 4.12.14-lp150.12.25-default #1 SMP Thu Nov 1 06:14:23 UTC 2018 (3fcf457) x86_64 x86_64 x86_64 GNU/Linux
Linux roadrunner.suse 4.12.14-lp150.12.48-default #1 SMP Tue Feb 12 14:01:48 UTC 2019 (268f014) x86_64 x86_64 x86_64 GNU/Linux As of Leap 15 this does not happen any more. However, since the last kernel update the wlan does not wake up any more. You have to do a full restart. This happens 50 % of the time, rendering hibernation nearly impossible / useless. I am opening a new bug for this and closing this one.