Bugzilla – Bug 441511
ipw2100: WPA-EAP connection does not work when PMKSA caching is used
Last modified: 2010-09-09 08:27:24 UTC
This is with a Thinkpad T41p and an Intel ipw2100 wireless card. I can connect to the "Novell" wireless network, but the connection then breaks off after a few minutes, sometimes seconds. Two concrete access points I have tested against: eth1 IEEE 802.11b ESSID:"Novell" Nickname:"rana" Mode:Managed Frequency:2.437 GHz Access Point: 00:0B:86:C2:13:82 Bit Rate=11 Mb/s Tx-Power:16 dBm Retry short limit:7 RTS thr:off Fragment thr:off Encryption key:44BB-4D3E-D621-9DAC-F45D-9A7D-26C2-0878 Security mode:open Power Management:off Link Quality=93/100 Signal level=-65 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:9 Invalid misc:31 Missed beacon:2 eth1 IEEE 802.11b ESSID:"Novell" Nickname:"rana" Mode:Managed Frequency:2.437 GHz Access Point: 00:0B:86:C2:16:A2 Bit Rate=11 Mb/s Tx-Power:16 dBm Retry short limit:7 RTS thr:off Fragment thr:off Power Management:off Link Quality=85/100 Signal level=-73 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:8 Invalid misc:12 Missed beacon:0 This is pretty consistent
A snippet from /var/log/NetworkManager that seems to be relevant. Observe the roaming at 16:36:40 and back at 16:36:46. Nov 4 16:31:56 rana NetworkManager: <info> Activation (eth1) Stage 5 of 5 (IP Configure Commit) complete. Nov 4 16:36:37 rana NetworkManager: <info> (eth1): supplicant connection state change: 7 -> 3 Nov 4 16:36:38 rana NetworkManager: <info> (eth1): supplicant connection state change: 3 -> 0 Nov 4 16:36:40 rana NetworkManager: <debug> [1225813000.936278] periodic_update(): Roamed from BSSID 00:0B:86:C2:13:82 (Novell) to (none) ((none)) Nov 4 16:36:42 rana NetworkManager: <info> (eth1): supplicant connection state change: 0 -> 3 Nov 4 16:36:43 rana NetworkManager: <info> (eth1): supplicant connection state change: 3 -> 4 Nov 4 16:36:46 rana NetworkManager: <debug> [1225813006.940557] periodic_update(): Roamed from BSSID (none) ((none)) to 00:0B:86:C2:0B:E2 (Novell) Nov 4 16:36:53 rana NetworkManager: <info> (eth1): device state change: 8 -> 3 Nov 4 16:36:53 rana NetworkManager: <info> (eth1): deactivating device (reason: 11). Nov 4 16:36:53 rana NetworkManager: <info> eth1: canceled DHCP transaction, dhcp client pid 2430 Nov 4 16:36:53 rana NetworkManager: <WARN> check_one_route(): (eth1) error -34 returned from rtnl_route_del(): Sucess Nov 4 16:36:53 rana NetworkManager: <info> Activation (eth1) starting connection 'Auto Novell' Nov 4 16:36:53 rana NetworkManager: <info> (eth1): device state change: 3 -> 4 Nov 4 16:36:53 rana NetworkManager: <info> Activation (eth1) Stage 1 of 5 (Device Prepare) scheduled... Nov 4 16:36:53 rana NetworkManager: <info> (eth1): supplicant connection state change: 4 -> 0 Nov 4 16:36:53 rana NetworkManager: <info> Activation (eth1) Stage 1 of 5 (Device Prepare) started... Nov 4 16:36:53 rana NetworkManager: <info> Activation (eth1) Stage 2 of 5 (Device Configure) scheduled... Nov 4 16:36:53 rana NetworkManager: <info> Activation (eth1) Stage 1 of 5 (Device Prepare) complete.
Have you checked https://bugzilla.novell.com/show_bug.cgi?id=415000 ?
Good pointer, thank you Christian! If it is indeed this very issue, all the better (though it would be disappointing that we have not addressed this for FACTORY/11.1/SLED 11 in a proactive manner). Mobilistas, I'd be happy to give a test package a try.
(In reply to comment #3 from Gerald Pfeifer) > Mobilistas, I'd be happy to give a test package a try. Just load the ipw2100 module with option "associate=0" (modprobe ipw2100 associate=0) and try to establish the connection.
I'm afraid this didn't help: Nov 5 21:40:33 rana NetworkManager: <info> (eth1): supplicant connection state change: 7 -> 3 Nov 5 21:40:34 rana NetworkManager: <info> (eth1): supplicant connection state change: 3 -> 0 Nov 5 21:40:39 rana NetworkManager: <info> (eth1): supplicant connection state change: 0 -> 2 Nov 5 21:40:39 rana NetworkManager: <info> (eth1): supplicant connection state change: 2 -> 3 Nov 5 21:40:40 rana NetworkManager: <info> (eth1): supplicant connection state change: 3 -> 4 Nov 5 21:40:44 rana NetworkManager: <info> (eth1): supplicant connection state change: 4 -> 3 Nov 5 21:40:44 rana NetworkManager: <info> (eth1): supplicant connection state change: 3 -> 0 Nov 5 21:40:45 rana NetworkManager: <debug> [1225917645.192451] periodic_update(): Roamed from BSSID 00:0B:86:C2:13:82 (Novell) to (none) ((none)) Nov 5 21:40:49 rana NetworkManager: <info> (eth1): device state change: 8 -> 3 Nov 5 21:40:49 rana NetworkManager: <info> (eth1): deactivating device (reason: 11). Nov 5 21:40:49 rana NetworkManager: <info> eth1: canceled DHCP transaction, dhcp client pid 7296 Nov 5 21:40:49 rana NetworkManager: <info> Activation (eth1) starting connection 'Auto Novell'
(In reply to comment #5 from Gerald Pfeifer) > I'm afraid this didn't help: :( Ok, I'll try to reproduce the issue here.
cannot reproduce it with Beta5. Gerald, could you try if Beta5 also fixes this for you?
As neither me nor Frank were able to reproduce this issue lowering severity.
Created attachment 256360 [details] Also seen in a hotel (unencrypted connection, hopping from good AP to one with bad signal)
Created attachment 256361 [details] Also seen with t-mobile network at Washington Dulles Airport Still seen with Beta 5.1/Beta 5.2, cf. the two most recent attachments. And seen back in the Nürnberg office with RC1.
Bumping severity (back) to major, since this stroke yet again, apparently even on a network with a single access point.
For anyone on our Waltham network this renders openSUSE 11.1/SLED 11 completely useless, at least with those older Intel nics (which I still see in use). That might qualify for P1, but certainly P2.
(In reply to comment #12) > That might qualify for P1, but certainly P2. Ok, I'll try to allocate at least one day for this issue this week.
Here are a few first results with the Novell WPA-EAP network: If the client roams away from an AP and comes back after a while wpa_supplicant tries to use PMKSA caching to speed up authentication. However PMKSA caching is disabled in our setup here. In that case the AP has to send EAP request identity in order to do a full 802.1x authentication. Although this frame is sent by the AP it is never received by wpa_supplicant. Hence either the driver or the firmware drops that frame. I'm not sure if this is the only issue which needs to be fixed but it would be a step in the right direction. I'll investigate further tomorrow.
I was able to track the problem described in comment #14 down to a race in the ipw2100-driver (luckily not the firmware). The circumstances are as follows: 1) The firmware receives assoc response and moved internally to associated state. followed by an interrupt which informs the driver about that change. 2) The driver does _not_ update its internal state to ASSOCIATED in the interrupt handler. Instead it queues a work to be executed later. 3) The firmware receives the identity request from the AP and accepts the frame as it is in associated state. Followed by an interrupt to the driver. 4) Driver gets rx interrupt but drops the frame due to not being in associated state. 5) The queued work is executed and state changes to associated but the AP will not resend the EAP frame as the firmware already acked it. Now thinking about how to fix this issue.
1. In case anyone is interested in the upstream discussion, have a look at [1]. 2. I found another (more generic) bug in wpa_supplicant which might also affect stability while roaming (bnc#467980). [1] http://marc.info/?t=123255588100001&r=1&w=4
No progress here since yesterday, see upstream discussion about details => Not fixable for RC3.
Sigh, just found another bug regarding wpa_supplicant and ipw2100 (bnc#469779).
One thing we could try instead of fixing the driver is to disable PMK-caching in wpa_supplicant (of course I don't like to disable that for all cards but only for testing on your machine). Just add the following line to your /etc/wpa_supplicant/wpa_supplicant.conf: dot11RSNAConfigPMKLifetime=1 That might fix the problems you have with reconnecting.
And of course after editing the configuration please call "killall wpa_supplicant".
I will be able to test this Tuesday or Wednesday at the earliest since I do not have access to an environment where this triggers before then, and will be happy to do so, but if there is someone else who could test, that would speed up things...
Gerald, no need to test dot11RSNAConfigPMKLifetime=1 anymore. It does not improve PMKSA caching with ipw2100. So, this is not an option for SLED11.
Just discussed possible workarounds for this bug with Christian and Stefan. This issue could be fixed by disabling PMK-caching as a whole (either from NM or the supplicant). However, this might slow down connection attempts on _all_ drivers (of course only if the environment allows PMK-caching). Hence, we decided to not work around this issue but focus on fixing ipw2100 instead which will seriously take some more time.
Adjusting summary.
Just a quick status update: I have a quick'n'dirty hack for ipw2100, af_packet and wpa_supplicant ready now. WPA-EAP connections are much more stable with that, PMKSA caching works fine and as a bonus the initial authentication is speeded up by some seconds. However, the patch is not in an acceptable state for both upstream and SLED11 yet, it's just a quick hack and needs some more work.
Status update: I have a cleaner solution now which only touches the ipw2100 driver code (no patches for af_packet and wpa_supplicant are needed). However, the patch is still not complete and needs to be cleaned up, for both SLED11 and upstream. Furthermore, I (or somebody else) would have to run more tests with that patch to avoid possible regressions. Detailed informations can be found in the patches itself and in the upstream discussion [1]. [1] http://marc.info/?t=123255588100001&r=1&w=4
Created attachment 287728 [details] ipw2100 - fix a small race First small patch which shortens a race between netif_carrier_on and netif_wake_queue.
Created attachment 287729 [details] ipw2100 - work around a race between firmware and driver This patch works around a race between the ipw2100 firmware and the driver. This patch and the one before are made against wireless-testing, not against the SLE11 kernel. However there shouldn't be that much changes.
Reassigning to Vlado.
Removing ODI-keyword as this should no longer affect ODI roll-out (comment 31-33).
The adapter [1] is working in 11.3 with WPA-EAP [2] without problems [3]. I close as WONTFIX in 11.1 . Reopen for 11.3 in case of any problems with WPA-EAP. [1] Model: "Intel MIM2000/Centrino" Vendor: pci 0x8086 "Intel Corporation" Device: pci 0x1043 "PRO/Wireless LAN 2100 3B Mini PCI Adapter" SubVendor: pci 0x8086 "Intel Corporation" SubDevice: pci 0x2527 "MIM2000/Centrino" [2] # iwconfig eth1 eth1 IEEE 802.11b ESSID:"Novell" Nickname:"ipw2100" Mode:Managed Frequency:2.462 GHz Access Point: 00:1A:1E:A6:F9:81 Bit Rate=11 Mb/s Tx-Power:16 dBm [3] # ifconfig eth1 eth1 Link encap:Ethernet HWaddr 00:04:23:94:F1:C4 inet addr:149.44.136.67 Bcast:149.44.137.255 Mask:255.255.254.0 # netstat -r Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 149.44.136.0 * 255.255.254.0 U 0 0 0 eth1 loopback * 255.0.0.0 U 0 0 0 lo default vrrp.scz-core-s 0.0.0.0 UG 0 0 0 eth1 > while true; do wget -nv ftp://ftp.netng.org:/pub/Misc/test-1; sleep 300; done 2010-09-09 10:04:14 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.5" [1] 2010-09-09 10:09:23 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.6" [1] 2010-09-09 10:14:33 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.7" [1] 2010-09-09 10:19:45 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.8" [1] 2010-09-09 10:24:54 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.9" [1]