Bug 441511 - ipw2100: WPA-EAP connection does not work when PMKSA caching is used
Summary: ipw2100: WPA-EAP connection does not work when PMKSA caching is used
Status: VERIFIED WONTFIX
Alias: None
Product: openSUSE 11.1
Classification: openSUSE
Component: Mobile Devices (show other bugs)
Version: Beta 4
Hardware: i586 openSUSE 11.1
: P2 - High : Major with 1 vote (vote)
Target Milestone: ---
Assignee: Vladimir Botka
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 493812
  Show dependency treegraph
 
Reported: 2008-11-04 15:33 UTC by Gerald Pfeifer
Modified: 2010-09-09 08:27 UTC (History)
5 users (show)

See Also:
Found By: Product Management
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Also seen in a hotel (unencrypted connection, hopping from good AP to one with bad signal) (4.09 KB, text/plain)
2008-11-28 02:57 UTC, Gerald Pfeifer
Details
Also seen with t-mobile network at Washington Dulles Airport (10.12 KB, text/plain)
2008-11-28 02:59 UTC, Gerald Pfeifer
Details
ipw2100 - fix a small race (991 bytes, patch)
2009-04-23 11:48 UTC, Helmut Schaa
Details | Diff
ipw2100 - work around a race between firmware and driver (11.10 KB, patch)
2009-04-23 11:50 UTC, Helmut Schaa
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gerald Pfeifer 2008-11-04 15:33:31 UTC
This is with a Thinkpad T41p and an Intel ipw2100 wireless card.

I can connect to the "Novell" wireless network, but the connection then
breaks off after a few minutes, sometimes seconds.

Two concrete access points I have tested against:

eth1      IEEE 802.11b  ESSID:"Novell"  Nickname:"rana"
          Mode:Managed  Frequency:2.437 GHz  Access Point: 00:0B:86:C2:13:82
          Bit Rate=11 Mb/s   Tx-Power:16 dBm
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Encryption key:44BB-4D3E-D621-9DAC-F45D-9A7D-26C2-0878   Security mode:open
          Power Management:off
          Link Quality=93/100  Signal level=-65 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:9  Invalid misc:31   Missed beacon:2

eth1      IEEE 802.11b  ESSID:"Novell"  Nickname:"rana"
          Mode:Managed  Frequency:2.437 GHz  Access Point: 00:0B:86:C2:16:A2
          Bit Rate=11 Mb/s   Tx-Power:16 dBm
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=85/100  Signal level=-73 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:8  Invalid misc:12   Missed beacon:0

This is pretty consistent
Comment 1 Gerald Pfeifer 2008-11-04 15:37:24 UTC
A snippet from /var/log/NetworkManager that seems to be relevant.  Observe
the roaming at 16:36:40 and back at 16:36:46.


Nov  4 16:31:56 rana NetworkManager: <info>  Activation (eth1) Stage 5 of 5 (IP Configure Commit) complete.
Nov  4 16:36:37 rana NetworkManager: <info>  (eth1): supplicant connection state change: 7 -> 3
Nov  4 16:36:38 rana NetworkManager: <info>  (eth1): supplicant connection state change: 3 -> 0
Nov  4 16:36:40 rana NetworkManager: <debug> [1225813000.936278] periodic_update(): Roamed from BSSID 00:0B:86:C2:13:82 (Novell) to (none) ((none))
Nov  4 16:36:42 rana NetworkManager: <info>  (eth1): supplicant connection state change: 0 -> 3
Nov  4 16:36:43 rana NetworkManager: <info>  (eth1): supplicant connection state change: 3 -> 4
Nov  4 16:36:46 rana NetworkManager: <debug> [1225813006.940557] periodic_update(): Roamed from BSSID (none) ((none)) to 00:0B:86:C2:0B:E2 (Novell)
Nov  4 16:36:53 rana NetworkManager: <info>  (eth1): device state change: 8 -> 3
Nov  4 16:36:53 rana NetworkManager: <info>  (eth1): deactivating device (reason: 11).
Nov  4 16:36:53 rana NetworkManager: <info>  eth1: canceled DHCP transaction, dhcp client pid 2430
Nov  4 16:36:53 rana NetworkManager: <WARN>  check_one_route(): (eth1) error -34 returned from rtnl_route_del(): Sucess
Nov  4 16:36:53 rana NetworkManager: <info>  Activation (eth1) starting connection 'Auto Novell'
Nov  4 16:36:53 rana NetworkManager: <info>  (eth1): device state change: 3 -> 4
Nov  4 16:36:53 rana NetworkManager: <info>  Activation (eth1) Stage 1 of 5 (Device Prepare) scheduled...
Nov  4 16:36:53 rana NetworkManager: <info>  (eth1): supplicant connection state change: 4 -> 0
Nov  4 16:36:53 rana NetworkManager: <info>  Activation (eth1) Stage 1 of 5 (Device Prepare) started...
Nov  4 16:36:53 rana NetworkManager: <info>  Activation (eth1) Stage 2 of 5 (Device Configure) scheduled...
Nov  4 16:36:53 rana NetworkManager: <info>  Activation (eth1) Stage 1 of 5 (Device Prepare) complete.
Comment 2 Christian Deckelmann 2008-11-04 15:53:01 UTC
Have you checked https://bugzilla.novell.com/show_bug.cgi?id=415000 ?


Comment 3 Gerald Pfeifer 2008-11-04 16:04:50 UTC
Good pointer, thank you Christian!

If it is indeed this very issue, all the better (though it would be
disappointing that we have not addressed this for FACTORY/11.1/SLED 11
in a proactive manner).

Mobilistas, I'd be happy to give a test package a try.
Comment 4 Helmut Schaa 2008-11-05 12:46:22 UTC
(In reply to comment #3 from Gerald Pfeifer)
> Mobilistas, I'd be happy to give a test package a try.

Just load the ipw2100 module with option "associate=0" (modprobe ipw2100 associate=0) and try to establish the connection.
Comment 5 Gerald Pfeifer 2008-11-05 20:42:28 UTC
I'm afraid this didn't help:

Nov  5 21:40:33 rana NetworkManager: <info>  (eth1): supplicant connection state change: 7 -> 3
Nov  5 21:40:34 rana NetworkManager: <info>  (eth1): supplicant connection state change: 3 -> 0
Nov  5 21:40:39 rana NetworkManager: <info>  (eth1): supplicant connection state change: 0 -> 2
Nov  5 21:40:39 rana NetworkManager: <info>  (eth1): supplicant connection state change: 2 -> 3
Nov  5 21:40:40 rana NetworkManager: <info>  (eth1): supplicant connection state change: 3 -> 4
Nov  5 21:40:44 rana NetworkManager: <info>  (eth1): supplicant connection state change: 4 -> 3
Nov  5 21:40:44 rana NetworkManager: <info>  (eth1): supplicant connection state change: 3 -> 0
Nov  5 21:40:45 rana NetworkManager: <debug> [1225917645.192451] periodic_update(): Roamed from BSSID 00:0B:86:C2:13:82 (Novell) to (none) ((none))
Nov  5 21:40:49 rana NetworkManager: <info>  (eth1): device state change: 8 -> 3
Nov  5 21:40:49 rana NetworkManager: <info>  (eth1): deactivating device (reason: 11).
Nov  5 21:40:49 rana NetworkManager: <info>  eth1: canceled DHCP transaction, dhcp client pid 7296
Nov  5 21:40:49 rana NetworkManager: <info>  Activation (eth1) starting connection 'Auto Novell'
Comment 6 Helmut Schaa 2008-11-06 06:36:39 UTC
(In reply to comment #5 from Gerald Pfeifer)
> I'm afraid this didn't help:

:( Ok, I'll try to reproduce the issue here.
Comment 7 Frank Seidel 2008-11-17 10:28:25 UTC
cannot reproduce it with Beta5. Gerald, could you
try if Beta5 also fixes this for you?
Comment 8 Helmut Schaa 2008-11-24 11:07:27 UTC
As neither me nor Frank were able to reproduce this issue lowering severity.
Comment 9 Gerald Pfeifer 2008-11-28 02:57:59 UTC
Created attachment 256360 [details]
Also seen in a hotel (unencrypted connection, hopping from good AP to one with bad signal)
Comment 10 Gerald Pfeifer 2008-11-28 02:59:01 UTC
Created attachment 256361 [details]
Also seen with t-mobile network at Washington Dulles Airport

Still seen with Beta 5.1/Beta 5.2, cf. the two most recent attachments.
And seen back in the Nürnberg office with RC1.
Comment 11 Gerald Pfeifer 2008-12-21 03:57:22 UTC
Bumping severity (back) to major, since this stroke yet again,
apparently even on a network with a single access point.
Comment 12 Gerald Pfeifer 2009-01-16 15:36:06 UTC
For anyone on our Waltham network this renders openSUSE 11.1/SLED 11
completely useless, at least with those older Intel nics (which I still
see in use).

That might qualify for P1, but certainly P2.
Comment 13 Helmut Schaa 2009-01-19 12:40:10 UTC
(In reply to comment #12)
> That might qualify for P1, but certainly P2.

Ok, I'll try to allocate at least one day for this issue this week.
Comment 14 Helmut Schaa 2009-01-20 16:42:54 UTC
Here are a few first results with the Novell WPA-EAP network:

If the client roams away from an AP and comes back after a while wpa_supplicant tries to use PMKSA caching to speed up authentication. However PMKSA caching is disabled in our setup here. In that case the AP has to send EAP request identity in order to do a full 802.1x authentication. Although this frame is sent by the AP it is never received by wpa_supplicant. Hence either the driver or the firmware drops that frame.

I'm not sure if this is the only issue which needs to be fixed but it would be a step in the right direction. I'll investigate further tomorrow.
Comment 15 Helmut Schaa 2009-01-21 13:40:45 UTC
I was able to track the problem described in comment #14 down to a race in the ipw2100-driver (luckily not the firmware). The circumstances are as follows:

1) The firmware receives assoc response and moved internally to associated state. followed by an interrupt which informs the driver about that change.
2) The driver does _not_ update its internal state to ASSOCIATED in the interrupt handler. Instead it queues a work to be executed later.
3) The firmware receives the identity request from the AP and accepts the frame as it is in associated state. Followed by an interrupt to the driver.
4) Driver gets rx interrupt but drops the frame due to not being in associated state.
5) The queued work is executed and state changes to associated but the AP will not resend the EAP frame as the firmware already acked it.

Now thinking about how to fix this issue.
Comment 16 Helmut Schaa 2009-01-22 14:58:46 UTC
1. In case anyone is interested in the upstream discussion, have a look at [1].
2. I found another (more generic) bug in wpa_supplicant which might also affect stability while roaming (bnc#467980).

[1] http://marc.info/?t=123255588100001&r=1&w=4
Comment 17 Helmut Schaa 2009-01-23 12:16:19 UTC
No progress here since yesterday, see upstream discussion about details
=> Not fixable for RC3.
Comment 18 Helmut Schaa 2009-01-27 14:16:19 UTC
Sigh, just found another bug regarding wpa_supplicant and ipw2100 (bnc#469779).
Comment 19 Helmut Schaa 2009-01-29 08:27:59 UTC
One thing we could try instead of fixing the driver is to disable PMK-caching in wpa_supplicant (of course I don't like to disable that for all cards but only for testing on your machine).

Just add the following line to your /etc/wpa_supplicant/wpa_supplicant.conf:

dot11RSNAConfigPMKLifetime=1

That might fix the problems you have with reconnecting.
Comment 20 Helmut Schaa 2009-01-29 08:28:54 UTC
And of course after editing the configuration please call "killall wpa_supplicant".
Comment 21 Gerald Pfeifer 2009-01-30 13:27:11 UTC
I will be able to test this Tuesday or Wednesday at the earliest since
I do not have access to an environment where this triggers before then,
and will be happy to do so, but if there is someone else who could test,
that would speed up things...
Comment 22 Helmut Schaa 2009-02-04 13:22:16 UTC
Gerald, no need to test dot11RSNAConfigPMKLifetime=1 anymore. It does not improve PMKSA caching with ipw2100. So, this is not an option for SLED11.
Comment 23 Helmut Schaa 2009-02-04 14:20:59 UTC
Just discussed possible workarounds for this bug with Christian and Stefan.

This issue could be fixed by disabling PMK-caching as a whole (either from NM or the supplicant). However, this might slow down connection attempts on _all_ drivers (of course only if the environment allows PMK-caching).

Hence, we decided to not work around this issue but focus on fixing ipw2100 instead which will seriously take some more time.
Comment 24 Helmut Schaa 2009-02-06 10:18:00 UTC
Adjusting summary.
Comment 25 Helmut Schaa 2009-02-19 17:38:07 UTC
Just a quick status update:

I have a quick'n'dirty hack for ipw2100, af_packet and wpa_supplicant ready now. WPA-EAP connections are much more stable with that, PMKSA caching works fine and as a bonus the initial authentication is speeded up by some seconds.

However, the patch is not in an acceptable state for both upstream and SLED11 yet, it's just a quick hack and needs some more work.
Comment 28 Helmut Schaa 2009-04-23 11:47:00 UTC
Status update:

I have a cleaner solution now which only touches the ipw2100 driver code (no patches for af_packet and wpa_supplicant are needed). However, the patch is still not complete and needs to be cleaned up, for both SLED11 and upstream. Furthermore, I (or somebody else) would have to run more tests with that patch to avoid possible regressions.

Detailed informations can be found in the patches itself and in the upstream discussion [1].

[1] http://marc.info/?t=123255588100001&r=1&w=4
Comment 29 Helmut Schaa 2009-04-23 11:48:57 UTC
Created attachment 287728 [details]
ipw2100 - fix a small race

First small patch which shortens a race between netif_carrier_on and netif_wake_queue.
Comment 30 Helmut Schaa 2009-04-23 11:50:32 UTC
Created attachment 287729 [details]
ipw2100 - work around a race between firmware and driver

This patch works around a race between the ipw2100 firmware and the driver. This patch and the one before are made against wireless-testing, not against the SLE11 kernel. However there shouldn't be that much changes.
Comment 33 Helmut Schaa 2009-06-04 12:45:02 UTC
Reassigning to Vlado.
Comment 38 Stefan Behlert 2010-04-06 19:50:18 UTC
Removing ODI-keyword as this should no longer affect ODI roll-out (comment 31-33).
Comment 39 Vladimir Botka 2010-09-09 08:27:24 UTC
The adapter [1] is working in 11.3 with WPA-EAP [2] without problems [3]. I close as WONTFIX in 11.1 . Reopen for 11.3 in case of any problems with WPA-EAP.

[1]
  Model: "Intel MIM2000/Centrino"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x1043 "PRO/Wireless LAN 2100 3B Mini PCI Adapter"
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x2527 "MIM2000/Centrino"

[2]
# iwconfig eth1
eth1      IEEE 802.11b  ESSID:"Novell"  Nickname:"ipw2100"
          Mode:Managed  Frequency:2.462 GHz  Access Point: 00:1A:1E:A6:F9:81   
          Bit Rate=11 Mb/s   Tx-Power:16 dBm

[3]
# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:04:23:94:F1:C4  
          inet addr:149.44.136.67  Bcast:149.44.137.255  Mask:255.255.254.0
# netstat -r
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
149.44.136.0    *               255.255.254.0   U         0 0          0 eth1
loopback        *               255.0.0.0       U         0 0          0 lo
default         vrrp.scz-core-s 0.0.0.0         UG        0 0          0 eth1

> while true; do wget -nv ftp://ftp.netng.org:/pub/Misc/test-1; sleep 300; done
2010-09-09 10:04:14 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.5" [1]
2010-09-09 10:09:23 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.6" [1]
2010-09-09 10:14:33 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.7" [1]
2010-09-09 10:19:45 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.8" [1]
2010-09-09 10:24:54 URL: ftp://ftp.netng.org/pub/Misc/test-1 [1024000] -> "test-1.9" [1]