Bugzilla – Bug 657402
dhcpcd sends RENEWAL as ethernet broadcast instead of unicast
Last modified: 2011-04-28 11:51:03 UTC
Created attachment 403341 [details] Wireshark Screenshot

User-Agent: Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.2.8) Gecko/20100723 SUSE/3.6.8-1.1 Firefox/3.6.8

I discovered that dhcpcd has a regression since SLES10 with regard to DHCP renewals. Once dhcpcd has obtained the initial lease, it is supposed to perform a DHCP RENEW after T1. In contrast to the initial DHCP REQUEST, the RENEWAL is not a broadcast but a unicast message to the DHCP server.

The current (Factory) dhcpcd-3.2.3, as used since OpenSUSE 11.0, has a regression that was not present in dhcpcd-1.3.22pl4 (SLES-10).

Correct DHCP renewal semantics as implemented in dhcpcd-1.3.22pl4:
* When the renewal is due, a special DHCP REQUEST is sent as a unicast message to the DHCP server.
* This unicast message is sent to the IP address of the DHCP server.
* If the DHCP server is within the same LAN, this unicast message is sent to the MAC address of the DHCP server on the ethernet layer.
* If the DHCP server is in another network, the unicast message is sent to the MAC address of the responsible gateway.

Incorrect DHCP renewal semantics as implemented in dhcpcd-3.2.3 (Factory):
* When the renewal is due, a special DHCP REQUEST is sent as a unicast message to the DHCP server.
* This unicast message is sent to the IP address of the DHCP server.
* If the DHCP server is within the same LAN, this unicast message is sent to the MAC BROADCAST address (ff:ff:ff:ff:ff:ff) on the ethernet layer.
* If the DHCP server is in another network, the unicast message is sent to the MAC BROADCAST address (ff:ff:ff:ff:ff:ff) on the ethernet layer instead of directly to the MAC address of the responsible gateway.
* At least Cisco devices by default do not forward packets received via Ethernet broadcast to the destination server.

Consequences:
OpenSUSE cannot perform DHCP renewals
--> After the lease finally expires, network access is interrupted and a new lease has to be acquired.
--> errors on the network level, outages etc.

Reproducible: Always

Steps to Reproduce:
1. Obtain a DHCP lease, e.g. using rcnetwork restart
2. Verify that a DHCP lease was granted (check /var/lib/dhcpcd/dhcpcd-eth0.info)
3. Wait for the DHCP renewal (or use /sbin/dhcpcd -n to force it manually)
4. Use a network sniffer like wireshark to trace the DHCP REQUEST

Actual Results:
1. A DHCP REQUEST with correct payload is generated.
2. This unicast UDP packet is sent to the IP address of the DHCP server.
3. On layer 2 (ethernet) the packet is sent to ff:ff:ff:ff:ff:ff (ethernet broadcast).
4. The packet is dropped by the gateway.
5. The packet is not received by the DHCP server (which lives in a different broadcast domain).
6. The renewal does not happen.
7. Some time later the DHCP lease expires.
8. A new lease needs to be requested by dhcpcd.
9. A short network outage is noticeable and some applications have trouble.

Expected Results:
1. A DHCP REQUEST with correct payload is generated.
2. This unicast UDP packet is sent to the IP address of the DHCP server.
3. On layer 2 (ethernet) the packet is sent to the MAC address of the DHCP server (same network) or to the MAC address of the gateway.
4. The packet is either received directly by the DHCP server or forwarded by the responsible gateway as a unicast packet.
5. The packet is received by the DHCP server (which might live in a different broadcast domain).
6. The renewal is acknowledged.
7. The lease never finally expires, as the RENEWAL works as defined in the RFCs.
8. No network outages are observed.
Created attachment 403342 [details] trace readable with wireshark

In this network trace you can see (e.g. 1097) DHCP requests sent to the DHCP server (here 10.13.51.10) as unicast packets, but the layer 2 ethernet destination address is the broadcast address instead of the MAC address of the Cisco box.
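The symptom described above can also be checked mechanically on raw frame bytes. The following is a minimal sketch, not part of the original report: the function names `is_misaddressed_unicast` and `sample_frame` are hypothetical, only untagged IPv4 frames are handled, and only the limited broadcast address 255.255.255.255 is treated as an IP broadcast (subnet-directed broadcasts are not recognized).

```python
import struct

BROADCAST_MAC = b"\xff" * 6
LIMITED_BROADCAST_IP = bytes([255, 255, 255, 255])


def is_misaddressed_unicast(frame: bytes) -> bool:
    """Return True for an IPv4 frame that has a broadcast Ethernet
    destination but a non-broadcast IP destination."""
    if len(frame) < 34:  # 14-byte Ethernet header + 20-byte minimal IPv4 header
        return False
    dst_mac = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:  # only plain (untagged) IPv4 frames are examined
        return False
    dst_ip = frame[30:34]  # IPv4 destination field: offset 16 in the IP header
    return dst_mac == BROADCAST_MAC and dst_ip != LIMITED_BROADCAST_IP


def sample_frame(dst_mac: bytes, dst_ip: bytes) -> bytes:
    """Build a synthetic frame; every field except the two destinations
    is a placeholder, so this is not a valid packet on the wire."""
    src_mac = bytes.fromhex("020000000001")  # made-up source MAC
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)
    ip = bytes(16) + dst_ip  # zeroed header fields, then the destination
    return eth + ip


server = bytes([10, 13, 51, 10])  # DHCP server address from the trace
print(is_misaddressed_unicast(sample_frame(BROADCAST_MAC, server)))
print(is_misaddressed_unicast(sample_frame(bytes.fromhex("020000000002"), server)))
```

The first call models the faulty renewal (broadcast MAC, unicast IP), the second a correctly addressed one.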
In the meantime I contacted Roy Marples [roy _a_t_ marples.name]. He pointed me to the relevant RFC.

"Here's the relevant section of RFC2131

4.3.6 Client messages

Table 4 details the differences between messages from clients in various states.

---------------------------------------------------------------------
|              |INIT-REBOOT  |SELECTING    |RENEWING     |REBINDING |
---------------------------------------------------------------------
|broad/unicast |broadcast    |broadcast    |unicast      |broadcast |
|server-ip     |MUST NOT     |MUST         |MUST NOT     |MUST NOT  |
|requested-ip  |MUST         |MUST         |MUST NOT     |MUST NOT  |
|ciaddr        |zero         |zero         |IP address   |IP address|
---------------------------------------------------------------------

So yes, any broadcast for renewal is a bug.
"

Yours,
-- martin
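The table quoted above can be written down as data and used to check a DHCPREQUEST against the client state. This is only an illustrative sketch: the dictionary layout and the function name `violations` are assumptions made for this example and do not correspond to anything in dhcpcd.

```python
# RFC 2131, Table 4, transcribed as a lookup table.
TABLE_4 = {
    "INIT-REBOOT": {"cast": "broadcast", "server-ip": "MUST NOT",
                    "requested-ip": "MUST", "ciaddr": "zero"},
    "SELECTING":   {"cast": "broadcast", "server-ip": "MUST",
                    "requested-ip": "MUST", "ciaddr": "zero"},
    "RENEWING":    {"cast": "unicast", "server-ip": "MUST NOT",
                    "requested-ip": "MUST NOT", "ciaddr": "IP address"},
    "REBINDING":   {"cast": "broadcast", "server-ip": "MUST NOT",
                    "requested-ip": "MUST NOT", "ciaddr": "IP address"},
}


def violations(state, sent_as, has_server_id, has_requested_ip, ciaddr):
    """Return a list of human-readable rule violations for one request."""
    rule = TABLE_4[state]
    problems = []
    if sent_as != rule["cast"]:
        problems.append(f"{state} requests must be {rule['cast']}, not {sent_as}")
    for name, present in (("server-ip", has_server_id),
                          ("requested-ip", has_requested_ip)):
        if present != (rule[name] == "MUST"):
            problems.append(f"{name} {rule[name]} be present in {state}")
    if (ciaddr == "0.0.0.0") != (rule["ciaddr"] == "zero"):
        problems.append(f"ciaddr must be {rule['ciaddr']} in {state}")
    return problems


# A renewal sent as broadcast, which is the behaviour reported in this bug.
print(violations("RENEWING", "broadcast", False, False, "10.13.137.41"))
```

Note that the table only speaks about IP-level broadcast versus unicast; the link-layer destination discussed in this report is a separate matter that the sketch does not model.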
Further investigation and communication with the upstream author showed:

dhcpcd-1.3.22pl4 <-- SLES8 and SLES10 work fine!
dhcpcd-3.2.3 <-- definitely has the problem (used with OpenSUSE 11.0, 11.1, 11.2, 11.3 and Factory, probably also SLES11)
dhcpcd-4.x <-- not investigated
dhcpcd-5.2.9 <-- currently supported upstream version. Works fine but has an incompatible command line.

I see two possible solutions:
1. Fix 3.2.3 and release packages for all supported platforms incl. SLES11 (simpler but not future-proof)
2. Go for 5.2.9 and update the environment (more work but IMHO the future)

Yours,
-- martin
From discussion with Roy Marples:

> I only recently got dhcpcd updated in Debian to dhcpcd-5 from
> dhcpcd-3.
> That involves a new package the worked alongside dhcpcd-3 because
> dhcpcd-5 now works as a single process on all interfaces. It can still
> work per interface but not 100% the same way with dhcpcd-3. SuSE may
> wish to go the same way.

Yours,
-- martin
Hi Peter, how do you intend to deal with this bug? Do you need any further help? Yours, -- martin
Hi Peter, do you already have a target date or target release for fixing this issue? Yours, -- martin
Hi, I'm having the same problem. Right now there are 5+ computers (SLED 11) sending requests every 3 seconds, and they are starting to clog up the low-bandwidth (10 Mbps) network. I know I can fix the situation simply by rebooting the computers, or even by only restarting the network service, but that's not a solution. My votes for this bug.
Pieter, please answer https://bugzilla.novell.com/show_bug.cgi?id=657402#c6 THX, -- martin
NTS will take care of the issue now, as we have a report for SLE11.
I am reopening this bug to the public; we have cloned it as bug#672038 and are now working on the new one to fix the issue.
Hi Martin!

Thank you very much for this bug report and all your investigations! I'm currently reviewing this issue; Peter will join as soon as possible.

Yes, a renew has to be a unicast. I have some early test code that seems to work -- at least in a setup without a relay. I have to check why it does not work with a relay in between [fw rules?]. I hope we will have a test package ready today - we'll see.

You write in your initial (and further) comment(s):

> * If the DHCP server is within the same LAN this unicast message is
>   sent to the MAC address of the DHCP server on the ethernet layer.
> * If the DHCP server is in another network the unicast message is
>   sent to the responsible gateway MAC address

Perhaps I miss something, but it seems it has to be sent directly to the server -- gateway is not involved:

http://tools.ietf.org/html/rfc2131#section-4.3.2

"4.3.2 DHCPREQUEST message
[...]
   o DHCPREQUEST generated during RENEWING state:

     'server identifier' MUST NOT be filled in, 'requested IP address'
     option MUST NOT be filled in, 'ciaddr' MUST be filled in with
     client's IP address. In this situation, the client is completely
     configured, and is trying to extend its lease. This message will
     be unicast, so no relay agents will be involved in its
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     transmission. Because 'giaddr' is therefore not filled in, the
     DHCP server will trust the value in 'ciaddr', and use it when
     replying to the client.

     A client MAY choose to renew or extend its lease prior to T1. The
     server may choose not to extend the lease (as a policy decision by
     the network administrator), but should return a DHCPACK message
     regardless.
[...]"
(In reply to comment #16)
> I've a some early test code that seems to work -- at least in a setup
> without relay. I've to check why it does not work with a relay between
> [fw rules?]. I hope, we have a test package ready today - we'll see.

OK, the patch seems to work in a clean setup [without the policy routing from previous tests that was still on the server ;-)]. I'll clean it up, attach the patch and provide a URL to a test package.

> Perhaps I miss something, but it seems it has to be sent directly to
> the server -- gateway is not involved:
                ^^^^^^^
relay of course... Forget it: just a little confusion on my side, and I wrote it wrong. You didn't write that it is sent to the DHCP relay, as I read it, but to the gateway MAC.
(In reply to comment #16)

Hi Marius,

thanks for looking into this issue.

> You write in your initial (and further) comment(s):
>
> > * If the DHCP server is within the same LAN this unicast message is
> >   sent to the MAC address of the DHCP server on the ethernet layer.
> > * If the DHCP server is in another network the unicast message is
> >   sent to the responsible gateway MAC address
>
> Perhaps I miss something, but it seems it has to be sent directly to
> the server -- gateway is not involved:

Just for clarification.

DHCP renewal is always a unicast.

Case 1: DHCP server is on the same ethernet segment.
In this case the layer 2 ethernet destination address of the renewal request is the MAC address of the ethernet card of the DHCP server. The unicast destination IP address in the renewal request is the IP address of the DHCP server.

Case 2: DHCP server is on a different ethernet segment.
In this case the layer 2 ethernet destination address of the renewal request is the MAC address of the gateway. The unicast destination IP address in the renewal request is the IP address of the DHCP server. This is plain and standard IP. The gateway in this description has nothing to do with a DHCP relay. A DHCP relay does more than a plain IP gateway/router, as it acts more like a proxy than a plain IP router.

dhcpcd-3.2.3 incorrectly sends the DHCP renewal to the HW broadcast address ff:ff:ff:ff:ff:ff (not to be confused with the IP broadcast address) instead of to the _unicast_ _HW_ address of the
1. DHCP server if it resides on the same segment or
2. gateway/router if the DHCP server lives in a different segment. In this case it is the job of the IP gateway to determine the

This description has nothing to do with DHCP relay agents, as these are, according to the relevant RFCs, NOT involved in DHCP renewal activities.

> http://tools.ietf.org/html/rfc2131#section-4.3.2
> "4.3.2 DHCPREQUEST message
> [...]
> configured, and is trying to extend its lease. This message will
> be unicast, so no relay agents will be involved in its
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> transmission. Because 'giaddr' is therefore not filled in, the
> DHCP server will trust the value in 'ciaddr', and use it when
> replying to the client.

Please note that in the Internet documentation (RFCs) generally, and in this bug report specifically, a gateway is a plain IP-level router, not a DHCP proxy or similar. This is different from giaddr, which refers to a "Relay agent IP address, used in booting via a relay agent" as defined in RFC 2131.

If required, I can document at the byte level an IP packet currently observed during DHCP renewal alongside a correct packet. Please tell me if this would be of some help.

Yours,
-- martin
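The two cases described in this comment boil down to an ordinary on-link test. The sketch below is a simplification under stated assumptions: the function name `layer2_target` is hypothetical, a real host consults its full routing table rather than a single interface prefix, and the addresses are taken from the logs in this report except for 10.13.136.5, which is made up to illustrate the same-segment case.

```python
import ipaddress


def layer2_target(client_iface: str, server_ip: str, gateway_ip: str) -> str:
    """Return the IP address whose MAC should be the Ethernet destination
    of a unicast renewal: the server itself if it is on-link (Case 1),
    otherwise the gateway (Case 2)."""
    network = ipaddress.ip_interface(client_iface).network
    if ipaddress.ip_address(server_ip) in network:
        return server_ip
    return gateway_ip


# Case 2: the server 10.13.51.10 is outside 10.13.136.0/22, so the frame
# goes to the MAC of the gateway 10.13.136.1.
print(layer2_target("10.13.137.41/22", "10.13.51.10", "10.13.136.1"))

# Case 1: a hypothetical on-link server is addressed directly.
print(layer2_target("10.13.137.41/22", "10.13.136.5", "10.13.136.1"))
```

In both cases the IP destination stays the DHCP server; only the address that gets resolved to a MAC via ARP differs, and it is never the broadcast address.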
Created attachment 414414 [details] Patch causing sending of a renew request as unicast to the server

Please test extensively -- it may cause more messages to be sent as unicast now...

Test packages including this fix will appear at http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/network:/dhcp/
(In reply to comment #18)
> (In reply to comment #16)
>
> Just for clarification.
>
> DHCP renewal is always an unicast.

Sure. I just didn't read carefully enough what you (correctly) initially wrote and have had a _dhcp_ _relay_ in my mind all the time, that forwards (as dhcp relay) the broadcast, but is not involved while _routing_ of the not the unicasts.
_routing_ the unicasts.
About the log messages:

In case of unicasts the "sending $CMD with xid $XID" message has now an additional "to $IPADDR" appended:

*** normal request after discover:
dhcpcd[24478]: eth1: broadcasting for a lease
dhcpcd[24478]: eth1: sending DHCP_DISCOVER with xid 0x613c251
dhcpcd[24478]: eth1: waiting for 20 seconds
ifup-dhcp: .
dhcpcd[24478]: eth1: got a packet with xid 0x613c251
dhcpcd[24478]: eth1: offered 172.16.4.104 from 172.16.3.231
dhcpcd[24478]: eth1: sending DHCP_REQUEST with xid 0x613c251
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dhcpcd[24478]: eth1: waiting for 19 seconds
dhcpcd[24478]: eth1: got a packet with xid 0x613c251
dhcpcd[24478]: eth1: checking 172.16.4.104 is available on attached networks

*** unicast request in renew:
dhcpcd[24984]: eth1: renewing lease of 172.16.4.104
dhcpcd[24984]: eth1: sending DHCP_REQUEST with xid 0x560a6acd to 172.16.3.231
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dhcpcd[24984]: eth1: waiting for 45 seconds
dhcpcd[24984]: eth1: got a packet with xid 0x560a6acd
dhcpcd[24984]: eth1: leased 172.16.4.104 for 120 seconds

So when you see something like "sending DHCP_DISCOVER ... to $IPADDR" in the logs or when in another state, the patch may break something. Please capture the packets in this case with wireshark and attach.
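For testers who want to scan a longer log for the pattern described above, a small filter along these lines may help. It is a sketch only: the function name `unexpected_unicasts` is hypothetical, the regular expression is derived solely from the sample lines in this comment, and it flags only DHCP_DISCOVER messages that carry a "to $IPADDR" suffix. A unicast DHCP_REQUEST sent in the wrong state cannot be detected from a single line and is not covered.

```python
import re

# Matches the "sending ..." debug line; the "to <address>" part is optional.
SENDING = re.compile(r"sending (DHCP_[A-Z]+) with xid (0x[0-9a-fA-F]+)(?: to (\S+))?")


def unexpected_unicasts(lines):
    """Yield (message, destination) for DHCP_DISCOVER lines that were
    logged with a unicast destination."""
    for line in lines:
        match = SENDING.search(line)
        if match and match.group(1) == "DHCP_DISCOVER" and match.group(3):
            yield match.group(1), match.group(3)


log = [
    "dhcpcd[24478]: eth1: sending DHCP_DISCOVER with xid 0x613c251",
    "dhcpcd[24984]: eth1: sending DHCP_REQUEST with xid 0x560a6acd to 172.16.3.231",
]
print(list(unexpected_unicasts(log)))  # both sample lines are as expected
```

Reading `/var/log/messages` line by line and passing the lines to the function would work the same way.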
Please set DHCPCD_USER_OPTIONS="-d" in /etc/sysconfig/network/dhcp to get the "sending" debug messages.
Created attachment 414617 [details] Patch causing sending of a renew request as unicast to the server

Fixed a memory leak in the patch.

Please update to the most recent package (> 3.2.3-95.1, with a changelog date of today, Thu Feb 17 2011). Note that it is not built yet -- it will become available after a rebuild at: http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/network:/dhcp/
Submit to network:dhcp/dhcpcd requested in #61556.
The package with the fixed patch has been rebuilt: dhcpcd-3.2.3-96.1
(In reply to comment #22)

Hi Marius,

thank you very much for addressing this issue!

I am testing with the recent dhcpcd-3.2.3-96.1.i586 rpm on OpenSUSE 11.3 (32-bit).

In /var/log/messages I get:

rt-z9856:/var/log # grep -i "dhcpcd" messages | grep DHCP
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending DHCP_DISCOVER with xid 0x1fb25b79
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending DHCP_REQUEST with xid 0x1fb25b79
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x22028252 to 10.13.51.10
Feb 22 14:59:49 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x57c200c2 to 10.13.51.10
Feb 22 15:04:50 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x286ba7d3 to 10.13.51.10
Feb 22 15:09:50 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x14e67029 to 10.13.51.10
Feb 22 15:14:50 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x38e9df8c to 10.13.51.10
Feb 22 15:19:51 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x4ffdee10 to 10.13.51.10
Feb 22 15:24:51 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x2311ad4a to 10.13.51.10
Feb 22 15:29:51 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x64703e9c to 10.13.51.10
Feb 22 15:34:52 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x69a0c7be to 10.13.51.10
Feb 22 15:39:52 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x553c3d0d to 10.13.51.10
Feb 22 15:44:52 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x27d67456 to 10.13.51.10

In more detail:

Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: dhcpcd 3.2.3 starting
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: hardware address = d4:85:64:01:59:d0
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: removing IP address 10.13.137.41/22
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: broadcasting for a lease
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending DHCP_DISCOVER with xid 0x1fb25b79
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: waiting for 999999 seconds
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: got a packet with xid 0x1fb25b79
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: offered 10.13.137.41 from 10.13.51.10
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending DHCP_REQUEST with xid 0x1fb25b79
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: waiting for 999999 seconds
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: got a packet with xid 0x1fb25b79
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: got subsequent offer of 10.13.137.41, ignoring
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: waiting for 999999 seconds
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: got a packet with xid 0x1fb25b79
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: checking 10.13.137.41 is available on attached networks
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending ARP probe #1
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending ARP probe #2
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending ARP probe #3
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: sending ARP claim #1
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: sending ARP claim #2
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: leased 10.13.137.41 for 600 seconds
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: renew in 300 seconds
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: rebind in 525 seconds
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: adding IP address 10.13.137.41/22
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: adding default route via 10.13.136.1 metric 0
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: no dns information to write
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: writing /var/lib/dhcpcd/dhcpcd-eth0.info
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: exec "/etc/sysconfig/network/scripts/dhcpcd-hook" "/var/lib/dhcpcd/dhcpcd-eth0.info" "new"
Feb 22 14:49:48 rt-z9856 dhcpcd-hook: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
Feb 22 14:49:48 rt-z9856 dhcpcd-hook: You can find my version in /etc/resolv.conf.netconfig ...
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: setting hostname to `rt-z9856'
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: exec "/etc/sysconfig/network/scripts/dhcpcd-hook" "/var/lib/dhcpcd/dhcpcd-eth0.info" "complete"
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: forking to background
Feb 22 14:49:48 rt-z9856 dhcpcd[4072]: eth0: exiting
Feb 22 14:49:48 rt-z9856 dhcpcd[4449]: eth0: waiting for 300 seconds
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: renewing lease of 10.13.137.41
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: sending DHCP_REQUEST with xid 0x22028252 to 10.13.51.10
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: waiting for 225 seconds
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: got a packet with xid 0x22028252
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: leased 10.13.137.41 for 600 seconds
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: renew in 300 seconds
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: rebind in 525 seconds
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: adding IP address 10.13.137.41/22
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: adding default route via 10.13.136.1 metric 0
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: no dns information to write
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: writing /var/lib/dhcpcd/dhcpcd-eth0.info
Feb 22 14:54:49 rt-z9856 dhcpcd[4449]: eth0: exec "/etc/sysconfig/network/scripts/dhcpcd-hook" "/var/lib/dhcpcd/dhcpcd-eth0.info" "up"
Feb 22 14:54:49 rt-z9856 dhcpcd-hook: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
Feb 22 14:54:49 rt-z9856 dhcpcd-hook: You can find my version in /etc/resolv.conf.netconfig ...

> So when you see something like "sending DHCP_DISCOVER ... to $IPADDR"

I could not detect such a message in the logs.

rt-z9856:/var/log # grep DHCP_DISCOVER messages
Feb 22 14:49:47 rt-z9856 dhcpcd[4072]: eth0: sending DHCP_DISCOVER with xid 0x1fb25b79

Things look much improved in my tests so far. The only thing I am wondering about is what happens in the common case that the renewal does not change any DHCP parameters (same IP, route etc.).

Does this lead to a reconfiguration of the ethernet device with potential network glitches?

Yours,
-- martin
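The timer values in the log above ("leased ... for 600 seconds", "renew in 300 seconds", "rebind in 525 seconds") are consistent with the RFC 2131 defaults of T1 = 0.5 and T2 = 0.875 times the lease duration. The sketch below only reproduces that arithmetic; whether dhcpcd computed these values itself or received them from the server is not visible in the log, and the function name `default_timers` is hypothetical.

```python
def default_timers(lease_seconds):
    """Return (T1, T2) using the RFC 2131 default fractions of the lease."""
    t1 = lease_seconds // 2        # renew at 50% of the lease
    t2 = lease_seconds * 7 // 8    # rebind at 87.5% of the lease
    return t1, t2


t1, t2 = default_timers(600)
print(t1, t2, t2 - t1)
```

For the 600 second lease this gives 300 and 525, and the difference of 225 seconds matches the "waiting for 225 seconds" line logged right after the renewal request was sent.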
BTW: Your key has expired

Key ID: 4EB0BBCC53E8057E
Key name: home:mtomaschewski OBS Project <home:mtomaschewski@build.opensuse.org>
Key fingerprint: 752FB5CC794A4338398CD52B4EB0BBCC53E8057E
Key Created: Tue 22 Jan 2008 22:01:36 CET
Key Expires: Thu 01 Apr 2010 23:01:36 CEST (EXPIRED)
(In reply to comment #27)

Hi!

> I could not detect such a message in the logs.

OK.

> Things look much improved on my tests sofar. The only thing I am wondering is
> what happends in the common case that the renewal does not change any DHCP
> parameters (same ip, route etc.).
>
> Does this lead to a reconfiguration of the ethernet device with potential
> network glitches?

AFAICS it should not happen. It first adds the new address [which may result in an "already exists" error that is correctly ignored] and removes the old address only when they differ. Routes are handled similarly, but the logic is more complex. So the address and routes are never removed when they have not changed (but they will be re-added, e.g. in case you've modified something manually).
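The add-first, remove-only-if-different order described above can be pictured with a toy model. This is not dhcpcd code: the function name `apply_lease` and the use of a plain set for the configured addresses are assumptions made purely for illustration, and routes are left out.

```python
def apply_lease(configured, old_addr, new_addr):
    """Model the update order on a set of configured addresses and
    return the list of actions taken."""
    actions = []
    if new_addr in configured:
        # Corresponds to the "already exists" error that is ignored.
        actions.append(f"add {new_addr}: already exists (ignored)")
    else:
        configured.add(new_addr)
        actions.append(f"add {new_addr}")
    # The old address is removed only when it differs from the new one.
    if old_addr != new_addr and old_addr in configured:
        configured.discard(old_addr)
        actions.append(f"remove {old_addr}")
    return actions


addrs = {"10.13.137.41/22"}
print(apply_lease(addrs, "10.13.137.41/22", "10.13.137.41/22"))
print(addrs)  # unchanged: the address was never taken down
```

In this model an unchanged renewal never removes the address, and an address that was deleted manually is simply added again.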
I have been using it successfully on multiple machines since the release. Thanks, -- martin
Update released for: dhcpcd, dhcpcd-debuginfo, dhcpcd-debugsource Products: SLE-DEBUGINFO 11-SP1 (i386, ia64, ppc64, s390x, x86_64) SLE-DESKTOP 11-SP1 (i386, x86_64) SLE-SERVER 11-SP1 (i386, ia64, ppc64, s390x, x86_64) SLES4VMWARE 11-SP1 (i386, x86_64)
Update released for: dhcpcd, dhcpcd-debuginfo, dhcpcd-debugsource Products: openSUSE 11.2 (debug, i586, x86_64)
This is an autogenerated message for OBS integration: This bug (657402) was mentioned in https://build.opensuse.org/request/show/66233 https://build.opensuse.org/request/show/66414 https://build.opensuse.org/request/show/66606