Hello,
After asking around in #centos and other Linux-related IRC channels, here I am, bugging all of you.
Brace for a long post. (tl;dr: dhclient loses virtual interface IPs.)
I set up a system to deploy statically assigned IPs to physical and virtual interfaces on a number of (virtual) machines.
I have a dhcpd server with an entry for each virtual IP, in the form:
host eth0-1.virt1.test.it { option dhcp-client-identifier "00:50:56:00:84:00-eth0:1"; fixed-address 10.192.52.132; }
where 00:50:56:00:84:00 is the physical MAC address, but that's irrelevant.
On the client machine I hooked into the sysconfig network scripts and wrote a custom ifup-local-eth0 that launches one dhclient per virtual interface, like:
/sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0:1.leases -pf /var/run/dhclient-eth0:1.pid eth0:1
Also on the client, dhclient.conf has entries like:
interface "eth0:1" { send dhcp-client-identifier "00:50:56:00:84:00-eth0:1" ; }
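To make the relationship between the three pieces explicit, here is a minimal Python sketch that derives the dhcpd host entry, the dhclient.conf stanza and the dhclient command line for one alias from the same inputs. The function name alias_config and the way the host name is derived (interface name with ":" replaced by "-", plus a domain) are assumptions made for illustration; only the resulting strings come from the post above.

```python
def alias_config(mac: str, iface: str, ip: str, domain: str) -> dict:
    """Build the per-alias snippets described above (hypothetical helper)."""
    # The client identifier is the physical MAC plus the alias name.
    client_id = f"{mac}-{iface}"
    # Assumption: the host label is the alias name with ':' turned into '-'.
    host = f"{iface.replace(':', '-')}.{domain}"
    return {
        # Server side: one host entry per virtual IP.
        "dhcpd_host": (
            f'host {host} {{ option dhcp-client-identifier "{client_id}"; '
            f"fixed-address {ip}; }}"
        ),
        # Client side: make dhclient send the matching identifier.
        "dhclient_conf": (
            f'interface "{iface}" {{ send dhcp-client-identifier "{client_id}" ; }}'
        ),
        # Client side: one dhclient process, lease file and pid file per alias.
        "command": (
            f"/sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-{iface}.leases "
            f"-pf /var/run/dhclient-{iface}.pid {iface}"
        ),
    }


cfg = alias_config("00:50:56:00:84:00", "eth0:1", "10.192.52.132", "virt1.test.it")
for name, text in cfg.items():
    print(f"{name}: {text}")
```

The point of the sketch is only that the identifier string must be byte-for-byte identical on both sides, since it is the only thing that distinguishes the aliases sharing one MAC address.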
At service network start everyone is happy: the virtual interfaces get configured and everything works.
After a while (hours, days), suddenly _some_ of the virtual IPs disappear (not always all of them: sometimes 1 out of 10, sometimes 9 out of 10).
In the dhcpd log files I see things like this (from a different machine, because the one above hasn't had the issue recently):
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.103 from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPACK on 10.192.52.103 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.103 from 00:50:56:00:37:00 via eth1: lease 10.192.52.103 unavailable.
Aug 11 16:39:52 cp1 dhcpd: DHCPNAK on 10.192.52.103 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPDISCOVER from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPOFFER on 10.192.52.102 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.102 (10.192.50.21) from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPACK on 10.192.52.102 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPDISCOVER from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPOFFER on 10.192.52.103 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.102 (10.192.50.21) from 00:50:56:00:37:00 via eth1: lease 10.192.52.102 unavailable.
Aug 11 16:39:52 cp1 dhcpd: DHCPNAK on 10.192.52.102 to 00:50:56:00:37:00 via eth1
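For anyone who wants to scan a longer dhcpd log for the same pattern, the following is a rough Python sketch that tallies DHCPNAKs and DHCPACKs per MAC address. The regular expression is an assumption fitted only to the message shapes visible in the extract above, and the names parse, summarize and SAMPLE are made up for the example.

```python
import re
from collections import Counter

# Fitted to the log lines quoted above; other dhcpd message shapes are ignored.
LINE_RE = re.compile(
    r"dhcpd: (?P<type>DHCP[A-Z]+) "
    r"(?:(?:for|on) (?P<ip>[\d.]+) (?:\(\S+\) )?)?"
    r"(?:from|to) (?P<mac>[0-9A-Fa-f:]{17})"
)


def parse(line):
    """Return (message type, mac, ip or None), or None if the line does not match."""
    m = LINE_RE.search(line)
    return (m["type"], m["mac"].lower(), m["ip"]) if m else None


def summarize(lines):
    """Count NAKs per (mac, ip) and collect the set of ACKed IPs per mac."""
    naks = Counter()
    acked = {}
    for line in lines:
        event = parse(line)
        if event is None:
            continue
        kind, mac, ip = event
        if kind == "DHCPACK":
            acked.setdefault(mac, set()).add(ip)
        elif kind == "DHCPNAK":
            naks[(mac, ip)] += 1
    return naks, acked


SAMPLE = [
    "Aug 11 16:39:52 cp1 dhcpd: DHCPACK on 10.192.52.103 to 00:50:56:00:37:00 via eth1",
    "Aug 11 16:39:52 cp1 dhcpd: DHCPNAK on 10.192.52.103 to 00:50:56:00:37:00 via eth1",
    "Aug 11 16:39:52 cp1 dhcpd: DHCPDISCOVER from 00:50:56:00:37:00 via eth1",
    "Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.102 (10.192.50.21) from 00:50:56:00:37:00 via eth1",
    "Aug 11 16:39:52 cp1 dhcpd: DHCPACK on 10.192.52.102 to 00:50:56:00:37:00 via eth1",
    "Aug 11 16:39:52 cp1 dhcpd: DHCPNAK on 10.192.52.102 to 00:50:56:00:37:00 via eth1",
]

naks, acked = summarize(SAMPLE)
print(dict(naks))
print(acked)
```

Because the server logs only the MAC address, all aliases of one machine collapse into a single key here; the sketch can show that one MAC was both ACKed and NAKed for the same address, but not which dhclient process was involved.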
Also, in the clients' leases files I find leases for IPs that should go to other dhclients on the same machine.
My guesses are either:
a) dhclient doesn't always send dhcp-client-identifier, so the server can't verify that the IP a client claims is really its own. Why this doesn't happen at every renewal I don't know (I tried using very short lease times and wasn't able to reproduce the issue).
b) the dhclients steal DHCPOFFER replies from each other, so the stealing dhclient takes over the IP of the one it stole from and lets its original one fade into oblivion.
To work around b) I tried pointing the N dhclients at the same leases file, but this still led to NAKs (albeit no lost IPs) _and_ a strange-looking leases file, with empty lines but apparently no significant corruption.
Software versions:
# dhcpd -V
Internet Systems Consortium DHCP Server V3.0.5-RedHat
# dhclient -V
Internet Systems Consortium DHCP Client V3.0.5-RedHat
The problem also happened with DHCP Server 3.0.1 on an older CentOS 4.5.
I'm looking for ideas to better pinpoint the problem, alternatives, anything.
Possible FAQs:
Q) Do you have dynamic IP pools on the DHCP server? A) Yes, but not on this subnet. This subnet only has statically assigned IPs, both physical (mapped by MAC address) and virtual (mapped by dhcp-client-id).
Q) Do you see IPs assigned on the wrong machine? A) Never happened.
Q) Does service network restart fix the issue? A) Yes, always, and yes, only temporarily. This makes me think the system is basically right and this may be a bug.
Q) Do you have network issues? A) No, and this doesn't seem to happen at moments when a network-related problem would be likely.
Q) Why don't you use sysconfig's virtual interface configuration files? A) They don't support DHCP, only static IPs.
Q) Did you try building a newer dhclient/dhcpd? A) I built dhclient 4.2 from the ISC sources, but the dhclient/dhclient-script pair isn't a drop-in replacement on CentOS.
Q) Why don't you use static IPs, if they are static in DHCP? A) I want to be able to reconfigure the networks without editing tons of files on tons of machines. I currently have tens of these VMs deployed and the number is likely to increase a lot in the future. Deployment and configuration are automatic, driven by a central configuration.
Attached is a text file with the tcpdump matching the log extract above.
Thanks.
On Wed, 2010-08-11 at 18:18 +0200, Simone Caldana wrote:
> # dhclient -V Internet Systems Consortium DHCP Client V3.0.5-RedHat
I have always thought this to be a bug. Maybe your findings reinforce that even more. However, I have not filed a bug report because my issue only shows up with cheap routers and not with a proper setup.
I have had the same problems with the DHCP client, although I am not using it the way you are. You see the same problems on a regular eth0 using DHCP. My problems stem mostly from the cheap routers that ISPs provide.
> Q) Does service network restart fix the issue? A) Yes, always, and yes, only temporarily. This makes me think the system is basically right and this may be a bug.
And yes, I've also seen the temporary fix that gives.
> Q) Why don't you use static IPs, if they are static in DHCP? A) I want to be able to reconfigure the networks without editing tons of files on tons of machines. I currently have tens of these VMs deployed and the number is likely to increase a lot in the future. Deployment and configuration are automatic, driven by a central configuration.
My solution was to use static IPs, on the Linux machines only. You do not see this under Windows.
> Attached is a text file with the tcpdump matching the log extract above.
That dump is not long enough to show anything funny going on.
John
John, thanks for the feedback.
On 11 Aug 2010, at 19:46, JohnS wrote:
> My solution was to use static IPs, on the Linux machines only. You do not see this under Windows.
This reinforces my idea that this is a dhclient bug rather than a dhcpd one.
>> Attached is a text file with the tcpdump matching the log extract above.
> That dump is not long enough to show anything funny going on.
I know. I included it to pair it with the messages log above. It doesn't seem to me that these packets contain the dhcp-client-identifier.
I have been collecting port 67 traffic on the DHCP server since this afternoon. I hope to be able to find out more.
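As an aside for readers checking captures by hand: whether a packet carries the client identifier can be verified by walking the DHCP options, where the client identifier is option 61 (RFC 2132). Below is a small Python sketch that does this on the raw UDP payload of a DHCP message (not the whole Ethernet frame). The function names and the synthetic test packet are made up for the example; extracting the payload from a tcpdump capture is left out.

```python
MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the options field
OPTIONS_OFFSET = 236                     # size of the fixed BOOTP header
OPT_PAD, OPT_END, OPT_CLIENT_ID = 0, 255, 61


def dhcp_options(payload: bytes) -> dict:
    """Return {option code: raw value} for a DHCP message's UDP payload."""
    if payload[OPTIONS_OFFSET:OPTIONS_OFFSET + 4] != MAGIC_COOKIE:
        raise ValueError("not a DHCP message (magic cookie missing)")
    options = {}
    i = OPTIONS_OFFSET + 4
    while i < len(payload):
        code = payload[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:          # pad is a single byte with no length
            i += 1
            continue
        if i + 1 >= len(payload):    # truncated option, stop parsing
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def client_identifier(payload: bytes):
    """Return the raw option 61 value, or None if the option is absent."""
    return dhcp_options(payload).get(OPT_CLIENT_ID)


# Synthetic DHCPREQUEST-like payload, built only to exercise the parser.
cid = b"00:50:56:00:84:00-eth0:1"
packet = (bytes(OPTIONS_OFFSET) + MAGIC_COOKIE
          + bytes([53, 1, 3])            # option 53: message type 3 (DHCPREQUEST)
          + bytes([61, len(cid)]) + cid  # option 61: client identifier
          + bytes([OPT_END]))
print(client_identifier(packet))
```

The sketch ignores option overloading (option 52) and options split across several instances, which is probably acceptable for a quick check like this one.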
On 11 Aug 2010, at 20:31, Simone Caldana wrote:
> I have been collecting port 67 traffic on the DHCP server since this afternoon. I hope to be able to find out more.
Further testing revealed that dhclient really does ask for the wrong IP (but always sends the client-identifier), and this is due to the poisoned leases file. I am now using /dev/null as the leases file for all the virtual dhclients, but I wonder whether this disables the entire "keep the old lease" mechanism (which I don't really need) or whether the lease simply stays in memory only, which wouldn't really solve the problem.
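A quick way to spot a poisoned leases file is to check that every lease block in a given per-alias file names the interface and address that file is supposed to hold. The Python sketch below assumes the usual dhclient leases layout of lease { ... } blocks with interface and fixed-address statements and no nested braces; the function name foreign_leases and the sample text (including the 10.192.52.133 address) are invented for illustration.

```python
import re

# Assumes lease blocks without nested braces, as in a plain dhclient leases file.
LEASE_RE = re.compile(r"lease\s*\{(.*?)\}", re.S)
IFACE_RE = re.compile(r'interface\s+"([^"]+)"\s*;')
ADDR_RE = re.compile(r"fixed-address\s+([\d.]+)\s*;")


def foreign_leases(text: str, iface: str, expected_ip: str) -> list:
    """Return (interface, address) pairs that do not belong in this leases file."""
    bad = []
    for body in LEASE_RE.findall(text):
        i = IFACE_RE.search(body)
        a = ADDR_RE.search(body)
        found = (i.group(1) if i else None, a.group(1) if a else None)
        if found != (iface, expected_ip):
            bad.append(found)
    return bad


# Made-up sample: the second block holds an address meant for another alias.
SAMPLE = '''
lease {
  interface "eth0:1";
  fixed-address 10.192.52.132;
}
lease {
  interface "eth0:1";
  fixed-address 10.192.52.133;
}
'''

print(foreign_leases(SAMPLE, "eth0:1", "10.192.52.132"))
```

Run against /var/lib/dhclient/dhclient-eth0:1.leases and its siblings, a non-empty result would point at the files that picked up another alias's address.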
On 12 Aug 2010, at 10:51, Simone Caldana wrote:
> On 11 Aug 2010, at 20:31, Simone Caldana wrote:
>> I have been collecting port 67 traffic on the DHCP server since this afternoon. I hope to be able to find out more.
> Further testing revealed that dhclient really does ask for the wrong IP (but always sends the client-identifier), and this is due to the poisoned leases file. I am now using /dev/null as the leases file for all the virtual dhclients, but I wonder whether this disables the entire "keep the old lease" mechanism (which I don't really need) or whether the lease simply stays in memory only, which wouldn't really solve the problem.
A bug has been filed and a patch has been proposed: https://bugzilla.redhat.com/show_bug.cgi?id=623953
Jeremiah Jinno discovered the problem and proposed a patch upstream to ISC:
https://lists.isc.org/mailman/htdig/dhcp-users/2010-June/011521.html