[CentOS] DHCP problem with virtual interfaces

Wed Aug 11 16:18:40 UTC 2010
Simone Caldana <simone.caldana at criticalpath.net>

Hello,

After asking around in #centos and other Linux-related IRC channels, here I am, bugging all of you.

Brace yourselves for a long post. (TL;DR: dhclient loses the IPs of virtual interfaces.)

I have set up a system that deploys statically assigned IPs to the physical and virtual interfaces of a number of (virtual) machines.

I have a DHCPD configuration with an entry for each virtual IP, in the form:

host eth0-1.virt1.test.it { option dhcp-client-identifier "00:50:56:00:84:00-eth0:1"; fixed-address 10.192.52.132; }

where 00:50:56:00:84:00 is the physical MAC address, though that's irrelevant here.
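Since deployment is driven from a central configuration, entries like the one above can be generated rather than written by hand. The following is a minimal sketch, assuming a hypothetical inventory of (alias, fixed IP) pairs per machine and the "<MAC>-<alias>" client-identifier convention used above; the function name is illustrative, not part of any tool.

```python
def dhcpd_host_entries(hostname, mac, aliases):
    """Return one dhcpd.conf host line per virtual interface.

    hostname, mac and aliases come from a hypothetical central inventory;
    aliases is a list of (alias, fixed_ip) pairs such as ("eth0:1", "10.192.52.132").
    """
    lines = []
    for alias, ip in aliases:
        # "eth0:1" becomes "eth0-1" in the host name, as in the entry above
        name = f"{alias.replace(':', '-')}.{hostname}"
        # the client identifier is "<physical MAC>-<alias>"
        client_id = f"{mac}-{alias}"
        lines.append(
            f'host {name} {{ option dhcp-client-identifier "{client_id}"; '
            f'fixed-address {ip}; }}'
        )
    return "\n".join(lines)


print(dhcpd_host_entries("virt1.test.it", "00:50:56:00:84:00",
                         [("eth0:1", "10.192.52.132")]))
```

For the sample input this prints exactly the host line shown above.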

On the client machine I hooked into the sysconfig network scripts and wrote a custom ifup-local-eth0 that launches one dhclient per virtual interface, like:

/sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0:1.leases -pf /var/run/dhclient-eth0:1.pid eth0:1
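The per-alias command follows a fixed pattern, with a separate lease file and PID file for each virtual interface. A minimal Python sketch of the same loop, assuming a hypothetical list of aliases (the real ifup-local-eth0 is a shell script and is not shown here), with a dry-run default so nothing is launched unless asked:

```python
import subprocess


def dhclient_argv(alias):
    """Build the dhclient command line used above for one virtual interface."""
    return [
        "/sbin/dhclient", "-1", "-q",
        "-lf", f"/var/lib/dhclient/dhclient-{alias}.leases",  # one lease file per alias
        "-pf", f"/var/run/dhclient-{alias}.pid",              # one PID file per alias
        alias,
    ]


def start_dhclients(aliases, dry_run=True):
    """Launch one dhclient per alias; with dry_run=True only print the commands."""
    for alias in aliases:
        argv = dhclient_argv(alias)
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=False)


start_dhclients(["eth0:1", "eth0:2"])  # hypothetical alias list
```

Keeping the lease and PID paths derived from the alias name is what lets each dhclient instance be inspected or restarted individually.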

Also on the client, dhclient.conf has entries like:

interface "eth0:1" {
send dhcp-client-identifier "00:50:56:00:84:00-eth0:1" ;
}
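The client-side identifier must match the server-side one byte for byte, so generating both from the same data removes one possible source of mismatch. A minimal sketch, assuming the same hypothetical inventory as for the dhcpd entries:

```python
def dhclient_conf_entries(mac, aliases):
    """Return dhclient.conf interface blocks sending "<MAC>-<alias>" as client id."""
    blocks = []
    for alias in aliases:
        blocks.append(
            f'interface "{alias}" {{\n'
            f'send dhcp-client-identifier "{mac}-{alias}" ;\n'
            f'}}'
        )
    return "\n".join(blocks)


print(dhclient_conf_entries("00:50:56:00:84:00", ["eth0:1"]))
```

For "eth0:1" this reproduces the three-line block above.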

After service network start everyone is happy: the virtual interfaces get configured and everything works.

After a while (hours or days), _some_ of the virtual IPs suddenly disappear (not always all of them: sometimes 1 out of 10, sometimes 9 out of 10).

In the DHCPD log files I see things like the following (from a different machine, because the one above hasn't shown the issue recently):

Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.103 from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPACK on 10.192.52.103 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.103 from 00:50:56:00:37:00 via eth1: lease 10.192.52.103 unavailable.
Aug 11 16:39:52 cp1 dhcpd: DHCPNAK on 10.192.52.103 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPDISCOVER from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPOFFER on 10.192.52.102 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.102 (10.192.50.21) from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPACK on 10.192.52.102 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPDISCOVER from 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPOFFER on 10.192.52.103 to 00:50:56:00:37:00 via eth1
Aug 11 16:39:52 cp1 dhcpd: DHCPREQUEST for 10.192.52.102 (10.192.50.21) from 00:50:56:00:37:00 via eth1: lease 10.192.52.102 unavailable.
Aug 11 16:39:52 cp1 dhcpd: DHCPNAK on 10.192.52.102 to 00:50:56:00:37:00 via eth1
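Note that dhcpd logs the client by MAC address, and every alias on a machine shares the physical MAC, so the log alone cannot tell which dhclient sent a given request. To at least see how often and for which addresses this happens across machines, the NAKs can be counted per (IP, MAC) pair. A minimal sketch, assuming dhcpd logs through syslog to /var/log/messages (the CentOS default; adjust the path if not):

```python
import re
from collections import Counter

# Matches lines like:
# "... dhcpd: DHCPNAK on 10.192.52.103 to 00:50:56:00:37:00 via eth1"
NAK_RE = re.compile(r"DHCPNAK on (\S+) to (\S+) via (\S+)")


def count_naks(lines):
    """Count DHCPNAK messages per (ip, mac) pair in dhcpd syslog lines."""
    counts = Counter()
    for line in lines:
        m = NAK_RE.search(line)
        if m:
            ip, mac, _iface = m.groups()
            counts[(ip, mac)] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/messages") as log:  # assumed log location
        for (ip, mac), n in count_naks(log).most_common():
            print(f"{n:5d}  {ip:15s}  {mac}")
```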

Also, in the clients' leases files I find leases for IPs that should belong to other dhclients on the same machine.
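A quick way to spot this on every machine is to compare each lease's fixed-address with the address the central configuration assigns to that alias. A minimal sketch, assuming the usual dhclient lease-file layout (lease { interface "..."; fixed-address ...; ... }) and a hypothetical alias-to-IP mapping taken from the central configuration:

```python
import re

LEASE_RE = re.compile(r"lease\s*\{(.*?)\}", re.S)
IFACE_RE = re.compile(r'interface\s+"([^"]+)"\s*;')
ADDR_RE = re.compile(r"fixed-address\s+([0-9.]+)\s*;")


def parse_leases(text):
    """Yield (interface, fixed-address) for each lease block in a leases file."""
    for block in LEASE_RE.findall(text):
        iface = IFACE_RE.search(block)
        addr = ADDR_RE.search(block)
        if iface and addr:
            yield iface.group(1), addr.group(1)


def find_foreign_leases(text, expected):
    """Return leases whose address is not the one assigned to that alias.

    expected: hypothetical mapping alias -> fixed IP from the central config.
    """
    return [(iface, ip) for iface, ip in parse_leases(text)
            if expected.get(iface) != ip]
```

Running it over each /var/lib/dhclient/dhclient-eth0:N.leases file would show whether a "foreign" lease appears shortly before an IP disappears.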

My guesses are:

a) dhclient doesn't always send the dhcp-client-identifier, so the server can't verify that the IP it claims is really its own. Why this doesn't happen at every renewal I don't know (I tried very short lease times and wasn't able to reproduce the issue).
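Guess a) can be checked directly in a packet capture: a request either carries option 61 (client identifier) or it doesn't. A minimal sketch that extracts option 61 from the UDP payload of a DHCP packet, following the RFC 2131 layout (236-byte fixed header, 4-byte magic cookie, then code/length/value options). Getting the raw payloads out of a capture (for example one written with tcpdump -w) is assumed and not shown:

```python
MAGIC_COOKIE = bytes([99, 130, 83, 99])
OPTIONS_OFFSET = 236  # size of the fixed BOOTP/DHCP header


def dhcp_options(payload):
    """Parse the options of a raw DHCP UDP payload into {code: value_bytes}."""
    if payload[OPTIONS_OFFSET:OPTIONS_OFFSET + 4] != MAGIC_COOKIE:
        raise ValueError("magic cookie missing: not a DHCP packet")
    opts = {}
    i = OPTIONS_OFFSET + 4
    while i < len(payload):
        code = payload[i]
        if code == 0:        # Pad option, no length byte
            i += 1
            continue
        if code == 255 or i + 1 >= len(payload):  # End option or truncated
            break
        length = payload[i + 1]
        opts[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return opts


def client_identifier(payload):
    """Return option 61 (client identifier), or None if it was not sent."""
    return dhcp_options(payload).get(61)
```

A DHCPREQUEST from one of the aliases for which client_identifier() returns None would support guess a).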

b) the dhclients steal DHCPOFFER replies from each other, so the stealing dhclient takes over the stolen IP while its own original one fades into oblivion.
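In DHCP (RFC 2131) a client associates replies with its own requests through the transaction ID (xid); the chaddr field carries its hardware address. Since all dhclients on eth0's aliases share the same MAC, chaddr cannot tell them apart, and only the xid can. The following is a toy model of that filtering, not dhclient's actual code; the xids are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Reply:
    xid: int      # transaction ID echoed from the client's request
    chaddr: str   # client hardware (MAC) address
    yiaddr: str   # address offered/assigned to the client


def replies_for(client_xid, client_mac, replies, check_xid=True):
    """Return the addresses a client with this xid and MAC would accept."""
    return [
        r.yiaddr
        for r in replies
        if r.chaddr == client_mac and (not check_xid or r.xid == client_xid)
    ]


# Two dhclients on the same NIC share the MAC but (hypothetically) differ in xid.
mac = "00:50:56:00:37:00"
replies = [Reply(0x1111, mac, "10.192.52.102"), Reply(0x2222, mac, "10.192.52.103")]
print(replies_for(0x1111, mac, replies))                   # only its own reply
print(replies_for(0x1111, mac, replies, check_xid=False))  # both replies
```

If guess b) is right, the tcpdump should show a dhclient acting on an OFFER or ACK whose xid belongs to another alias's request.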

To work around b) I tried pointing all N dhclients to the same lease file, but this still led to NAKs (albeit no lost IPs) _and_ a strange-looking leases file, with empty lines but apparently no significant corruption.

Software versions:

# dhcpd -V
Internet Systems Consortium DHCP Server V3.0.5-RedHat

# dhclient -V
Internet Systems Consortium DHCP Client V3.0.5-RedHat

The problem also happened with DHCP Server 3.0.1 on an older CentOS 4.5.

I'm looking for ideas to better pinpoint the problem, alternatives, anything.

Possible FAQs:
Q) Do you have dynamic IP pools on the DHCP server?
A) Yes, but not on this subnet. This subnet only has statically assigned IPs, both physical (mapped by MAC address) and virtual (mapped by dhcp-client-id).

Q) Do you see IPs assigned on the wrong machine?
A) It has never happened.

Q) Does service network restart fix the issue?
A) Yes, always, and yes, only temporarily. This makes me think the setup is basically right and this may be a bug.

Q) Do you have network issues?
A) No, and this doesn't seem to happen at moments when a network-related problem would be likely.

Q) Why don't you use the sysconfig configuration files for virtual interfaces?
A) They don't support DHCP, only static IPs.

Q) Did you try building a newer dhclient/dhcpd?
A) I built the 4.2 dhclient from ISC sources, but the dhclient/dhclient-script pair isn't a drop-in replacement on CentOS.

Q) Why don't you use static IPs, if they are static in DHCP anyway?
A) I want to be able to reconfigure the networks without editing tons of files on tons of machines. I currently have tens of these VMs deployed, and the number is likely to increase a lot in the future. Deployment and configuration are automated from a central configuration.



Attached is a text file with the tcpdump corresponding to the log excerpt above.

Thanks.

-------------- next part --------------
Name: tcpdump.txt
URL: <http://lists.centos.org/pipermail/centos/attachments/20100811/faec0ed8/attachment-0003.txt>

-- 
Simone Caldana
Senior Consultant
Critical Path
via Cuniberti 58, 10100 Torino, Italia
+39 011 4513811 (Direct)
+39 011 4513825 (Fax)
simone.caldana at criticalpath.net
http://www.cp.net/

Critical Path
A global leader in digital communications