> Yes, I normally want one server handling the full load to maximize the
> cache hits. But the other one should be up and running.

So, active/standby. Easier config. Squid won't even be aware that heartbeat
is running; just keep both running on both servers all the time. See my
install notes at the bottom.

> And, because this is already in production as a mid-level cache working
> behind a loadbalancer, I would like to be able to keep answering on the
> M or N addresses, gradually reconfiguring the web servers in the farm to
> use address O instead of the loadbalancer VIP address.

Go for it. It'll work fine. You could get fancy and switch the primary
interface from M to O, and make M the VIP. That depends on whether you can
accept the ~30 seconds of downtime, and on your tolerance for risk.

> I got the impression from some of the docs/tutorials that it was a bad
> idea to access the M/N addresses directly.

In your case, it's only "bad" if M/N go down.

> Or does that only apply to services where it is important to only have
> one instance alive at a time or where you are replicating data?

Depends on the service and replication setup. If you had master/slave MySQL
and connected to the slave, you'd see amnesia on the master. (That setup
wouldn't allow for fail-back, though, so you probably wouldn't see it.)
Things like drbd protect you from concurrent mounting.

> Even after converting all of the farm to use the new address, I'll still
> want to be able to monitor the backup server to be sure it is still
> healthy.

Yup. And you'll want to monitor the active node and force a failover if the
service fails. My config below doesn't take this into consideration; maybe
other list lurkers can correct it to be better. The quick and dirty fix is
for each node to check whether it is active, and if it is but squid is not
running, to run 'service heartbeat restart' so the VIP fails over to the
other node (i.e. a once-a-minute cron job). Not as pretty as it should be.
A rough sketch of such a check follows the install notes below.

best,
Jeff

Replace 1.2.3.4 with your VIP address, and a.example.com and b.example.com
with your FQDN hostnames.

server A ("a.example.com"):

yum -y install heartbeat
chkconfig --add heartbeat
chkconfig --level 345 heartbeat on
echo 'a.example.com IPaddr::1.2.3.4' > /etc/ha.d/haresources
echo "node a.example.com" > /etc/ha.d/ha.cf
echo "node b.example.com" >> /etc/ha.d/ha.cf
echo "udpport 9000" >> /etc/ha.d/ha.cf
echo "bcast bond0" >> /etc/ha.d/ha.cf
echo "auto_failback off" >> /etc/ha.d/ha.cf
echo "logfile /var/log/ha-log" >> /etc/ha.d/ha.cf
echo "logfacility local0" >> /etc/ha.d/ha.cf
echo "auth 1" > /etc/ha.d/authkeys
echo "1 crc" >> /etc/ha.d/authkeys
chmod go-rwx /etc/ha.d/authkeys

server B ("b.example.com"):

yum -y install heartbeat
chkconfig --add heartbeat
chkconfig --level 345 heartbeat on
echo 'a.example.com IPaddr::1.2.3.4' > /etc/ha.d/haresources   # yes, "a" again - that's the default host to run the service
echo "node a.example.com" > /etc/ha.d/ha.cf
echo "node b.example.com" >> /etc/ha.d/ha.cf
echo "udpport 9000" >> /etc/ha.d/ha.cf
echo "bcast bond0" >> /etc/ha.d/ha.cf
echo "auto_failback off" >> /etc/ha.d/ha.cf
echo "logfile /var/log/ha-log" >> /etc/ha.d/ha.cf
echo "logfacility local0" >> /etc/ha.d/ha.cf
echo "auth 1" > /etc/ha.d/authkeys
echo "1 crc" >> /etc/ha.d/authkeys
chmod go-rwx /etc/ha.d/authkeys

# This assumes:
# 1) your network interface is bond0, not eth0
# 2) you are on a private network where you don't care about security;
#    otherwise see http://www.linux-ha.org/authkeys
# Make sure udpport isn't in use by any other instances; or, use mcast.
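A minimal sketch of that once-a-minute check, assuming the VIP is 1.2.3.4
and a script path of /usr/local/sbin/check-squid-failover.sh (both are just
placeholders for illustration; adjust to your setup):

#!/bin/sh
# /usr/local/sbin/check-squid-failover.sh
# If this node currently holds the VIP but squid is dead, restart heartbeat
# so the VIP fails over to the other node. Run from cron on BOTH nodes, e.g.:
#   * * * * * root /usr/local/sbin/check-squid-failover.sh
VIP=1.2.3.4

# Are we the active node, i.e. is the VIP configured on a local interface?
if ifconfig | grep -q "inet addr:${VIP} "; then
    # We are active; is squid still running?
    if ! service squid status >/dev/null 2>&1; then
        logger -t squid-failover "squid down on active node; restarting heartbeat to fail over"
        service heartbeat restart
    fi
fi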
On server A:

service heartbeat start
# Then, check your log files (/var/log/ha-log and /var/log/messages).
# Ping the virtual IP.

On server B:

service heartbeat start
# Check your log files.

On server A:

service heartbeat restart

On server B:

ifconfig -a
# Check if the interface is now running on server B.

You can monitor the current active node with arp -- the MAC address will
switch to match the physical interface that the VIP is running on.
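For example, from another host on the same subnet (the MAC addresses here
are made up for illustration, and the exact arp output format may vary):

ping -c 1 1.2.3.4      # refresh the local ARP cache first
arp -n 1.2.3.4
# Address     HWtype  HWaddress          Flags Mask   Iface
# 1.2.3.4     ether   00:16:3e:aa:bb:01  C            eth0
# Compare HWaddress against each node's bond0 MAC (ifconfig bond0 | grep HWaddr)
# to see which server currently holds the VIP.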