Look at pound: http://www.apsis.ch/pound/
If you are concerned about traffic volume, you might consider running squid as a transparent proxy in front of pound. I.e.:
request -> squid -> pound -> apache
Where squid will return the response for everything marked as cacheable and still fresh; and pound will take care of load balancing to apache. (Pound can inspect/insert cookies to send visitors to the same back-end node on subsequent requests.) On some of our setups, squid responds to 98% of the requests coming in, and is able to respond to an extremely insane high volume of requests. Other list users might be able to provide good stats as to what sort of volume they can support. (I'd be curious to hear what others have seen...)
For HA: - 2 instances of squid, active/standby or active/active (i.e. two IP address in DNS for the public hostname, and have each squid instance pick up the others during failure). - 2 instances of pound, active/standby - N instances of apache
Re: replication of content on your apache nodes, another poster suggested drbd. From my understanding, I do not think this is possible, since only one node can mount the drbd volume at a time. If you have shared data that needs to be seen across apache nodes, either stick it in SQL or mount an NFS volume across the nodes. (But then you have NFS in the picture, which might not be so good.)
If your apache code is constant, then have a master apache node and write a shell script that runs rsync to push code changes out to the other instances.
It's hard to get very specific about what's best for your setup without know the specifics of things like the data sync needs on the apache nodes, so take all of this with a grain of salt -- or as a default starting place.
best, Jeff