"The balloon driver allows guests to express to the hypervisor how much memory they require. The balloon driver allows the host to efficiently allocate memory to the guest and allow free memory to be allocated to other guests and processes.
Guests using the balloon driver can mark sections of the guest's RAM as not in use (balloon inflation). The hypervisor can free the memory and use the memory for other host processes or other guests on that host. When the guest requires the freed memory again, the hypervisor can reallocate RAM to the guest (balloon deflation). "
I don't see any reference in this doc to the idea that the purpose of the "balloon driver is to give the host system a way of recovering memory from the guest when the demands on the host's physical memory exceed the amount available".
That paragraph doesn't really make a ton of sense to me. My understanding of balloon drivers is that their inflation and deflation are controlled by the host, not by the guest. The host forces the balloon to inflate when it (the host) is under memory pressure and lets it deflate when the pressure eases.
For example, take a host with 12GB of RAM and 4 guest VMs, each with 4GB of RAM allocated. When two of those guests are turned on, the host is only using 8GB of RAM for the guests, so there's no memory pressure and the balloon drivers in each guest remain deflated. Now bring the other two guests online and the guests' total allocation on the host rises to 16GB, a 4GB shortfall. At that point I would expect the balloon drivers in the guests to inflate to recover that 4GB shortfall.
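To make the arithmetic in that example concrete, here is a minimal Python sketch of the host-driven behaviour I'm describing. The function name `balloon_inflation` and the proportional-split policy are my own assumptions for illustration, not how any particular hypervisor actually picks balloon targets (in libvirt, for instance, the host side sets a running guest's memory target with `virsh setmem`).

```python
def balloon_inflation(guest_alloc_gb: dict[str, float], host_ram_gb: float) -> dict[str, float]:
    """Hypothetical host-side policy: how many GB each guest's balloon should inflate by.

    The host compares total guest allocation with its physical RAM and, if there is
    a shortfall, asks each guest to give back a share proportional to its allocation.
    """
    demand = sum(guest_alloc_gb.values())
    if demand == 0:
        return {name: 0.0 for name in guest_alloc_gb}
    # No memory pressure -> shortfall is 0 and every balloon stays deflated.
    shortfall = max(0.0, demand - host_ram_gb)
    # Proportional split; never exceeds a guest's allocation because shortfall < demand.
    return {name: shortfall * alloc / demand for name, alloc in guest_alloc_gb.items()}


two_guests = balloon_inflation({"vm1": 4, "vm2": 4}, host_ram_gb=12)
four_guests = balloon_inflation({f"vm{i}": 4 for i in range(1, 5)}, host_ram_gb=12)
print(two_guests)   # {'vm1': 0.0, 'vm2': 0.0}
print(four_guests)  # {'vm1': 1.0, 'vm2': 1.0, 'vm3': 1.0, 'vm4': 1.0}
```

With two guests running there is no shortfall, so nothing inflates; with all four, each balloon inflates by 1GB and the host recovers the 4GB. The decision is made entirely from the host's view of its own memory, which is the point of my question.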
Red Hat's statement above implies that the guests control how and when the balloon inflates, which seems to contradict my understanding.