Last post on this, sorta solved.
original post: -----------------------------------
I have a computer I am using to host a virtual machine, CentOS 6 64-bit on both. The host machine's network connection seems fine, no problems. Accessing the virtual machine is usually fine too, but then, poof, ssh, http, and ftp all lose their connections for about a minute, then come back up. I looked through all the logs on both machines and could find nothing, but I'm not sure where to look. My question: would this be a setting on the VM as a webserver, some new CentOS 6 setting that times out the network when not in use? Or something I did when I bonded my eth ports and bridged them? The bond covers the two onboard eth ports plus one port from an add-on network card. It is intermittent and seems to happen whenever, but 'service network restart' on the webserver fixes it immediately; it also just fixes itself on its own. Is there some setting in CentOS 6 that must be changed to allow constant 'uptime' of the network? ------------------------------
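For context, the bond/bridge was wired up roughly like this in /etc/sysconfig/network-scripts (the interface names, the bond mode, and the addresses below are placeholders to show the shape of it, not necessarily my exact settings):

    # ifcfg-eth0 (same idea repeated for eth1 and eth2, the add-on port)
    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=none
    MASTER=bond0
    SLAVE=yes

    # ifcfg-bond0 -- the bond itself, enslaved to the bridge
    DEVICE=bond0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=balance-alb miimon=100"   # mode here is only an example
    BRIDGE=br0

    # ifcfg-br0 -- the bridge the host and the VMs share
    DEVICE=br0
    TYPE=Bridge
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0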
I took out the bond and found that it was the issue; everything works fine without it. However, I also brought up a second VM and found something interesting.
1- With two VMs, only one failed; the other stayed up 100% of the time.
2- The second NIC card was not working well, but even with it taken out the issue remained.
3- Pinging the systems, I found the VM attached to vnet0 had exactly the same ping times as the host; the vnet1 VM's were double.
4- No matter what order the VMs were brought up in, whichever got assigned libvirt's vnet0 would fail; the other would not fail at all (how I checked the vnet assignments is sketched after this list).
5- The ping times of the host and the vnet0-assigned VM were exactly the same every time; the vnet1 VM's were a little more than double that (12 ms versus 28 ms).
6- The host never lost connection, even though it uses the same bridge and bond to connect.
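In case anyone wants to reproduce this, here is roughly how to see which guest got which vnet device (the guest names are examples):

    virsh domiflist vm1              # shows the tap device (vnetN), bridge, and MAC for a guest
    virsh domiflist vm2
    brctl show br0                   # lists bond0, vnet0, vnet1... attached to the bridge
    brctl showmacs br0               # MAC addresses the bridge has learned, per port
    cat /proc/net/bonding/bond0      # bond mode and slave status on the host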
The logical conclusion in my thinking is that the host and the first VM are somehow in conflict, and the host wins... via the bonding software. It seems like with VMs the host should not be connected to the bond, and that might work. But I am way too over this to test it out.
Sharing the bridge and the bond makes me suspect that the first virtual machine brought up, the one assigned libvirt's vnet0, eventually lost some ARP contest to the host.
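If someone wants to catch that contest in the act, watching ARP on the bridge while the vnet0 VM drops off should show it (the IP below is a stand-in for the failing VM's address):

    tcpdump -n -e -i br0 arp         # run on the host; -e prints which MAC is answering
    arping -I eth0 192.168.1.20      # run from another box; see whose MAC replies for the VM

If replies for the VM's IP flap between two different MACs, that would be the conflict.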
A third VM was added; it never failed (as long as it wasn't brought up first) and had the same ping times as the vnet1 VM, double those of the host and the vnet0 virtual machine.
What is causing that is beyond my knowledge and is a question for experts on libvirt's vnet system, the bonding software, and possibly eth bridges. All I know is that the host never failed even though it was using the same bond/bridge, and maybe that is the real issue. In a VM environment, maybe the host should have its own connection, NOT on the bond shared by the VMs?
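I never tested it, but that idea would look something like this: leave the bridge with no IP so only the VMs use it, and give the host its address on a port pulled out for the purpose (names and addresses are examples again):

    # ifcfg-br0 -- VM-only bridge, no IP on the host side
    DEVICE=br0
    TYPE=Bridge
    ONBOOT=yes
    BOOTPROTO=none

    # ifcfg-eth2 -- the host's own connection, off the bridge and the bond
    DEVICE=eth2
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0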
Using a physical bridge on top of the bond may have confused the bonding driver when that first VM came up...
Well, that is a long couple of weeks' work. Right now I am just going to attach the eths directly to the bridge and forget bonding as a really bad nightmare. I hope someone tests this out a bit and comes up with a brilliant yet really techy solution.
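For anyone who wants to skip the pain, the bond-free version I am switching to is just each NIC enslaved straight to the bridge, one ifcfg per port (names are examples):

    # ifcfg-eth0 -- straight into the bridge, no bond0 at all
    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=none
    BRIDGE=br0

    # ifcfg-br0 stays the same as before (TYPE=Bridge plus the host's IP)

Then 'service network restart' on the host. One caution: if more than one port goes into the same bridge on the same switch, spanning tree needs to be on ('brctl stp br0 on') or it will loop.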