xenbr0 isn't created anymore


Kai Schaetzl

27 Mar 2008

3:01 p.m.

Somehow I managed to disable the creation of xenbr0 on boot-up of the host system. CentOS 5 with standard CentOS Xen. I changed all Xen VMs to use xenbr0 instead of virbr0 and disabled libvirtd. Works fine. But when I later restarted the machine I found that all networking for guests had gone. On inspection there's no xenbr0 created anymore. I can get it up by stopping network (or eth0) and then running

/etc/xen/scripts/network-bridge start

xend-config.sxp still has (network-script network-bridge), and logging shows no problems. It seems that the network-bridge script simply doesn't run, but why? I can't see any init script or so that might run it, so I assume it's xend doing that, but there's no error in any log.

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com


Ross S. W. Walker

27 Mar

3:32 p.m.

Kai Schaetzl wrote:

...

Somehow I managed to disable the creation of xenbr0 on boot-up of the host system. CentOS 5 with standard CentOS Xen. I changed all Xen VMs to use xenbr0 instead of virbr0 and disabled libvirtd. Works fine. But when I later restarted the machine I found that all networking for guests had gone. On inspection there's no xenbr0 created anymore. I can get it up by stopping network (or eth0) and then running /etc/xen/scripts/network-bridge start. xend-config.sxp still has (network-script network-bridge), and logging shows no problems. It seems that the network-bridge script simply doesn't run, but why? I can't see any init script or so that might run it, so I assume it's xend doing that, but there's no error in any log.

Can you post the output of 'chkconfig --list'?

Attach a copy of /etc/xen/xend-config.sxp?

-Ross


Kai Schaetzl

3:53 p.m.

Ross S. W. Walker wrote on Thu, 27 Mar 2008 11:32:56 -0400:

...

Can you post the output of 'chkconfig --list'?

Attach a copy of /etc/xen/xend-config.sxp?

That's both rather long for a mailing list, isn't it?

There's no difference from the default xend-config.sxp other than that I added a keymap directive and that was before the problem started.

I deactivated some services in firstboot (and then firstboot) at about the same time I deactivated libvirtd. But I don't see any service that might be connected to this. Are there specific ones that don't start with xen that I should be looking for?

Kai


Ross S. W. Walker

4:35 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Thu, 27 Mar 2008 11:32:56 -0400:

...
Can you post the output of 'chkconfig --list'?

Attach a copy of /etc/xen/xend-config.sxp?

That's both rather long for a mailing list, isn't it?

There's no difference from the default xend-config.sxp other than that I added a keymap directive and that was before the problem started.

I deactivated some services in firstboot (and then firstboot) at about the same time I deactivated libvirtd. But I don't see any service that might be connected to this. Are there specific ones that don't start with xen that I should be looking for?

Ok, well I wanted to make sure xend is running at the right runlevels and see if there is anything else weird set to start that shouldn't.

I also see you are having other problems, with portmap, maybe the two are related and they both sound like it has to do with the network.

Did you do anything on the network side around the time of the failure?

Maybe there is an interface definition for virbr0 that is left around but since libvirt is disabled, a bridge isn't activated for it to apply to?

-Ross

Kai Schaetzl

6:37 p.m.

Ross S. W. Walker wrote on Thu, 27 Mar 2008 12:35:17 -0400:

...

Ok, well I wanted to make sure xend is running at the right runlevels and see if there is anything else weird set to start that shouldn't.

It's set to the standard runlevels 2-5

...

I also see you are having other problems, with portmap, maybe the two are related and they both sound like it has to do with the network.

Interestingly, I got portmap fixed by way of trying to fix the xenbr problem. But I don't know how it got fixed; you'll see it on the other list.

...

Did you do anything on the network side around the time of the failure?

Definitely, no. I disabled the RH-firewall and a few services like cups that I don't need. No changes to any network interfaces. From comparing the boot log messages the network-bridge doesn't ever seem to run now.

...

Maybe there is an interface definition for virbr0 that is left around but since libvirt is disabled, a bridge isn't activated for it to apply to?

If I startup libvirtd only virbr0 gets created, nothing else. I enabled libvirtd at boot time again, but this doesn't help. The difference is only the following:

Mar 27 17:57:48 mambo kernel: Bridge firewalling registered
Mar 27 17:57:48 mambo kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 27 17:57:49 mambo kernel: Netfilter messages via NETLINK v0.30.
Mar 27 17:57:49 mambo kernel: ip_conntrack version 2.4 (8192 buckets, 65536 max) - 228 bytes per conntrack

Mar 27 17:57:50 mambo dnsmasq[3105]: started, version 2.39 cachesize 150
<more dnsmasq stuff skipped>

so, the problem cannot be connected to libvirtd being present or not.

This is how it looks if everything goes well during boot:

Feb 24 17:31:14 mambo xenstored: Checking store ...
Feb 24 17:31:14 mambo xenstored: Checking store complete.
Feb 24 17:31:14 mambo xenstored: Checking store ...
Feb 24 17:31:14 mambo xenstored: Checking store complete.
Feb 24 17:31:15 mambo dhcpd: receive_packet failed on eth0: Network is down

Feb 24 17:31:15 mambo kernel: device vif0.0 entered promiscuous mode
Feb 24 17:31:15 mambo kernel: xenbr0: port 1(vif0.0) entering learning state
Feb 24 17:31:15 mambo kernel: peth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Feb 24 17:31:15 mambo kernel: xenbr0: topology change detected, propagating
Feb 24 17:31:15 mambo kernel: xenbr0: port 1(vif0.0) entering forwarding state
Feb 24 17:31:15 mambo kernel: peth0: Promiscuous mode enabled.
Feb 24 17:31:15 mambo kernel: device peth0 entered promiscuous mode
Feb 24 17:31:16 mambo kernel: xenbr0: port 2(peth0) entering learning state
Feb 24 17:31:16 mambo kernel: xenbr0: topology change detected, propagating
Feb 24 17:31:16 mambo kernel: xenbr0: port 2(peth0) entering forwarding state

Feb 24 17:31:21 mambo kernel: tap tap-1-51712: 2 getting info
Feb 24 17:37:09 mambo kernel: tap tap-2-51712: 2 getting info
Feb 24 17:37:09 mambo kernel: device vif2.0 entered promiscuous mode
Feb 24 17:37:09 mambo kernel: ADDRCONF(NETDEV_UP): vif2.0: link is not ready
Feb 24 17:37:13 mambo kernel: blktap: ring-ref 8, event-channel 8, protocol 1 (x86_32-abi)
Feb 24 17:37:35 mambo kernel: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready
Feb 24 17:37:35 mambo kernel: xenbr0: port 3(vif2.0) entering learning state
Feb 24 17:37:35 mambo kernel: xenbr0: topology change detected, propagating
Feb 24 17:37:35 mambo kernel: xenbr0: port 3(vif2.0) entering forwarding state

and this seems to be the first time where it started failing:

Mar 24 20:35:11 mambo logger: /etc/xen/scripts/vif-bridge: Could not find bridge device xenbr0
Mar 24 20:35:16 mambo kernel: blkback: ring-ref 8, event-channel 6, protocol 1 (x86_32-abi)
Mar 24 20:35:16 mambo logger: /etc/xen/scripts/vif-bridge: Could not find bridge device xenbr0
Mar 24 20:35:18 mambo logger: /etc/xen/scripts/vif-bridge: Could not find bridge device xenbr0
Mar 24 20:35:23 mambo kernel: blkback: ring-ref 8, event-channel 7, protocol 1 (x86_32-abi)
Mar 24 20:35:23 mambo logger: /etc/xen/scripts/vif-bridge: Could not find bridge device xenbr0
Mar 24 20:35:24 mambo kernel: tap tap-3-51712: 2 getting info
Mar 24 20:35:24 mambo udevd-event[3800]: udev_node_mknod: mknod (/dev/xen/blktap1, 020600, 253, 1) failed: File exists
Mar 24 20:35:25 mambo logger: /etc/xen/scripts/vif-bridge: Could not find bridge device xenbr0
Mar 24 20:35:29 mambo kernel: blktap: ring-ref 8, event-channel 6, protocol 1 (x86_32-abi)
Mar 24 20:35:29 mambo logger: /etc/xen/scripts/vif-bridge: Could not find bridge device xenbr0

(the blkback/blktap stuff seems to be related, but I don't know what it means.)

The latter happens when there are DomUs to restore, without them it's just the four lines about xenstored. Any up notices for xenbr0 etc. are completely missing.
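A quick way to spot this failure pattern in a syslog file is to pull out the bridge names that vif-bridge could not find. The following is only a sketch: `missing_bridges` is a hypothetical helper name, and the default log path `/var/log/messages` is an assumption about where syslog writes on the host.

```shell
#!/bin/sh
# Sketch: list every bridge name that vif-bridge reported as missing.
# missing_bridges is a hypothetical helper; the default log path is an assumption.
missing_bridges() {
    log=${1:-/var/log/messages}
    # Keep only the device name that follows the vif-bridge error text,
    # then collapse repeated names into one line each.
    sed -n 's/.*vif-bridge: Could not find bridge device \([^ ]*\).*/\1/p' "$log" | sort -u
}
```

Calling `missing_bridges /var/log/messages` on a log like the one above would print `xenbr0` once, however many guests failed to attach.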

Kai


Ross S. W. Walker

28 Mar

1:44 p.m.

Kai Schaetzl wrote:

...

Somehow I managed to disable the creation of xenbr0 on boot-up of the host system. CentOS 5 with standard CentOS Xen. I changed all Xen VMs to use xenbr0 instead of virbr0 and disabled libvirtd. Works fine. But when I later restarted the machine I found that all networking for guests had gone. On inspection there's no xenbr0 created anymore. I can get it up by stopping network (or eth0) and then running /etc/xen/scripts/network-bridge start. xend-config.sxp still has (network-script network-bridge), and logging shows no problems. It seems that the network-bridge script simply doesn't run, but why? I can't see any init script or so that might run it, so I assume it's xend doing that, but there's no error in any log.

Kai,

Snooping around I found this in /etc/xen/qemu-ifup:

#
# Old style bridge setup with netloop, used to have a bridge name
# of xenbrX, enslaving pethX and vif0.X, and then configuring
# eth0.
#
# New style bridge setup does not use netloop, so the bridge name
# is ethX and the physical device is enslaved pethX
#
# So if...
#
#   - User asks for xenbrX
#   - AND xenbrX doesn't exist
#   - AND there is a ethX device which is a bridge
#
# ..then we translate xenbrX to ethX
#
# This lets old config files work without modification
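The translation rule described in these comments could be sketched roughly as follows. This is not the actual Xen 3.2 code: `resolve_bridge` and the `SYSFS_NET` variable are hypothetical names, and the check relies on the assumption that Linux exposes a `bridge` directory under `/sys/class/net/<dev>` for bridge devices.

```shell
#!/bin/sh
# Sketch of the xenbrX -> ethX fallback; not the real Xen 3.2 implementation.
# SYSFS_NET and resolve_bridge are hypothetical names used for illustration.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

resolve_bridge() {
    br=$1
    # If the requested device exists, use it unchanged.
    if [ -e "$SYSFS_NET/$br" ]; then
        echo "$br"
        return
    fi
    case $br in
        xenbr*)
            eth="eth${br#xenbr}"
            # Only translate when ethX exists and is itself a bridge.
            if [ -d "$SYSFS_NET/$eth/bridge" ]; then
                echo "$eth"
                return
            fi
            ;;
    esac
    # Otherwise hand the name back untouched and let the caller fail.
    echo "$br"
}
```

With a new-style setup where eth0 is the bridge and no xenbr0 exists, `resolve_bridge xenbr0` prints `eth0`, which is why an old guest config naming xenbr0 can keep working.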

So it turns out Xen networking changed in current releases and they never updated their Wiki.

Now I am not sure how the vif0.X and vethX interfaces fit into the picture here... Maybe these are no longer used?

vifX.Y interfaces are still used though for linking domU interfaces with the physical interface.

-Ross

Kai Schaetzl

3:27 p.m.

Ross S. W. Walker wrote on Fri, 28 Mar 2008 09:44:44 -0400:

...

Snooping around I found this in /etc/xen/qemu-ifup:

Interesting. That is the one from Xen 3.2 I suppose? It's not what I have here with the mint CentOS 5.1 Xen.

I just have:

echo 'config qemu network with xen bridge for ' $*
ifconfig $1 0.0.0.0 up
brctl addif $2 $1

That might be used like "qemu-ifup xenbr0 eth0" or so I guess. I don't know if it would work but it sure never gets run (or if it runs it doesn't work). There's no "config qemu network with xen bridge" in any of my logs, so I'd say it never runs. And it should never run as I'm not using qemu. It might indeed only be there for qemu to use.

...

#
# Old style bridge setup with netloop, used to have a bridge name
# of xenbrX, enslaving pethX and vif0.X, and then configuring
# eth0.
#
# New style bridge setup does not use netloop, so the bridge name
# is ethX and the physical device is enslaved pethX

Hm, but that sounds like what they do currently. peth0 is supposed to be the physical device. But the bridge name is not eth0. So, it might be indeed a new way. While doing research I saw a number of complaints about their current way of doing it and saw much easier ways. So maybe they changed it in this manner.

...

#
# So if...
#
#   - User asks for xenbrX
#   - AND xenbrX doesn't exist
#   - AND there is a ethX device which is a bridge
#
# ..then we translate xenbrX to ethX

That would probably cut it.

...

Now I am not sure how the vif0.X and vethX interfaces fit into the picture here... Maybe these are no longer used?

Still there I would think, just that xenbr0 (after creation) bridges to eth0 and not peth0. I suppose. In ifconfig it will just look the same as before. I suppose. Again.

I found a cure for the problem yesterday evening. That network-bridge script is crap. It depends on certain output from "ip route list". What format do you get as the last line of "ip route list"?

default via 192.168.1.1 dev eth0 src 192.168.1.231

or

default via 192.168.1.1 dev eth0

I get this appended src on any of my 5.1 setups but not on my 4.6 setups. I don't have a 5.0 handy for comparison. The date of the ip file is from March 2007, so I'd say it didn't get updated recently. But maybe some library that is involved in this got updated. As it used to work until some days ago, either some update stopped it working or I somehow stopped the qemu-ifup (in case it ever really fixed this) from running or from running correctly. Or yet something else ... This is so hard to troubleshoot because those scripts don't log anything to the normal syslog. If there is any logging at all, it goes to xend.log or xend-debug.log without a date/time attached, so it's easily mistaken for an error that occurred from running the script manually (which I did often enough).

Here's the beginning of the code in network-bridge:

#vifnum=${vifnum:-$(ip route list | awk '/^default / { print $NF }' | sed 's/^[^0-9]*//')}
vifnum=${vifnum:-0}
#bridge=${bridge:-xenbr${vifnum}}
bridge=xenbr0
#netdev=${netdev:-eth${vifnum}}
netdev=eth0
antispoof=${antispoof:-no}

I added the comments and the two lines that set bridge and netdev to fixed values. Now it works. Without this it would get "192.168.1.231" as $vifnum and try to use devices like eth192.168.1.231 and xenbr192.168.1.231. With the old output of "ip route list" it would grab "0" instead of "192.168.1.231" and thus use the correct interface names. With this change I can now start and stop xend and it will take up and down the xenbr0 and connected devices correctly. Before it would just silently fail. (Or work for starting only once I killed eth0 as then it would take the default 0.)
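The failure described here can be reproduced without touching the network by feeding both route formats through the original pipeline. The sketch below does that and, as one possible alternative to hard-coding the names, takes the field after `dev` instead of the last field. The function names `old_vifnum` and `dev_vifnum` are hypothetical, and the second function is an illustration only, not the fix that later Xen scripts use.

```shell
#!/bin/sh
# Original logic from network-bridge: take the LAST field of the default
# route and strip any leading non-digits from it.
old_vifnum() {
    printf '%s\n' "$1" | awk '/^default / { print $NF }' | sed 's/^[^0-9]*//'
}

# Hypothetical alternative: take the field that follows "dev", so a
# trailing "src <address>" no longer changes the result.
dev_vifnum() {
    printf '%s\n' "$1" |
        awk '/^default / { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }' |
        sed 's/^[^0-9]*//'
}

old_vifnum 'default via 192.168.1.1 dev eth0'                    # prints 0
old_vifnum 'default via 192.168.1.1 dev eth0 src 192.168.1.231'  # prints 192.168.1.231
dev_vifnum 'default via 192.168.1.1 dev eth0 src 192.168.1.231'  # prints 0
```

The second call shows where names like eth192.168.1.231 come from: the last field is the source address, it already starts with a digit, and sed has nothing to strip.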

Is this code still the same in your version of the script? What's the output of the /etc/xen/scripts/network-bridge status? It should normally show the bridged devices and two layout versions of the routing table. Before the fix it was trying to display "eth192.168.1.231" and found that it doesn't exist, of course.

I'm tempted to try out the Xen 3.2. Is it what can be found at http://xen.org/download/? What makes me wary is this last paragraph on the readme:

...

After installation of the binary packages, some adjustment to the bootloader (grub) configuration will probably be necessary.

What do they mean? It does not install a new kernel, does it?

Did you notice any improvements with this package? I remember you wrote somewhere you can now run 32bit DomUs in a 64bit Dom0 stable. Anything else? Any disadvantages?

Kai


Ross S. W. Walker

4:20 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Fri, 28 Mar 2008 09:44:44 -0400:

...
Snooping around I found this in /etc/xen/qemu-ifup:

Interesting. That is the one from Xen 3.2 I suppose? It's not what I have here with the mint CentOS 5.1 Xen.

I just have:

echo 'config qemu network with xen bridge for ' $*
ifconfig $1 0.0.0.0 up
brctl addif $2 $1

That might be used like "qemu-ifup xenbr0 eth0" or so I guess. I don't know if it would work but it sure never gets run (or if it runs it doesn't work). There's no "config qemu network with xen bridge" in any of my logs, so I'd say it never runs. And it should never run as I'm not using qemu. It might indeed only be there for qemu to use.

Hmmm, how about in the /xen/scripts/vif-bridge script? Does it provide those comments there?

It may be that the implementation of Xen in CentOS depends on the libvirtd service installed and running and all VMs running off of virbr0?

...

...
#
# Old style bridge setup with netloop, used to have a bridge name
# of xenbrX, enslaving pethX and vif0.X, and then configuring
# eth0.
#
# New style bridge setup does not use netloop, so the bridge name
# is ethX and the physical device is enslaved pethX

Hm, but that sounds like what they do currently. peth0 is supposed to be the physical device. But the bridge name is not eth0. So, it might be indeed a new way. While doing research I saw a number of complaints about their current way of doing it and saw much easier ways. So maybe they changed it in this manner.

Here is the output of 'brctl show' on my box:

bridge name     bridge id               STP enabled     interfaces
eth0            8000.00188b717d72       no              vif2.0
                                                        tap0
                                                        peth0

On the older Xen it would look like this:

bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.00188b717d72       no              vif0.0
                                                        vif2.0
                                                        tap0
                                                        peth0

Then vif0.0 would be netlooped to eth0, so it is much simpler now.

...

...
#
# So if...
#
#   - User asks for xenbrX
#   - AND xenbrX doesn't exist
#   - AND there is a ethX device which is a bridge
#
# ..then we translate xenbrX to ethX

That would probably cut it.

...
Now I am not sure how the vif0.X and vethX interfaces fit into the picture here... Maybe these are no longer used?

Still there I would think, just that xenbr0 (after creation) bridges to eth0 and not peth0. I suppose. In ifconfig it will just look the same as before. I suppose. Again.

Actually you misread, xenbr0 was removed, now eth0 is the bridge itself.

...

I found a cure for the problem yesterday evening. That network-bridge script is crap. It depends on certain output from "ip route list". What format do you get as the last line of "ip route list"?

default via 192.168.1.1 dev eth0 src 192.168.1.231

or

default via 192.168.1.1 dev eth0

The second one:

default via 10.1.3.1 dev eth0

...

I get this appended src on any of my 5.1 setups but not on my 4.6 setups. I don't have a 5.0 for comparison handy. The date of the ip file is from March 2007, so I'd say it didn't get updated recently. But maybe some library that is involved in this got updated. As it used to work until some days ago either some update stopped it working or I somehow stopped the qemu-ifup (in case it ever really fixed this) from running or from running correctly. Or yet something else ...

I believe you get that src 192.168.1.231 address if that route is discovered via ICMP router discovery. If it was given to you in DHCP or if you specified it in your ifcfg script with GATEWAY= then that probably wouldn't appear.

...

This is so hard to troubleshoot because those scripts won't add any logging to the normal syslog. If at all logging goes to xend.log or xend-debug.log and doesn't have any date/time attached, so it's easily mistaken as an error that occurred from running the script manually (which I did often enough).

Not only that, but they appear to be Xen 3.0.3 scripts... which are quite old.

...

Here's the beginning of the code in network-bridge:

#vifnum=${vifnum:-$(ip route list | awk '/^default / { print $NF }' | sed 's/^[^0-9]*//')}
vifnum=${vifnum:-0}
#bridge=${bridge:-xenbr${vifnum}}
bridge=xenbr0
#netdev=${netdev:-eth${vifnum}}
netdev=eth0
antispoof=${antispoof:-no}

I added the comments and the two lines that set bridge and netdev to fixed values. Now it works. Without this it would get "192.168.1.231" as $vifnum and try to use devices like eth192.168.1.231 and xenbr192.168.1.231. With the old output of "ip route list" it would grab "0" instead of "192.168.1.231" and thus use the correct interface names. With this change I can now start and stop xend and it will take up and down the xenbr0 and connected devices correctly. Before it would just silently fail. (Or work for starting only once I killed eth0 as then it would take the default 0.)

Is this code still the same in your version of the script? What's the output of the /etc/xen/scripts/network-bridge status? It should normally show the bridged devices and two layout versions of the routing table. Before the fix it was trying to display "eth192.168.1.231" and found that it doesn't exist, of course.

The scripting in the Xen 3.2 is completely revamped, more modular and handles current iputils properly.

...

I'm tempted to try out the Xen 3.2. Is it what can be found at http://xen.org/download/? What makes me wary is this last paragraph on the readme:

...
After installation of the binary packages, some adjustment to the bootloader (grub) configuration will probably be necessary.

What do they mean? It does not install a new kernel, does it?

The package doesn't even include the Linux kernel. You will use the CentOS xen Linux kernel, but every time you install/upgrade the CentOS xen Linux kernel you need to remember to edit your grub.conf to point the xen kernel to the xen 3.2 kernel instead of the xen 3.1 kernel that comes bundled with the CentOS xen Linux kernel package.

...

Did you notice any improvements with this package? I remember you wrote somewhere you can now run 32bit DomUs in a 64bit Dom0 stable. Anything else? Any disadvantages?

Yes, 32-bit domUs run well on 64-bit dom0s, and scripting works better of course. You have full access to all the Xen VM management using xenstore: for example, add VMs to the store with 'xm new <config>', then completely manage them through xm (xm start <name>, xm stop), and the VM will always be visible in 'xm list'. Of course, if you have libvirt installed you can use that too if you so desire.

-Ross

Kai Schaetzl

6:10 p.m.

Ross S. W. Walker wrote on Fri, 28 Mar 2008 12:20:54 -0400:

...

Hmmm, how about in the /xen/scripts/vif-bridge script? Does it provide those comments there?

No, none of the files contain "qemu".

...

It may be that the implementation of Xen in CentOS depends on the libvirtd service installed and running and all VMs running off of virbr0?

No, it worked fine after the upgrade (there was at least one VM using xenbr0). I didn't create new VMs with virt-manager after the upgrade for a while, I just cloned the config files. But then I started testing VMs with LVM volumes and created a whole new one with virt-manager. That was the first time I noticed that virbr0 gets used (and I started asking about it here). Something must be changing the ip route output and that is the reason why the xen bridging stopped working. I enabled portmap on that day, disabled a few services and removed the Java yum group. That should be it, more or less.

...

Here is the output of 'brctl show' on my box:

bridge name     bridge id               STP enabled     interfaces
eth0            8000.00188b717d72       no              vif2.0
                                                        tap0
                                                        peth0

Yeah, that looks like the description. What is tap0, a tape backup system?

...

The second one:

default via 10.1.3.1 dev eth0

Hm.

...

I believe you get that src 192.168.1.231 address if that route is discovered via ICMP router discovery. If it was given to you in DHCP or if you specified it in your ifcfg script with GATEWAY= then that probably wouldn't appear.

Hm. I didn't change anything in that area. The IP is set statically. It has a static private (on eth0:0) and a static public IP address (on eth0). The gateway is set for the private IP on the firewall. There is no "discovery" AFAICS. And the other 5.1 (a Dom0) I compared is setup the same way just that the gateway is the public one. Could this be the cause, having different subnets? But, again, it's that way for some time ... Just trying to compare with a VM that uses only private DHCP addresses, but just find that DHCP currently fails on it. I see dhcpd offering an IP, but the VM doesn't take it anymore. Hm. There still must be a networking problem somewhere. My VMs with static IP work fine, though.

...

The package doesn't even include the linux kernel. You will use the CentOS xen linux kernel, but everytime you install/upgrade the CentOS xen linux kernel you need to remember to edit your grub.conf to point the xen kernel to the xen 3.2 kernel instead of the xen 3.1 kernel that comes bundled with the CentOS xen linux kernel package.

I don't understand this. I have to admit I didn't take much effort to understand or investigate how Linux+Xen works. If I boot a different kernel I use a different kernel, don't I? Is /xen.gz-2.6.18-53.1.14.el5 the hypervisor? So, if I change to the 3.2 kernel I still have the same 5.1 CentOS kernel running the OS, just a different hypervisor? So, I can still use any updates to the Linux kernel? Just, if there is an (security) issue with the xen kernel I would need to wait for an update?

...

Yes 32-bit domUs run well on 64-bit dom0s, scripting works better of course. You have full access to all the Xen VM management using xenstore, for example add VMs to the store with 'xm new <config>'

I don't know the advantages of xenstore yet; the documentation doesn't reveal much about it and I surely don't want to program its API ;-)

...

then completely manage them through xm, xm start <name>, xm stop, and the VM will always be visible in 'xm list'

well, tiny advantage ;-)

Thanks, maybe I'll try 3.2, but first I have to troubleshoot that new DHCP problem on the VMs :-(

Kai


Ross S. W. Walker

6:40 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Fri, 28 Mar 2008 12:20:54 -0400:

...
Hmmm, how about in the /xen/scripts/vif-bridge script? Does it provide those comments there?

No, none of the files contain "qemu".

Wasn't necessarily talking qemu, just the comments about how the network bridging has changed...

...

...
It may be that the implementation of Xen in CentOS depends on the libvirtd service installed and running and all VMs running off of virbr0?

No, it worked fine after the upgrade (there was at least one VM using xenbr0). I didn't create new VMs with virt-manager after the upgrade for a while, I just cloned the config files. But then I started testing VMs with LVM volumes and create a whole new one with virt-manager. That was the first time I noticed that virbr0 gets used (and I started asking about it here). Something must be changing the ip route output and that is the reason why the xen bridging stopped working. I enabled portmap on that day, disabled a few services and removed the Java yum group. That should be it, more or less.

...
Here is the output of 'brctl show' on my box:

bridge name     bridge id               STP enabled     interfaces
eth0            8000.00188b717d72       no              vif2.0
                                                        tap0
                                                        peth0

Yeah, that looks like the description. What is tap0, a tape backup system?

No, it's a network TAP device for layer 2 access to the network bridge. I have libvirt installed, so it is probably coming from there, maybe so that when it brings up virbr0 it can tap into the eth0 bridge.

...

...
The second one:

default via 10.1.3.1 dev eth0

Hm.

...
I believe you get that src 192.168.1.231 address if that route is discovered via ICMP router discovery. If it was given to you in DHCP or if you specified it in your ifcfg script with GATEWAY= then that probably wouldn't appear.

Hm. I didn't change anything in that area. The IP is set statically. It has a static private (on eth0:0) and a static public IP address (on eth0). The gateway is set for the private IP on the firewall. There is no "discovery" AFAICS. And the other 5.1 (a Dom0) I compared is setup the same way just that the gateway is the public one.

Could this be the cause, having different subnets? But, again, it's that way for some time ...

Not sure what you mean? You mean you have your GATEWAY set to an IP outside your subnet? If so, then yes, your host is finding the default gateway for your subnet through router discovery and that explains the output. Set your interface's GATEWAY to the IP address marked as the 'src' in ip route.

...

Just trying to compare with a VM that uses only private DHCP addresses, but just find that DHCP currently fails on it. I see dhcpd offering an IP, but the VM doesn't take it anymore. Hm. There still must be a networking problem somewhere. My VMs with static IP work fine, though.

Weird...

...

...
The package doesn't even include the linux kernel. You will use the CentOS xen linux kernel, but everytime you install/upgrade the CentOS xen linux kernel you need to remember to edit your grub.conf to point the xen kernel to the xen 3.2 kernel instead of the xen 3.1 kernel that comes bundled with the CentOS xen linux kernel package.

I don't understand this. I have to admit I didn't take much effort to understand or investigate how Linux+Xen works. If I boot a different kernel I use a different kernel, don't I? Is /xen.gz-2.6.18-53.1.14.el5 the hypervisor? So, if I change to the 3.2 kernel I still have the same 5.1 CentOS kernel running the OS, just a different hypervisor? So, I can still use any updates to the Linux kernel? Just, if there is an (security) issue with the xen kernel I would need to wait for an update?

Yes the hypervisor runs before the kernel is booted. Linux runs as a PV guest in domain0 with full access to the systems resources and manages those for the other domains.
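To see this split in practice, one can list what each GRUB entry loads. In a Xen entry the `kernel` line names the hypervisor image (such as the /xen.gz-2.6.18-53.1.14.el5 mentioned above), while the dom0 Linux kernel and its initrd appear on `module` lines. The sketch below assumes GRUB legacy with its configuration at `/boot/grub/grub.conf`; `boot_images` is a hypothetical helper name, and the file name of the Xen 3.2 hypervisor image is not given in this thread.

```shell
#!/bin/sh
# Sketch: show the title, hypervisor/kernel, and module lines of each GRUB entry.
# boot_images is a hypothetical helper; the default path assumes GRUB legacy.
boot_images() {
    conf=${1:-/boot/grub/grub.conf}
    # "kernel" is the first image GRUB boots (the hypervisor in a Xen entry);
    # "module" lines carry the dom0 Linux kernel and its initrd.
    grep -E '^[[:space:]]*(title|kernel|module)[[:space:]]' "$conf"
}
```

After a CentOS kernel update adds a new entry, running `boot_images` makes it easy to check whether that entry's `kernel` line still points at the hypervisor you intend to boot.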

...

...
Yes 32-bit domUs run well on 64-bit dom0s, scripting works better of course. You have full access to all the Xen VM management using xenstore, for example add VMs to the store with 'xm new <config>'

Don't know yet the advantages of xenstore, documentation doesn't reveal much about it and I surely don't want to program it's API ;-)

Well you don't really have to program the API, but it allows your VMs to exist within the Xen database and to be managed by third party applications/interfaces that support the Xen API.

...

...
then completely manage them through xm, xm start <name>, xm stop, and the VM will always be visible in 'xm list'

well, tiny advantage ;-)

Thanks, maybe I'll try 3.2, but first I have to troubleshoot that new DHCP problem on the VMs :-(

Good luck.

I would take a look at your primary interface setup, it sounds like it might be a little off.

-Ross

Kai Schaetzl

8:31 p.m.

Ross S. W. Walker wrote on Fri, 28 Mar 2008 14:40:42 -0400:

...

I have libvirt installed it probably is coming from there, maybe so when it brings up virbr0 it can tap into eth0 bridge.

not here with libvirt and xen from CentOS.

...

Not sure what you mean? You mean you have your GATEWAY set to an IP outside your subnet?

No, eth0 has a public and eth0:0 has a private IP address, both are statically set. So, I have a choice of gateway and I use the private gateway which is 192.168.1.1. DHCP offers addresses in a subset of this subnet.

Kai


Ross S. W. Walker

8:51 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Fri, 28 Mar 2008 14:40:42 -0400:

...
I have libvirt installed it probably is coming from there, maybe so when it brings up virbr0 it can tap into eth0 bridge.

not here with libvirt and xen from CentOS.

Hmm, must be a Xen 3.2 addition then. It seems to come into existence when I bring up a domain and disappears after I destroy it, so it's related to the networking of domains, but I'll need to look into this further.

...

...
Not sure what you mean? You mean you have your GATEWAY set to an IP outside your subnet?

No, eth0 has a public and eth0:0 has a private IP address, both are statically set. So, I have a choice of gateway and I use the private gateway which is 192.168.1.1. DHCP offers addresses in a subset of this subnet.

So you have multiple gateways setup with different weights?

If so that could also explain it. If your private IP is 192.168.1.231, based on your earlier post, that would make perfect sense.

Anyways, yes, the older scripts will need to be fixed up to handle the iputils output, and if you get them working well, file a bug report with the fix.

-Ross

Kai Schaetzl

7:31 p.m.

...

DHCP currently fails on it.

That must be a problem with the xenbr0 bridging as it is done with Xen < 3.2. I think the packet with the DHCPOFFER doesn't reach the interface. If I change to virbr0 (with running libvirtd, of course) DHCP works. The difference between xenbr0 and virbr0 is that virbr0 is bound to an IP address and is the gateway of that net.

Ross, what do you use in the new xen networking for the vif = [ "bridge=virbr0" ] line, eth0 ? Then DHCP should work this way as well. Do you have any VM not using virbr0 and taking IP from DHCP this way? Yet another good argument for 3.2 then.

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

8:12 p.m.

Kai Schaetzl wrote:

...

...
DHCP currently fails on it.

That must be a problem with the xenbr0 bridging as it is done with Xen 3.2. I think the packet with the DHCPOFFER doesn't reach the interface. If I change to virbr0 (with running libvirtd, of course) DHCP works. The difference between xenbr0 and virbr0 is that virbr0 is bound to an IP address and is the gateway of that net.

I think you are mixing the versions up or made a typo, Xen 3.2 no longer uses xenbr0, but bridges with the ethernet name. Also make sure iptables/ip6tables isn't still running in the background.

...

Ross, what do you use in the new xen networking for the vif = [ "bridge=virbr0" ] line, eth0 ? Then DHCP should work this way as well. Do you have any VM not using virbr0 and taking IP from DHCP this way? Yet another good argument for 3.2 then.

My configs still have xenbr0 listed, but the scripts will take any xenbr* and convert it to eth* if it exists and is a bridge. I don't have any problem with DHCP. I also have iptables currently disabled.
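The renaming rule described above can be sketched roughly as follows. This is only an illustration of the described behaviour, not the actual Xen 3.2 script logic; the function name `map_bridge_name` and the set of existing bridges are hypothetical.

```python
import re


def map_bridge_name(requested, existing_bridges):
    """Return the bridge a vif would end up on under the rule described above.

    requested:        bridge name from the domain config, e.g. "xenbr0"
    existing_bridges: names of devices that currently exist and are bridges
    """
    match = re.fullmatch(r"xenbr(\d+)", requested)
    if match:
        candidate = "eth" + match.group(1)
        # Only rename when the eth* device exists and is a bridge.
        if candidate in existing_bridges:
            return candidate
    # Anything else is left untouched.
    return requested


# Hypothetical host: eth0 is the Xen 3.2 style bridge, virbr0 comes from libvirt.
bridges = {"eth0", "virbr0"}
print(map_bridge_name("xenbr0", bridges))  # eth0
print(map_bridge_name("xenbr1", bridges))  # xenbr1 (no eth1 bridge exists)
print(map_bridge_name("virbr0", bridges))  # virbr0
```

Under this reading, an old config with bridge=xenbr0 keeps working unchanged as long as eth0 is the bridge.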

-Ross

Kai Schaetzl

30 Mar 30 Mar

8:56 p.m.

Ross S. W. Walker wrote on Fri, 28 Mar 2008 16:12:20 -0400:

...

I think you are mixing the versions up or made a typo,

yes, I meant to write "Xen < 3.2".

...

My configs still have xenbr0 listed, but the scripts will take any xenbr* and convert it to eth* if it exists and is a bridge. I don't have any problem with DHCP. I also have iptables currently disabled.

Before going to the new Xen I tried to use this explanation for building a bridge for xend, but it doesn't work for me at all :-( http://henning.schmiedehausen.org/wingnut-diaries/archives/86 Unfortunately, bridge interfaces are missing from the Red Hat Deployment Guide, I'm still looking for a guide about what can be used in bridge ifcfg files. What I see on the net looks completely different.

Oh, and you were absolutely right about the routing. I changed the default gateway to fit the public subnet and added a static route for the private subnet and now the additional src in ip route list is gone. Didn't change that the DHCPOFFER doesn't get back to the VM, though :-(
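A gateway is only directly reachable from an address in the same subnet, so it can help to check which configured address that is. The sketch below does this with Python's ipaddress module. The private address 192.168.1.231 and gateway 192.168.1.1 come from this thread; the /24 prefix lengths and the public address and gateway (taken from the 203.0.113.0/24 documentation range) are assumptions for illustration.

```python
import ipaddress


def on_link_addresses(gateway, interface_addresses):
    """List the configured addresses whose subnet contains the gateway."""
    gw = ipaddress.ip_address(gateway)
    result = []
    for text in interface_addresses:
        iface = ipaddress.ip_interface(text)
        if gw in iface.network:
            result.append(str(iface))
    return result


# eth0 (public, assumed) and eth0:0 (private, from the thread; /24 assumed).
addresses = ["203.0.113.10/24", "192.168.1.231/24"]

print(on_link_addresses("192.168.1.1", addresses))  # ['192.168.1.231/24']
print(on_link_addresses("203.0.113.1", addresses))  # ['203.0.113.10/24']
```

With the private gateway, only the alias address is on-link; with a gateway in the public subnet, the primary address is.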

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

31 Mar 31 Mar

2:21 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Fri, 28 Mar 2008 16:12:20 -0400:

...
I think you are mixing the versions up or made a typo,

yes, I meant to write "Xen < 3.2".

...
My configs still have xenbr0 listed, but the scripts will take any xenbr* and convert it to eth* if it exists and is a bridge. I don't have any problem with DHCP. I also have iptables currently disabled.

Before going to the new Xen I tried to use this explanation for building a bridge for xend, but it doesn't work for me at all :-( http://henning.schmiedehausen.org/wingnut-diaries/archives/86 Unfortunately, bridge interfaces are missing from the Red Hat Deployment Guide, I'm still looking for a guide about what can be used in bridge ifcfg files. What I see on the net looks completely different.

Oh, and you were absolutely right about the routing. I changed the default gateway to fit the public subnet and added a static route for the private subnet and now the additional src in ip route list is gone. Didn't change that the DHCPOFFER doesn't get back to the VM, though :-(

Kai,

Why not try tcpdump on the bridge interface and see if you can see the DHCPOFFER/DHCPACK and what MACs it used.

-Ross

Kai Schaetzl

6:41 p.m.

Ross S. W. Walker wrote on Mon, 31 Mar 2008 10:21:10 -0400:

...

Why not try tcpdump on the bridge interface and see if you can see the DHCPOFFER/DHCPACK and what MACs it used.

I hoped to avoid doing something I do only every few years ;-) I'm quite familiar with using Wireshark/Ethereal on Windows, but I used tcpdump only once or so, ever. I remember I can use tcpdump logs with Wireshark, can't I?

Anyway, I just installed xen 3.2 and VMs are well. It didn't solve the DHCP problem, so I will check tcpdump soon. But I found a problem with python when I wanted to add a DomU to the xen managed domains. It seems the xmlproc library is missing. I tried to install python-xml (as is recommended for Debian) but there is none for CentOS. libxml2-python is already installed and the only other module with xml in the name is python-lxml which doesn't look like the one I need. Did you hit the same problem?

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

7:09 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Mon, 31 Mar 2008 10:21:10 -0400:

...
Why not try tcpdump on the bridge interface and see if you can see the DHCPOFFER/DHCPACK and what MACs it used.

I hoped to avoid doing something I do only every few years ;-) I'm quite familiar with using Wireshark/Ethereal on Windows, but I used tcpdump only once or so, ever. I remember I can use tcpdump logs with Wireshark, can't I?

You can use the tcpdump logs in wireshark, or you can yum install wireshark and use that interactively right on the bridge.

...

Anyway, I just installed xen 3.2 and VMs are well. It didn't solve the DHCP problem, so I will check tcpdump soon. But I found a problem with python when I wanted to add a DomU to the xen managed domains. It seems the xmlproc library is missing. I tried to install python-xml (as is recommended for Debian) but there is none for CentOS. libxml2-python is already installed and the only other module with xml in the name is python-lxml which doesn't look like the one I need. Did you hit the same problem?

Yes, the problem is actually in the Xen API for 3.2, they added an option in the API, but didn't provide a default value if the client doesn't provide one, so it bombs.

I have a fix for it. I could send the whole src.rpm, but it is too much baggage for the list, so I have thrown in the 2 patches to be put in the SOURCE directory and an updated xen.spec file.

Rebuild the packages, update them then virt-manager and virt-install should work.

-Ross

Kai Schaetzl

9:23 p.m.

Ross S. W. Walker wrote on Mon, 31 Mar 2008 15:09:37 -0400:

...

You can use the tcpdump logs in wireshark, or you can yum install wireshark and use that interactively right on the bridge.

tcpdump -i eth0 port 67 doesn't reveal much. The requests look the same as those from other clients, but it seems I don't catch all replies: I catch replies for some clients and not for others. Those clients still get an IP address, except for the ones in Xen VMs. So it looks like the command doesn't catch all DHCP traffic. Do you have a better suggestion for what to capture?

Apart from that the bridging looks like yours if I remember right. eth0 bridges to peth0 and vifx.0. No xenbr or virbr anymore. No tap devices, though. Looking thru man xm I found a dhcp = "dhcp" directive for the config file, but it didn't change anything.

...

Yes, the problem is actually in the Xen API for 3.2, they added an option in the API, but didn't provide a default value if the client doesn't provide one, so it bombs.

I have a fix for it.

Thanks, I have looked at the patches, but they seem to be for something different. I checked if I can create a new VM with virt-manager and this fails in the network device step. But I think that's yet another bug, we already discussed here, there's also a patch for that. No, my problem is different from both. I get an error when running "xm new <vmname>". Going by the short definition in the xm help output (they forgot to add new to man xm) and what you told about xenstore I deduced I would need to run this command to add an already existing VM config to the store so I can manage it. (If there's a different way ...). And this fails with a python trace that was already mentioned on the Xen lists and seems to indicate a missing python module:

ImportError: No module named xmlproc

Here's why I think I need "python-xml" or something similar: http://www.google.de/search?num=30&hl=de&q=ImportError+xmlproc+xenso...

Also, this error seems to be quite old, so I would expect it being fixed in 3.2 rpms, which also points to an external source.

Didn't you get this for "xm new"?

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

9:45 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Mon, 31 Mar 2008 15:09:37 -0400:

...
You can use the tcpdump logs in wireshark, or you can yum install wireshark and use that interactively right on the bridge.

tcpdump -i eth0 port 67 doesn't reveal much. The requests look the same as those from other clients, but it seems I don't catch all replies: I catch replies for some clients and not for others. Those clients still get an IP address, except for the ones in Xen VMs. So it looks like the command doesn't catch all DHCP traffic. Do you have a better suggestion for what to capture?

It's not all port 67: the DHCP client sends its DHCPDISCOVER/DHCPREQUEST from UDP port 68 to the broadcast address on UDP port 67, and the DHCP server responds with a DHCPOFFER from its IP address on UDP port 67 to the client (or the broadcast address) on UDP port 68.
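As a minimal sketch of the port usage: per RFC 2131, DHCP servers listen on UDP port 67 and clients on UDP port 68. The helper names below are hypothetical; the filter string is a commonly used capture expression that covers both ports.

```python
DHCP_SERVER_PORT = 67  # server side (RFC 2131)
DHCP_CLIENT_PORT = 68  # client side (RFC 2131)


def dhcp_capture_filter():
    """Build a tcpdump/Wireshark capture filter matching both DHCP ports."""
    return f"udp and (port {DHCP_SERVER_PORT} or port {DHCP_CLIENT_PORT})"


def dhcp_direction(src_port, dst_port):
    """Classify a UDP packet by its source and destination ports."""
    if (src_port, dst_port) == (DHCP_CLIENT_PORT, DHCP_SERVER_PORT):
        return "client-to-server"  # e.g. DHCPDISCOVER, DHCPREQUEST
    if (src_port, dst_port) == (DHCP_SERVER_PORT, DHCP_CLIENT_PORT):
        return "server-to-client"  # e.g. DHCPOFFER, DHCPACK
    return "not-dhcp"


print(dhcp_capture_filter())   # udp and (port 67 or port 68)
print(dhcp_direction(68, 67))  # client-to-server
print(dhcp_direction(67, 68))  # server-to-client
```

The filter string can be passed to tcpdump, for example with -n (no name resolution) and -e (print link-level headers, which shows the MAC addresses) on each interface of the bridge in turn, to see where the reply stops.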

...

Apart from that the bridging looks like yours if I remember right. eth0 bridges to peth0 and vifx.0. No xenbr or virbr anymore. No tap devices, though. Looking thru man xm I found a dhcp = "dhcp" directive for the config file, but it didn't change anything.

BTW I discovered that the tap devices are from qemu running in HVM mode. In HVM, qemu does the network emulation and uses the kernel tun device for creating its network interfaces.

...

...
Yes, the problem is actually in the Xen API for 3.2, they added an option in the API, but didn't provide a default value if the client doesn't provide one, so it bombs.

I have a fix for it.

Thanks, I have looked at the patches, but they seem to be for something different. I checked if I can create a new VM with virt-manager and this fails in the network device step. But I think that's yet another bug, we already discussed here, there's also a patch for that.

If you can create a VM with virt-manager, then you don't have Xen 3.2 installed or properly installed...

...

No, my problem is different from both.

I get an error when running "xm new <vmname>". Going by the short definition in the xm help output (they forgot to add new to man xm) and what you told about xenstore I deduced I would need to run this command to add an already existing VM config to the store so I can manage it. (If there's a different way ...). And this fails with a python trace that was already mentioned on the Xen lists and seems to indicate a missing python module:

ImportError: No module named xmlproc

Here's why I think I need "python-xml" or something similar: http://www.google.de/search?num=30&hl=de&q=ImportError+xmlproc+xenso...

Also, this error seems to be quite old, so I would expect it being fixed in 3.2 rpms, which also points to an external source.

Didn't you get this for "xm new"?

I never encountered this error. If you upgraded to Xen 3.2 did you upgrade both the xen-3.2 and xen-libs-3.2 packages? Did you edit your grub config too to load xen-3.2 as well?

BTW xmlproc is handled completely in xend I believe, it all works fine on my host and I have no python-xml installed!

-Ross

Kai Schaetzl

1 Apr 1 Apr

1:12 p.m.

Ross S. W. Walker wrote on Mon, 31 Mar 2008 17:45:24 -0400:

...

It's not all port 67: the DHCP client sends its DHCPDISCOVER/DHCPREQUEST from UDP port 68 to the broadcast address on UDP port 67, and the DHCP server responds with a DHCPOFFER from its IP address on UDP port 67 to the client (or the broadcast address) on UDP port 68.

Ah, many thanks. Ok, what happens is that the request appears on all interfaces but the reply goes out on peth0 only. And that never reaches the DomU on vifx.0. If I start libvirtd and then kill dnsmasq (as I want dhcpd to answer) the reply propagates further and DomU takes the IP address. There's obviously something in the routing/forwarding that the startup of libvirtd changes. Output of iptables -L suggests it adds a forwarding rule to forward from anywhere to anywhere. But that's not true. This seems to be a limitation of the iptables -L output: it doesn't show the interface (and I don't see a way to change this, if I try to specify an interface I get an error that this is not allowed with -L). Well, I saved iptables and from that it seems that all the forwarding rules apply to virbr0 only. As virbr0 isn't attached to anything anymore these rules should be useless. libvirtd also adds NAT rules, but I don't see how these could affect this either. So, there might be something else needed.

...

BTW I discovered that the tap devices are from qemu running in HVM mode. In HVM, qemu does the network emulation and uses the kernel tun device for creating its network interfaces.

Ah, I see. I'm not running fully virtualized.

...

...
Thanks, I have looked at the patches, but they seem to be for something different. I checked if I can create a new VM with virt-manager and this fails in the network device step. But I think that's yet another bug, we already discussed here, there's also a patch for that.

If you can create a VM with virt-manager, then you don't have Xen 3.2 installed or properly installed...

no, no, no. "I can create a new VM with virt-manager and this fails in the network device step". It cannot get any interfaces. I think there is a patch floating around for this, already mentioned on this list, but it's not the patch(es) you mentioned. Those two patches seem to apply to HVM only, so I shouldn't need them. If I wanted to create new VMs with virt-manager I would need to apply this other patch, though.

...

I never encountered this error.

I feared that :-(

If you upgraded to Xen 3.2 did you upgrade

...

both the xen-3.2 and xen-libs-3.2 packages? Did you edit your grub config too to load xen-3.2 as well?

Sure. I also installed xen-devel. Ahm, is that "xm new" supposed to do what I think or is it doing something else? I mean I understand that "xm new vmname" should take the VM of that name (identified by the existing config file of that name) and add it to the xenstore, so that I can "manage" from there. Meaning being able to use "start" (there's no stop?) and list it even when not running.

...

BTW xmlproc is handled completely in xend I believe, it all works fine on my host and I have no python-xml installed!

Hm, may need to subscribe to the xen list ;-)

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

2:16 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Mon, 31 Mar 2008 17:45:24 -0400:

...
It's not all port 67: the DHCP client sends its DHCPDISCOVER/DHCPREQUEST from UDP port 68 to the broadcast address on UDP port 67, and the DHCP server responds with a DHCPOFFER from its IP address on UDP port 67 to the client (or the broadcast address) on UDP port 68.

Ah, many thanks. Ok, what happens is that the request appears on all interfaces but the reply goes out on peth0 only. And that never reaches the DomU on vifx.0. If I start libvirtd and then kill dnsmasq (as I want dhcpd to answer) the reply propagates further and DomU takes the IP address. There's obviously something in the routing/forwarding that the startup of libvirtd changes. Output of iptables -L suggests it adds a forwarding rule to forward from anywhere to anywhere. But that's not true. This seems to be a limitation of the iptables -L output: it doesn't show the interface (and I don't see a way to change this, if I try to specify an interface I get an error that this is not allowed with -L). Well, I saved iptables and from that it seems that all the forwarding rules apply to virbr0 only. As virbr0 isn't attached to anything anymore these rules should be useless. libvirtd also adds NAT rules, but I don't see how these could affect this either. So, there might be something else needed.

dnsmasq is going to filter out the incoming dhcp requests as it acts as a dhcp server itself. Try disabling dnsmasq, or move your VMs off of virbr0 onto xenbr0.

...

...
BTW I discovered that the tap devices are from qemu running in HVM mode. In HVM, qemu does the network emulation and uses the kernel tun device for creating its network interfaces.

Ah, I see. I'm not running fully virtualized.

...
...
Thanks, I have looked at the patches, but they seem to be for something different. I checked if I can create a new VM with virt-manager and this fails in the network device step. But I think that's yet another bug, we already discussed here, there's also a patch for that.

If you can create a VM with virt-manager, then you don't have Xen 3.2 installed or properly installed...

no, no, no. "I can create a new VM with virt-manager and this fails in the network device step". It cannot get any interfaces. I think there is a patch floating around for this, already mentioned on this list, but it's not the patch(es) you mentioned. Those two patches seem to apply to HVM only, so I shouldn't need them. If I wanted to create new VMs with virt-manager I would need to apply this other patch, though.

Ok...

...

...
I never encountered this error.

I feared that :-(

If you upgraded to Xen 3.2 did you upgrade

...
both the xen-3.2 and xen-libs-3.2 packages? Did you edit your grub config too to load xen-3.2 as well?

Sure. I also installed xen-devel. Ahm, is that "xm new" supposed to do what I think or is it doing something else? I mean I understand that "xm new vmname" should take the VM of that name (identified by the existing config file of that name) and add it to the xenstore, so that I can "manage" from there. Meaning being able to use "start" (there's no stop?) and list it even when not running.

Yes, 'xm new' adds a vm to the store and you can manage it via xm or virsh. There is 'save', 'shutdown', 'destroy', 'suspend' all having to deal with VM running state.

...

...
BTW xmlproc is handled completely in xend I believe, it all works fine on my host and I have no python-xml installed!

Hm, may need to subscribe to the xen list ;-)

I suggest it, there is definitely more traffic there.

-Ross

Kai Schaetzl

2:50 p.m.

Ross S. W. Walker wrote on Tue, 1 Apr 2008 10:16:38 -0400:

...

dnsmasq is going to filter out the incoming dhcp requests as it acts as a dhcp server itself. Try disabling dnsmasq, or move your VMs off of virbr0 onto xenbr0.

I wrote dnsmasq is killed then ;-) I started service libvirtd and then killed dnsmasq and made sure it wasn't running. Then I tried. And the virbr0 is not used anyway. However, something that libvirtd does seems to switch on some extra forwarding that helps the broadcast packet to travel from peth0 to eth0 which otherwise it would only do if it had an IP address target. I have now stopped libvirtd as well and it still works, even for a VM that I start after that (which means I can rule arp table out as its MAC address was unknown). And iptables does not show any forwarding rules once I stop libvirtd. The NAT stays active stopping libvirtd, but I killed it with iptables. Still it works. So, there must be something that switches this on. I'm sure if I reboot the host the problem is back.

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

3:14 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Tue, 1 Apr 2008 10:16:38 -0400:

...
dnsmasq is going to filter out the incoming dhcp requests as it acts as a dhcp server itself. Try disabling dnsmasq, or move your VMs off of virbr0 onto xenbr0.

I wrote dnsmasq is killed then ;-) I started service libvirtd and then killed dnsmasq and made sure it wasn't running. Then I tried. And the virbr0 is not used anyway. However, something that libvirtd does seems to switch on some extra forwarding that helps the broadcast packet to travel from peth0 to eth0 which otherwise it would only do if it had an IP address target. I have now stopped libvirtd as well and it still works, even for a VM that I start after that (which means I can rule arp table out as its MAC address was unknown). And iptables does not show any forwarding rules once I stop libvirtd. The NAT stays active stopping libvirtd, but I killed it with iptables. Still it works. So, there must be something that switches this on. I'm sure if I reboot the host the problem is back.

Yeah, I would use xenbr0 (or eth0 in 3.2 parlance) as the bridge if you plan on using an external DHCP server and avoid the whole NAT and dnsmasq mess. I would probably use virbr0 as a nice virtual network only service, remove forwarding and NAT on it and keep it for internal traffic only.

-Ross

Kai Schaetzl

6:19 p.m.

Ross S. W. Walker wrote on Tue, 1 Apr 2008 11:14:58 -0400:

...

Yeah, I would use xenbr0 (or eth0 in 3.2 parlance) as the bridge if you plan on using an external DHCP server and avoid the whole NAT and dnsmasq mess. I would probably use virbr0 as a nice virtual network only service, remove forwarding and NAT on it and keep it for internal traffic only.

virbr0 is just there when libvirtd gets started; it's useless as it is not bridged to anything anymore. However, I'm not able to reproduce my last results consistently. As expected, once I rebooted the problem was back, and now I can start libvirtd, kill dnsmasq and still get no IP address. I also found a posting on xen-users that describes exactly my problem and solution http://lists.xensource.com/archives/html/xen-users/2007-08/msg00716.html

and the solution is in that direction I suspected. I tried that and again it doesn't work consistently for me. If I do "iptables -A FORWARD -s 0.0.0.0 -d 0.0.0.0 -j ACCEPT" it seemed to work first, but then stopped working as well. I now get an IP when booting up the VM, but it doesn't last long as the reacknowledgement doesn't travel back.

So, bridging and networking is fine except for DHCP, damn.

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Ross S. W. Walker

6:26 p.m.

Kai Schaetzl wrote:

...

Ross S. W. Walker wrote on Tue, 1 Apr 2008 11:14:58 -0400:

...
Yeah, I would use xenbr0 (or eth0 in 3.2 parlance) as the bridge if you plan on using an external DHCP server and avoid the whole NAT and dnsmasq mess. I would probably use virbr0 as a nice virtual network only service, remove forwarding and NAT on it and keep it for internal traffic only.

virbr0 is just there when libvirtd gets started; it's useless as it is not bridged to anything anymore. However, I'm not able to reproduce my last results consistently. As expected, once I rebooted the problem was back, and now I can start libvirtd, kill dnsmasq and still get no IP address. I also found a posting on xen-users that describes exactly my problem and solution http://lists.xensource.com/archives/html/xen-users/2007-08/msg00716.html

and the solution is in that direction I suspected. I tried that and again it doesn't work consistently for me. If I do "iptables -A FORWARD -s 0.0.0.0 -d 0.0.0.0 -j ACCEPT" it seemed to work first, but then stopped working as well. I now get an IP when booting up the VM, but it doesn't last long as the reacknowledgement doesn't travel back.

So, bridging and networking is fine except for DHCP, damn.

I also read a posting recently on xen-users where the OP wasn't receiving broadcast arps to the domUs and the solution was to upgrade to the latest network drivers which fixed the problem.

It was a later kernel than 2.6.18 though, so I don't know if it applies, but upstream is always backporting from newer kernels, so who knows. Couldn't hurt (can't believe I said that, cause now IT WILL!).

-Ross

Kai Schaetzl

2 Apr 2 Apr

11:31 a.m.

The solution was simple. I had actually already thought about it from the beginning, but somehow lost track and forgot about trying it. I swapped the IP numbers on eth0 and eth0:0 and it started working.

As a reminder: eth0 had a public IP address and eth0:0 holds the private one which is in the same subnet as the IPs handed out by dhcpd. This setup isn't a problem for any packets except for DHCP replies to a bridged virtual network it seems. dhcpd sends out the reply from the public IP address (=(p)eth0) and directs the packet to the private IP address. It never makes it to eth0 for whatever reason. I assume some extra routing or so might be necessary and I must have hit it somehow earlier yesterday, but couldn't reproduce it. Interestingly, the packet (even when it works) doesn't show up in iptables at all. I set logging for all chains and udp packets to these ports and there is nothing. It shows up only in tcpdumping of peth0. One probably needs ebtables to get any hold of these packets.
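The condition that the swap established, namely that the primary address on eth0 sits in the same subnet as the addresses dhcpd hands out, can be checked with a short sketch. The private address 192.168.1.231 is taken from earlier in this thread; the /24 prefix lengths, the public address (from the 203.0.113.0/24 documentation range) and the pool boundaries are assumptions for illustration only.

```python
import ipaddress


def primary_serves_pool(primary, pool_start, pool_end):
    """True if both ends of the DHCP pool lie in the primary address's subnet."""
    network = ipaddress.ip_interface(primary).network
    return all(
        ipaddress.ip_address(addr) in network
        for addr in (pool_start, pool_end)
    )


# Hypothetical dhcpd range inside the private subnet.
pool = ("192.168.1.100", "192.168.1.200")

# Before the swap: public address is primary on eth0.
print(primary_serves_pool("203.0.113.10/24", *pool))   # False
# After the swap: private address is primary on eth0.
print(primary_serves_pool("192.168.1.231/24", *pool))  # True
```

This only checks the addressing; it does not explain why the bridge dropped the reply in the first case.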

Kai

-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com

Richard Chapman

2:03 a.m.

Hi Ross et al

I noticed your comment about disabling dnsmasq below....

I am a relatively new Linux user... I have built Centos 4 and Centos 5 servers over the last year or so - but still have much to learn I'm sure. I am mostly using webmin to manage Centos.

I am running (mostly) the non-xen kernel on a Centos 5 server. I have ISC DHCPd version 3.0.5 running on the Centos 5 box. I saw in the release notes for Centos 5 a comment about possible conflict between dnsmasq and DHCP servers. In my startup scripts - dnsmasq is set to "not start on boot" so I thought there was no problem - but I find that in spite of the startup script - dnsmasq appears to be running.

As far as I can tell - the DHCP server is working as I want it to - so maybe I don't have a problem. It concerns me that dnsmasq appears to be running - and maybe it is sticking its nose in where I don't really want it to. I haven't found a config file for dnsmasq - so I don't know how or where it is configured.

Can anyone tell me how the two DHCP servers will interact? Should I disable dnsmasq - and if so - how do I do this?

Thanks

Richard.

Ross S. W. Walker wrote:

...

Kai Schaetzl wrote:

...
Ross S. W. Walker wrote on Mon, 31 Mar 2008 17:45:24 -0400:

...
It's not all port 67: the DHCP client sends its DHCPDISCOVER/DHCPREQUEST from UDP port 68 to the broadcast address on UDP port 67, and the DHCP server responds with a DHCPOFFER from its IP address on UDP port 67 to the client (or the broadcast address) on UDP port 68.

Ah, many thanks. Ok, what happens is that the request appears on all interfaces but the reply goes out on peth0 only. And that never reaches the DomU on vifx.0. If I start libvirtd and then kill dnsmasq (as I want dhcpd to answer) the reply propagates further and DomU takes the IP address. There's obviously something in the routing/forwarding that the startup of libvirtd changes. Output of iptables -L suggests it adds a forwarding rule to forward from anywhere to anywhere. But that's not true. This seems to be a limitation of the iptables -L output: it doesn't show the interface (and I don't see a way to change this, if I try to specify an interface I get an error that this is not allowed with -L). Well, I saved iptables and from that it seems that all the forwarding rules apply to virbr0 only. As virbr0 isn't attached to anything anymore these rules should be useless. libvirtd also adds NAT rules, but I don't see how these could affect this either. So, there might be something else needed.

dnsmasq is going to filter out the incoming dhcp requests as it acts as a dhcp server itself. Try disabling dnsmasq, or move your VMs off of virbr0 onto xenbr0.

...
...
BTW I discovered that the tap devices are from qemu running in HVM mode. In HVM, qemu does the network emulation and uses the kernel tun device for creating its network interfaces.

Ah, I see. I'm not running fully virtualized.

...
...
Thanks, I have looked at the patches, but they seem to be for something different. I checked if I can create a new VM with virt-manager and this fails in the network device step. But I think that's yet another bug, we already discussed here, there's also a patch for that.

If you can create a VM with virt-manager, then you don't have Xen 3.2 installed or properly installed...

no, no, no. "I can create a new VM with virt-manager and this fails in the network device step". It cannot get any interfaces. I think there is a patch floating around for this, already mentioned on this list, but it's not the patch(es) you mentioned. Those two patches seem to apply to HVM only, so I shouldn't need them. If I wanted to create new VMs with virt-manager I would need to apply this other patch, though.

Ok...

...
...
I never encountered this error.

I feared that :-(

If you upgraded to Xen 3.2 did you upgrade

...
both the xen-3.2 and xen-libs-3.2 packages? Did you edit your grub config too to load xen-3.2 as well?

Sure. I also installed xen-devel. Ahm, is that "xm new" supposed to do what I think or is it doing something else? I mean I understand that "xm new vmname" should take the VM of that name (identified by the existing config file of that name) and add it to the xenstore, so that I can "manage" from there. Meaning being able to use "start" (there's no stop?) and list it even when not running.

Yes, 'xm new' adds a vm to the store and you can manage it via xm or virsh. There is 'save', 'shutdown', 'destroy', 'suspend' all having to deal with VM running state.

...
...
BTW xmlproc is handled completely in xend I believe, it all works fine on my host and I have no python-xml installed!

Hm, may need to subscribe to the xen list ;-)

I suggest it, there is definitely more traffic there.

-Ross

CentOS-virt mailing list CentOS-virt@centos.org http://lists.centos.org/mailman/listinfo/centos-virt

Kai Schaetzl

9:31 a.m.

Richard Chapman wrote on Wed, 02 Apr 2008 10:03:27 +0800:

...

In my startup scripts - dnsmasq is set to "not start on boot" so I thought there was no problem - but I find that in spite of the startup script - dnsmasq appears to be running.

What did you do? service dnsmasq status or ps ax|grep dnsmasq ?

dnsmasq would indeed be running, although it's not configured to start, *if* you were using xen, because libvirtd would start it. It's not clear if you are using xen or not.

Kai

Richard Chapman

12:41 p.m.

Thanks for the reply Kai.

I will try to clarify...

I originally built the centos 5 system with xen - but I was having various network problems. I found that if I booted the alternate "non xen" kernel - I had fewer problems. I had no pressing need for xen - so I changed the default boot kernel to be the non-xen kernel - so I assume I am now NOT running xen. However - dnsmasq still appears to be running... Here are some of the commands you suggested...

[root@C5 ~]# service dnsmasq status
dnsmasq (pid 3776) is running...
[root@C5 ~]# ps ax|grep dnsmasq
 3776 ? S 0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --pid-file --conf-file --listen-address 192.168.122.1 --except-interface lo --dhcp-leasefile=/var/lib/libvirt/dhcp-default.leases --dhcp-range 192.168.122.2,192.168.122.254
25474 pts/2 S+ 0:00 grep dnsmasq

This looks to me like some xen stuff is running - in particular the 192.168.122.x subnet looks a bit xen-ish.

But I am confident that the "non xen" kernel is running:

[root@C5 ~]# uname -a
Linux C5.aardvark.com.au 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
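
The kernel release string is enough to tell the two kernels apart. Below is a minimal sketch, assuming that the CentOS 5 Xen kernels (from the kernel-xen package) carry a "xen" suffix in `uname -r`, for example 2.6.18-53.1.14.el5xen. The function name `is_xen_kernel` is made up for illustration.

```shell
#!/bin/sh
# Report whether a kernel release string looks like a CentOS 5 Xen kernel.
# Assumption: Xen kernels from the kernel-xen package end in "xen".
is_xen_kernel() {
    case "$1" in
        *xen) return 0 ;;
        *)    return 1 ;;
    esac
}

release=$(uname -r)
if is_xen_kernel "$release"; then
    echo "$release: Xen kernel"
else
    echo "$release: not a Xen kernel"
fi
```

For the release shown above (2.6.18-53.1.14.el5, no suffix) the check reports a non-Xen kernel, which matches the conclusion drawn from the uname output.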

I guess that there must be lots of things I don't understand. It is not clear to me whether I am using xen or not - though I am not intending to...:-)

I notice that libvirtd is running - and is set to run on boot. Should this be the case when running the non xen kernel?

Should I set libvirtd to not start on boot? This might get rid of dnsmasq - but is it a safe and good thing to do in my case? Presumably - if I want to use xen at some future date - I will need to change kernels - and restart libvirtd.

Is there anything else I need to prevent from starting to eliminate all traces of xen?

Thanks

Richard.

Kai Schaetzl

1:31 p.m.

Richard Chapman wrote on Wed, 02 Apr 2008 20:41:33 +0800:

...

I notice that libvirtd is running - and is set to run on boot. Should this be the case when running the non xen kernel?

No. I assume it gets pulled in when you update to CentOS 5.1 and have xen packages installed. You do not need it at all, xen or not. But it provides some functionality if you are running xen, so it depends on whether you need that. If you are not running xen you can shut it off.

...

Should I set libvirtd to not start on boot? This might get rid of dnsmasq - but is it a safe and good thing to do in my case?

Yes. "service libvirtd stop" and "chkconfig libvirtd off" will do this. If you ever happen to need it with xen (Ross and I had some conversation about libvirtd here recently, this should give you an idea) you can get it back with the "start" and "on" parameters for the two commands mentioned. For dnsmasq you have to do a "killall dnsmasq" as it wasn't started via the init.d script.
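
The three commands above can be collected into one small script. This is only a sketch: the `run` wrapper, the `DRY_RUN` variable and the function name `disable_libvirtd` are made up for illustration, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Stop libvirtd now, keep it from starting at boot, and kill the dnsmasq
# instance it launched. By default the commands are only printed;
# set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_libvirtd() {
    run service libvirtd stop
    run chkconfig libvirtd off
    run killall dnsmasq
}

disable_libvirtd
```

Note that "killall dnsmasq" stops every dnsmasq process on the host, so leave that step out if the machine also runs a dnsmasq of its own. Swapping in "start" and "on" for the first two commands reverses the change.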

...

Is there anything else I need to prevent from starting to eliminate all traces of xen?

I don't know if xend starts or even can start if you are not running a xen kernel, so you could check whether xend is running. libvirtd in itself is not xen, just a "helper" for various VM APIs. It starts dnsmasq to provide DHCP for the virtual network of the VMs, which seems to be quite complex, as you can see in this thread.
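
One way to tell a dnsmasq launched by libvirtd from one started by the init script is to look at its command line. The sketch below uses a heuristic taken from the ps output quoted earlier in this thread, where the lease file lives under /var/lib/libvirt/; that this holds on other setups is an assumption, and both function names are made up for illustration.

```shell
#!/bin/sh
# Return success if a process command line looks like a dnsmasq instance
# launched by libvirtd. Heuristic (an assumption based on the ps output in
# this thread): such instances reference files under /var/lib/libvirt/.
is_libvirt_dnsmasq() {
    case "$1" in
        *dnsmasq*/var/lib/libvirt/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read process lines on stdin and label every line that mentions dnsmasq.
classify_dnsmasq() {
    while read -r line; do
        case "$line" in
            *dnsmasq*)
                if is_libvirt_dnsmasq "$line"; then
                    echo "libvirt: $line"
                else
                    echo "other: $line"
                fi
                ;;
        esac
    done
}

ps ax 2>/dev/null | classify_dnsmasq
```

A line labelled "other" is either a dnsmasq started some other way or an unrelated process that merely mentions dnsmasq, such as a grep for it.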

So, if you do as described above you haven't eliminated "all traces of xen", but nothing related to it is active and you can switch back to it at any time. I would only uninstall the xen packages if you are absolutely sure that you won't be using xen again.

I find that xen does not interfere with the normal networking at all, the only problem was this DHCP problem for the xen VMs themselves. I'm surprised that using the other kernel solved whatever problems you had.

Kai
