oops, or how to bring a datacenter router down with one setting


Bob Hoffman

9 Feb 2012

11:54 p.m.

so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

entire ip block went out.

when I called the datacenter they told me the router was under attack, and I was like 'uh oh' and told them to just shut off my computer, I would be there to fix it. They did not believe me. An hour later I was there, deleted eth1's pointer to br0, and all was fine. Meanwhile they were all around the router trying to stop the attack. (It was just the router for me and others in that room... oops.)

I wonder if they will boot me from the center now? How is it possible that it did that so quickly? Such an easy way to bring down routers, wow, a hacker could have a field day.

Apparently there is more to making two eth ports go to the same bridge than simply pointing them at it. I have since tried the bridge_ports command as listed online; however, that must be deprecated. I think I am just gonna stay with multiple bridges with one eth on each for a while, until I can test this stuff in a safe environment.

I never had a chance to recover, the second the network came up I lost all contact with my ip block. The ratelimit number got this high by the time I got there.

Feb 9 04:22:41 main kernel: __ratelimit: 100807 callbacks suppressed
Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address
Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address
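For anyone who hits the same symptom: the telltale line is "received packet with own address as source address", which usually means the host is seeing its own frames come back to it through the upstream switch. A minimal Python sketch that tallies those warnings per interface; the helper name and the idea of feeding it syslog lines are assumptions, not something from the thread. It also copes with several entries run together on one line, as in the paste above.

```python
import re
from collections import Counter

# Kernel bridge warning from the log above; group 1 is the interface name.
OWN_SRC = re.compile(r"kernel: ([^\s:]+): received packet with own address as source address")


def count_own_source_warnings(lines):
    """Tally 'own address as source' warnings per interface (hypothetical helper).

    The same warning on several bridged ports at once usually means the host's
    own frames are looping back through the upstream switch.
    """
    counts = Counter()
    for line in lines:
        # finditer copes with several log entries run together on one line.
        for match in OWN_SRC.finditer(line):
            counts[match.group(1)] += 1
    return counts


sample = [
    "Feb 9 04:22:41 main kernel: __ratelimit: 100807 callbacks suppressed",
    "Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address",
    "Feb 9 04:22:41 main kernel: eth0: received packet with own address as source address",
]
print(count_own_source_warnings(sample))
```

On a live system you could pass an open log file instead of the sample list, since iterating over a file object yields its lines.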

Show replies by date

tony＠softins.co.uk

10 Feb

10:18 a.m.

In article 4F345CD3.4060604@bobhoffman.com, Bob Hoffman bob@bobhoffman.com wrote:

...

so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

entire ip block went out.

[...]

Feb 9 04:22:41 main kernel: __ratelimit: 100807 callbacks suppressed
Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address

I think to do this you also need to be connected to a managed switch which supports interface bonding. You would have to tell it that the two switch ports are bonded to the same machine. That should prevent it from forwarding packets received on one of the ports out via the other port.

The key phrase to look for appears to be "IEEE 802.3ad Dynamic Link Aggregation".

Cheers Tony

-- Tony Mountifield Work: tony@softins.co.uk - http://www.softins.co.uk Play: tony@mountifield.org - http://tony.mountifield.org

Dennis Jacobfeuerborn

1:48 p.m.

New subject: oops, or how to bring a datacenter router down with one setting

On 02/10/2012 11:18 AM, Tony Mountifield wrote:

...

In article 4F345CD3.4060604@bobhoffman.com, Bob Hoffman bob@bobhoffman.com wrote:

...
so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

entire ip block went out.

[...]

Feb 9 04:22:41 main kernel: __ratelimit: 100807 callbacks suppressed
Feb 9 04:22:41 main kernel: eth1: received packet with own address as source address

I think to do this you also need to be connected to a managed switch which supports interface bonding. You would have to tell it that the two switch ports are bonded to the same machine. That should prevent it from forwarding packets received on one of the ports out via the other port.

The key phrase to look for appears to be "IEEE 802.3ad Dynamic Link Aggregation".

Yes, Linux supports LACP, but it's just one of the possible bonding modes. The other modes can work without special switch support, e.g. "active-backup" uses only one port at a time, and the other comes into play only when the first one fails.
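To make the active-backup behaviour concrete, here is a toy Python model. It is not the kernel bonding driver, and the class and interface names are hypothetical: traffic uses one port, and the backup takes over only when the active port loses link.

```python
class ActiveBackupBond:
    """Toy model of active-backup bonding (not the kernel driver): one port carries traffic."""

    def __init__(self, slaves):
        self.order = list(slaves)                           # failover preference order
        self.link_up = {name: True for name in self.order}  # link state per slave
        self.active = self.order[0]

    def set_link(self, name, up):
        """Record a link change, as link monitoring would, and fail over if needed."""
        self.link_up[name] = up
        if self.active is None or not self.link_up[self.active]:
            # First slave that still has link; None means the whole bond is down.
            self.active = next((s for s in self.order if self.link_up[s]), None)

    def transmit_port(self):
        return self.active


bond = ActiveBackupBond(["eth0", "eth1"])
print(bond.transmit_port())   # eth0
bond.set_link("eth0", False)  # eth0 loses link
print(bond.transmit_port())   # eth1
bond.set_link("eth0", True)   # eth0 returns; this model does not fail back
print(bond.transmit_port())   # eth1
```

Because only one port is ever live, the switch never sees the host as a path between two of its own ports. Real bonding can also be told to prefer a primary port and fail back to it; this toy leaves that out.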

Regards, Dennis

Dennis Jacobfeuerborn

11:47 a.m.


On 02/10/2012 12:54 AM, Bob Hoffman wrote:

...

so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

Bonding and bridging are completely different things. If you want to start bonding then you should first start with simply bonding the two interfaces and only once you got that going add the bridge and then add the bond0 device to it.
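Dennis's order of operations (bond the NICs first, then make bond0 the bridge's only port) maps onto CentOS ifcfg files roughly as follows. This Python sketch only builds the file contents as strings; the device names, the active-backup mode, miimon=100 and the 192.0.2.10 address are illustrative assumptions, so check the keys against the initscripts documentation for your release before writing anything to /etc/sysconfig/network-scripts.

```python
def ifcfg(settings):
    """Render a dict as KEY=value lines in ifcfg style."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def bonded_bridge_files(slaves=("eth0", "eth1"), ip="192.0.2.10", netmask="255.255.255.0"):
    """Return {filename: contents} for slaves -> bond0 -> br0 (illustrative values)."""
    files = {}
    for nic in slaves:
        # Physical NICs carry no IP and are NOT bridge members; they only join bond0.
        files[f"ifcfg-{nic}"] = ifcfg({"DEVICE": nic, "MASTER": "bond0", "SLAVE": "yes",
                                       "ONBOOT": "yes", "NM_CONTROLLED": "no"})
    # The bond is the single port of the bridge; active-backup needs no switch setup.
    files["ifcfg-bond0"] = ifcfg({"DEVICE": "bond0",
                                  "BONDING_OPTS": '"mode=active-backup miimon=100"',
                                  "BRIDGE": "br0", "ONBOOT": "yes", "NM_CONTROLLED": "no"})
    # Only the bridge gets the host's IP address.
    files["ifcfg-br0"] = ifcfg({"DEVICE": "br0", "TYPE": "Bridge", "BOOTPROTO": "none",
                                "IPADDR": ip, "NETMASK": netmask,
                                "ONBOOT": "yes", "NM_CONTROLLED": "no"})
    return files


for name, text in bonded_bridge_files().items():
    print(f"# {name}\n{text}")
```

The design point is that br0 ends up with a single member, bond0, so the two NICs never act as two ports of one switch.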

Regards, Dennis

Bob Hoffman

1:54 p.m.


---------------------------------------------------------
Dennis Jacobfeuerborn wrote (Fri Feb 10 06:47:22 EST 2012):

On 02/10/2012 12:54 AM, Bob Hoffman wrote:

...

so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

Bonding and bridging are completely different things. If you want to start bonding then you should first start with simply bonding the two interfaces and only once you got that going add the bridge and then add the bond0 device to it.

Regards, Dennis

-----------------------------------------------------------

Yeah, I gave up on bonding and ended up just using eth1. But every tutorial I found added eth0 and eth1 as interfaces to br0, so they shared the bridge, so to speak. All those tutorials were for Debian, though; all the CentOS ones pointed each eth to a different bridge (br0 and br1). So I tried it... bam, took down the router in less than a second.

I did not add a domain= setting to the bridge, though. With NetworkManager completely off I thought I would not need to. Looking at resolv.conf, it was overwritten anyway, and since no domain was listed it said "search belkin".

I assume that was the datacenter's router...

I was not bonding at this time. I am wondering, though, why NetworkManager overwrites resolv.conf if NM is off, all ifcfg files say nm_controlled=no, and chkconfig NetworkManager off was run.

It is not that way on my 5.x, but I guess things change. I wonder if that was messing my bond experiment up too without me knowing it.

Janez Kosmrlj

2:02 p.m.

I have several CentOS 5.x servers with bonding enabled, and none of them have any problems.

I used this tutorial: http://www.howtoforge.com/network_card_bonding_centos

I use mode=6 (balance-alb).

On Fri, Feb 10, 2012 at 2:54 PM, Bob Hoffman bob@bobhoffman.com wrote:

...


m.roth＠5-cent.us

2:18 p.m.


Bob Hoffman wrote:

...

Dennis Jacobfeuerborn wrote (Fri Feb 10 06:47:22 EST 2012):

On 02/10/2012 12:54 AM, Bob Hoffman wrote:

...

so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

<snip>

...

I was not bonding at this time. I am wondering though why the network manager overwrites resolv.conf if NM is off, all ifcfg files say nm_controlled=no, and chkconfig NetworkManager off was run.

dhcp running? That will update resolv.conf; NM not needed.
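Mark's explanation also points at the usual fix. With CentOS's initscripts, a DHCP interface has dhclient rewrite resolv.conf unless its ifcfg file says PEERDNS=no (the option defaults to yes). A rough Python sketch that flags ifcfg contents still set up to do that; the helper names are hypothetical and the parser only understands simple KEY=value lines, so treat it as a starting point rather than a full ifcfg parser.

```python
def parse_ifcfg(text):
    """Parse simple KEY=value lines; skips blanks and comments, strips quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def rewrites_resolv_conf(text):
    """True if the interface uses DHCP/BOOTP and has not opted out with PEERDNS=no.

    Assumes the initscripts default of PEERDNS=yes when the key is absent.
    """
    cfg = parse_ifcfg(text)
    uses_dhcp = cfg.get("BOOTPROTO", "").lower() in ("dhcp", "bootp")
    return uses_dhcp and cfg.get("PEERDNS", "yes").lower() != "no"


print(rewrites_resolv_conf("DEVICE=br0\nBOOTPROTO=dhcp\nNM_CONTROLLED=no\n"))  # True
print(rewrites_resolv_conf("DEVICE=br0\nBOOTPROTO=dhcp\nPEERDNS=no\n"))        # False
```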

mark

Dennis Jacobfeuerborn

3:06 p.m.


On 02/10/2012 02:54 PM, Bob Hoffman wrote:

...

Dennis Jacobfeuerborn wrote (Fri Feb 10 06:47:22 EST 2012):

On 02/10/2012 12:54 AM, Bob Hoffman wrote:

...

so I gave up on bonding. I found about 300 posts showing eth0 and eth1 both pointing to br0 (bridge) as interfaces. I followed them correctly, or so I thought. I pointed both ethx to the bridge, restarted network and bam...!!!

Bonding and bridging are completely different things. If you want to start bonding then you should first start with simply bonding the two interfaces and only once you got that going add the bridge and then add the bond0 device to it.

Regards, Dennis

Yeah, I gave up on bonding and ended up just using eth1. But every tutorial I found added eth0 and eth1 as interfaces to br0, so they shared the bridge, so to speak. All those tutorials were for Debian, though; all the CentOS ones pointed each eth to a different bridge (br0 and br1).

What are you actually trying to accomplish? You still seem to mix bonding and bridging willy-nilly as if they are somehow related. They are not.

Regards, Dennis

Bob Hoffman

3:25 p.m.


=================================
Dennis Jacobfeuerborn wrote:

Yeah, I gave up on bonding and ended up just using eth1. But every tutorial I found added eth0 and eth1 as interfaces to br0, so they shared the bridge, so to speak. All those tutorials were for Debian, though; all the CentOS ones pointed each eth to a different bridge (br0 and br1).

What are you actually trying to accomplish? You still seem to mix bonding and bridging willy-nilly as if they are somehow related. They are not.

Regards, Dennis
==================================

Nothing at all to do with bonding. Not at all. eth1 to br0, eth0 to br0... that's all. If that is possible, I see no reason for a bond at all. I just want to make sure if an NIC fails, the other one is still working while I am asleep and not a care in the world.

Dennis Jacobfeuerborn

3:41 p.m.


On 02/10/2012 04:25 PM, Bob Hoffman wrote:

...

=================================
Dennis Jacobfeuerborn wrote:

Yeah, I gave up on bonding and ended up just using eth1. But every tutorial I found added eth0 and eth1 as interfaces to br0, so they shared the bridge, so to speak. All those tutorials were for Debian, though; all the CentOS ones pointed each eth to a different bridge (br0 and br1).

What are you actually trying to accomplish? You still seem to mix bonding and bridging willy-nilly as if they are somehow related. They are not.

Regards, Dennis
==================================

Nothing at all to do with bonding. Not at all. eth1 to br0, eth0 to br0... that's all. If that is possible, I see no reason for a bond at all. I just want to make sure if an NIC fails, the other one is still working while I am asleep and not a care in the world.

Bridging doesn't do that. You need bonding for this.

Regards, Dennis

Bob Hoffman

3:53 p.m.


=================================
Dennis Jacobfeuerborn wrote:

Nothing at all to do with bonding. Not at all. eth1 to br0, eth0 to br0... that's all. If that is possible, I see no reason for a bond at all. I just want to make sure if an NIC fails, the other one is still working while I am asleep and not a care in the world.

Bridging doesn't do that. You need bonding for this.

Regards, Dennis
====================================

That may be true, I am no expert at all, but I can find you literally hundreds of how-tos out there, all specifically adding two or more ethx interfaces to the same bridge. Hundreds. So I thought it would be safe to do. But obviously it is dangerous or I messed up real well... lol

https://www.google.com/search?q=brctl+eth0+eth1+br0&btnG=Search&oe=u...

Google search with a lot of the how-tos I was following.

Devin Reade

4:22 p.m.


Bob,

I'd suggest you do some more reading on the purpose behind bonding and bridging. It *sounds* like what you functionally need is to have a server with a single route upstream, not acting as a gateway, but where you want to be able to take a failure on one of the upstream network connections without losing connectivity.

If that is true, then look at bonding.

Bridging is typically used if you want to have a machine, perhaps acting as a transparent firewall, join two physical network segments as if they are one logical network. It has nothing to do with network redundancy.

Note that bonding will only solve the redundancy problem if your upstream switches are redundant and all the upstream connections from there are redundant as well. (Bonding can have other purposes as well, such as increasing throughput, but I don't think that's relevant here.)

As an aside (and in case you run into it in your reading), multihoming is another way to get redundancy, but unless you are an expert (or at least very experienced) in networking including DNS, routing, and exterior gateway protocols, as well as having your own ASN and directly assigned network blocks, then Don't Go There. And this type of multihoming is typically used only on border gateways. (Also, if you do multihoming wrong and start flapping then your peer networks will typically blacklist you and you lose *all* connectivity.)

Devin

Les Mikesell

7:49 p.m.

On Fri, Feb 10, 2012 at 9:25 AM, Bob Hoffman bob@bobhoffman.com wrote:

...

Nothing at all to do with bonding. Not at all. eth1 to br0 , eth0 to br0....that's all. If that is possible, I see no reason for a bond at all. I just want to make sure if an NIC fails, the other one is still working while I am asleep and not a care in the world.

I suppose it is possible for a NIC to fail, but I can't recall actually ever seeing it. I've seen lots of complicated failover schemes introduce new problems and their own failure modes though, including a bad cable that kept flipping the primary/backup links at approximately the same rate that spanning-tree would let them switch.

-- Les Mikesell lesmikesell@gmail.com

Devin Reade

9:33 p.m.


--On Friday, February 10, 2012 01:49:05 PM -0600 Les Mikesell lesmikesell@gmail.com wrote:

...

I suppose it is possible for a NIC to fail, but I can't recall actually ever seeing it. I've seen lots of complicated failover schemes introduce new problems and their own failure modes [...]

+1.

Redundancy is cool. Redundancy, when needed and properly implemented, can work and can save your bacon. However, it is expensive, time consuming, and significantly increases both the complexity of a system and the skill needed to analyze problems (or for that matter predict them and plan for mitigation strategies). It also needs to be exercised on a regular basis or, when you need it, you'll find that someone has made a bad configuration change that prohibits failover.

I, also, have not seen a properly tested NIC fail in quite a few years. (I'm discounting bad NIC models that don't pass evaluation.) Of course, just because I've not seen it doesn't mean it can't happen, but I also don't usually worry about having a redundant SERIAL back-channel for cluster heartbeat operations, which used to be considered the only reasonable way to do things.

I do have clusters where bonding is in use but those have helped not so much in avoiding NIC failures as they do in allowing the machines to continue operating as the network team brings down part of the redundant switch network for maintenance (or to replace a failed switch, or when some fool decides that they can unplug a network cable briefly so that they can move other cables around).

Devin

m.roth＠5-cent.us

9:40 p.m.


Devin Reade wrote: <snip>

...

I do have clusters where bonding is in use but those have helped not so much in avoiding NIC failures as they do in allowing the machines to continue operating as the network team brings down part of the redundant switch network for maintenance (or to replace a failed switch, or when some fool decides that they can unplug a network cable briefly so that they can move other cables around).

Now wait a minute - I would dearly love to disconnect some cables we have in a shared rack downstairs in the datacenter - it's a rats' nest, and more than half ain't ours, and every single time I have to do something in the back, I'm deathly afraid I'm going to pull out somebody's power, or....

mark

Devin Reade

9:49 p.m.


--On Friday, February 10, 2012 04:40:59 PM -0500 m.roth@5-cent.us wrote:

...

Devin Reade wrote:

<snip>
> or when some fool decides that they can unplug a network cable briefly so that they can move other cables around).

Now wait a minute - I would dearly love to disconnect some cables we have in a shared rack downstairs in the datacenter [...]

My complaint is not with moving cables, it's in doing so without having proper change control.

Clean data centers == good
Arbitrarily moving hardware without planning and authorization == bad

Devin

m.roth＠5-cent.us

10:02 p.m.


Devin Reade wrote:

...

--On Friday, February 10, 2012 04:40:59 PM -0500 m.roth@5-cent.us wrote:

...
Devin Reade wrote:

<snip>
> or when some fool decides that they can unplug a network cable briefly so that they can move other cables around).

Now wait a minute - I would dearly love to disconnect some cables we have in a shared rack downstairs in the datacenter [...]

My complaint is not with moving cables, it's in doing so without having proper change control.

Change control? We're talking about a datacenter that provides racks, power, and connectivity, and the responsible folks from various Institutes get to rack, connect, etc them all....

...

Clean data centers == good
Arbitrarily moving hardware without planning and authorization == bad

The racks are locked, so no one who doesn't have access to the rack can do anything, but this shared rack!

mark

Les Mikesell

9:58 p.m.

On Fri, Feb 10, 2012 at 3:40 PM, m.roth@5-cent.us wrote:

...

Devin Reade wrote:

<snip>
> I do have clusters where bonding is in use but those have helped not so much in avoiding NIC failures as they do in allowing the machines to continue operating as the network team brings down part of the redundant switch network for maintenance (or to replace a failed switch, or when some fool decides that they can unplug a network cable briefly so that they can move other cables around).

Now wait a minute - I would dearly love to disconnect some cables we have in a shared rack downstairs in the datacenter - it's a rats' nest, and more than half ain't ours, and every single time I have to do something in the back, I'm deathly afraid I'm going to pull out somebody's power, or....

Do you really want to double the size of the mess to make it a little safer to move one thing? Redundant power connections normally do work, with only a little attention to grounding and to making sure the connections really go to separate circuits/UPSs. But with NICs, you have to be very careful that the switch ports are configured to match, so you are even more likely to break things by moving them around. It's not impossible, but it's rarely worthwhile if you don't need the combined bandwidth. But the real lesson here is not to do something for the first time in a place where mistakes will cause big trouble.

-- Les Mikesell lesmikesell@gmail.com

Gordon Messmer

14 Feb

12:11 a.m.


On 02/10/2012 05:54 AM, Bob Hoffman wrote:

...

Yea, I gave up on bonding, ended up just using eth1. But every tutorial I found had added eth0 and eth1 as interfaces to br0, thus sharing the bridge so to speak.

Those tutorials were documenting the manner in which you can set up a transparent Linux firewall. That's not what you want to do with a KVM server.

Creating an Ethernet bridge and adding two interfaces to it effectively makes a Linux host into a two-port switch with firewalling.

If you connect multiple ports from one switch to ports on a second switch (two bridged Linux Ethernet ports to a switch) you create a switch loop. Switch loops will endlessly replay broadcast traffic (such as ARP), creating a broadcast storm.

Yes, that can consume all of a router's CPU cycles quite easily. That is why data centers should always run spanning tree on their switches. STP will shut off ports that get looped.
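The loop Gordon describes can be seen in a toy Python model. The topology and node names are hypothetical and this is not real switch code: every bridge floods a broadcast out of all ports except the one it arrived on, and we count copies in flight after each hop. With eth0 and eth1 both bridged and both cabled to the same switch, the copies never die out; with a single cable they do.

```python
from collections import Counter


def flood(links, bridges, origin, hops):
    """Follow one broadcast frame for a fixed number of hops (toy model).

    links: list of (node_a, node_b) pairs; parallel links are allowed.
    bridges: nodes that flood broadcasts (switches, Linux bridges).
    Other nodes (hosts, routers) receive copies but never forward them.
    Returns (copies in flight after each hop, Counter of copies received per node).
    """
    def far_end(link_id, node):
        a, b = links[link_id]
        return b if a == node else a

    received = Counter()
    # Each in-flight copy is (link it travels on, node it is heading to).
    in_flight = [(i, far_end(i, origin)) for i, link in enumerate(links) if origin in link]
    history = []
    for _ in range(hops):
        next_hop = []
        for link_id, node in in_flight:
            received[node] += 1
            if node in bridges:
                # Broadcasts go out every port except the one they arrived on.
                next_hop += [(j, far_end(j, node)) for j, link in enumerate(links)
                             if j != link_id and node in link]
        in_flight = next_hop
        history.append(len(in_flight))
    return history, received


bridges = {"dc_switch", "linux_br0"}
# eth0 and eth1 both in br0 and both cabled to the same switch: a loop.
looped = [("router", "dc_switch"), ("dc_switch", "linux_br0"), ("dc_switch", "linux_br0")]
# Only one cable between host and switch: a tree.
tree = [("router", "dc_switch"), ("dc_switch", "linux_br0")]

history, received = flood(looped, bridges, "router", 6)
print(history, received["router"])  # [2, 2, 4, 2, 4, 2] 4
history, received = flood(tree, bridges, "router", 6)
print(history, received["router"])  # [1, 0, 0, 0, 0, 0] 0
```

In the looped case the router keeps receiving replays of its own broadcast. Real switches do this at wire speed rather than one hop per step, and with more parallel paths the count grows instead of merely cycling.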

Lamar Owen

10 Feb

8:01 p.m.

On Feb 9, 2012, at 6:54 PM, Bob Hoffman wrote:

...

entire ip block went out.

when I called the datacenter they told me the router was under attack, and I was like 'uh oh' and told them to just shut off my computer, I would be there to fix it. They did not believe me. An hour later I was there, deleted eth1's pointer to br0, and all was fine. Meanwhile they were all around the router trying to stop the attack. (It was just the router for me and others in that room... oops.)

I wonder if they will boot me from the center now? How is it possible that it did that so quickly? Such an easy way to bring down routers, wow, a hacker could have a field day.

If you weren't running spanning tree on your Linux bridge, and their switch ports aren't sending you BPDUs for STP, then you found out what happens when you activate a bridging loop (bridging from the point of view of the switch, not Linux bridging). Been there, done that. Most monitoring tools are written to track layer 3 happenings, and this is happening at layer 2. And it will take down that whole layer 2 broadcast domain, that's for sure.
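Spanning tree breaks such loops by putting redundant ports into a blocking state. A simplified Python sketch of the outcome, not of the BPDU exchange itself: elect a root, keep the first link that reaches each node, block the rest. Equal link costs, "lowest name wins" as a stand-in for the lowest bridge ID, and the topology names are all assumptions. On the Linux side STP is enabled per bridge with brctl stp br0 on; whether that actually protects you still depends on how the upstream switch handles BPDUs, which is Lamar's point.

```python
from collections import deque


def spanning_tree(bridges, links):
    """Return (forwarding, blocked) link indices for a simplified STP outcome.

    Assumptions: all links cost the same, the bridge with the lowest name plays
    the root (standing in for the lowest bridge ID), and ties go to the lowest
    link index. Real STP reaches its result by exchanging BPDUs.
    """
    root = min(bridges)
    reached = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for i, (a, b) in enumerate(links):
            if node not in (a, b):
                continue
            other = b if a == node else a
            if other not in reached:
                # First path found to this node: keep the link forwarding.
                reached.add(other)
                forwarding.add(i)
                if other in bridges:
                    queue.append(other)  # only bridges pass frames further
    # Links not used to reach a new node would close a loop (or lie outside
    # the root's segment), so they are blocked.
    blocked = set(range(len(links))) - forwarding
    return forwarding, blocked


# Hypothetical topology from the thread: eth0 and eth1 of br0 cabled to one switch.
looped = [("router", "dc_switch"), ("dc_switch", "linux_br0"), ("dc_switch", "linux_br0")]
forwarding, blocked = spanning_tree({"dc_switch", "linux_br0"}, looped)
print(sorted(forwarding), sorted(blocked))  # [0, 1] [2]
```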

And since many, if not most, tools are working at layer 3 and dealing with IP flows and not actual Ethernet traffic, none of the typical layer 3 tools will give any indication why the network just bogged down to a halt; you just about have to have a network probe (like Wireshark) on a SPAN port to catch it, unless you know some of the telltale signs. On a gigabit switch a fully saturating bridge loop can form in less than a second and bring things close to a halt.
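The "less than a second" figure is easy to sanity-check with arithmetic. On the wire every Ethernet frame costs its own length plus 8 bytes of preamble and start-of-frame delimiter and a 12-byte inter-frame gap, so a minimum 64-byte frame occupies 84 byte times. The small Python helper below (its name is ours) turns that into a maximum frame rate per port.

```python
def max_frames_per_second(link_bps, frame_bytes=64):
    """Maximum Ethernet frames per second for a given link speed and frame size.

    Each frame also costs 8 bytes of preamble+SFD and a 12-byte inter-frame gap.
    """
    wire_bits = (frame_bytes + 8 + 12) * 8
    return link_bps // wire_bits


print(max_frames_per_second(1_000_000_000))        # 1488095 minimum-size frames/s
print(max_frames_per_second(1_000_000_000, 1518))  # 81274 full-size frames/s
```

So a looping broadcast can present a router with up to roughly 1.49 million small frames per second from a single gigabit port, which is the kind of load Gordon describes eating all of a router's CPU.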

Most datacenter switches have configurable parameters to guard against loops (Cisco even has a feature called, appropriately enough, loopguard, but this may or may not fix this case).


discuss@lists.centos.org

19 comments

9 participants


participants (9)

Bob Hoffman
Dennis Jacobfeuerborn
Devin Reade
Gordon Messmer
Janez Kosmrlj
Lamar Owen
Les Mikesell
m.roth＠5-cent.us
tony＠softins.co.uk