NetworkManager fights with DHCP-only backup NIC


Warren Young

1 Dec 2014

9:35 p.m.

We ship servers to remote sites, which are rarely staffed with techs familiar with Linux. We have them tell us the static IP configuration for the box before we ship it, then we set it up for them here and ship it out to the site, where they just plug it in, turn it on, and walk away.

That’s the ideal, anyway.

What often happens in reality is either:

1. They give us incorrect static IP info, so the box arrives and won’t connect to the Internet, which means we often have to arrange to get someone clueful on-site to fix it.

2. The site is in the middle of some major deployment, a small piece of which is our server, so the LAN isn’t ready, but they demand the box be shipped early anyway for some handwavy business reason. "No, we can’t tell you what static IP to use," they say. "Just configure it on-site," they say. Sigh.

Since these systems have 2+ Ethernet ports and we really only need one in normal operations, we’ve taken to configuring the second one for DHCP, so that they can just move the cable from the primary port to the secondary.

This works fine in CentOS 5: DHCP comes up and takes over, giving us the access we need to fix/configure the static IP on the primary port.

What happens in CentOS 7 depends on whether you plug in one cable or two:

1. If you plug in only one cable, NetworkManager sees that the static interface is unplugged, so it *helpfully* moves that IP to the secondary NIC, apparently on the assumption that static is always better than DHCP. This is of no use to us, since all it does is move the problem to the other NIC.

2. If you plug both cables in, both interfaces come up configured as you’d expect, but since both configurations provided a gateway address, you still can’t get out to the Internet since the static one came up first, and it’s pointing at an unreachable box.
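
One way to see the conflict in case 2 is to list every default route the kernel currently holds. What follows is only a minimal sketch, assuming the iproute2 `ip` tool is installed; the function name `default_routes` is made up for illustration. It reads text in the format printed by `ip -4 route show default` and prints device, gateway and metric for each entry, showing 0 when no metric is listed.

```shell
# Summarise default routes: one "device gateway metric" line per route.
# Reads text in the format printed by `ip -4 route show default` on stdin.
# Typical use:  ip -4 route show default | default_routes
default_routes() {
    awk '$1 == "default" {
        gw = "-"; dev = "-"; metric = 0
        # Stop one field early so $(i + 1) always exists.
        for (i = 2; i < NF; i++) {
            if ($i == "via")    gw = $(i + 1)
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        print dev, gw, metric
    }'
}
```

Two output lines would mean two competing default gateways, which matches the situation described when both cables are plugged in.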

I think all we need to do to fix this is convince NetworkManager not to be clever about moving the static IP to the second NIC. Alas, there is no checkbox in the NM GUI labeled “This is a 4U server, dummy, not a laptop.”

Anyone know how to convince NM to obey the MAC binding in the ifcfg-* file, to prevent NM from moving the broken static IP info to the second NIC?

Yes, we know we can still disable NetworkManager and edit network-scripts/ifcfg-* directly. We’d just prefer not to fight the OS. Also, unlike EL6, disabling NM on EL7 breaks the network GUI, which we’ve occasionally found helpful, as when we have a semi-clueful tech at the remote site.
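
For readers who have not seen such a setup, the two-port arrangement described above might look roughly like the pair of files below. This is only a sketch: the device names, MAC addresses and IP values are placeholders, not taken from the real servers. It shows the kind of configuration that exhibits the problem, not a fix for it.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
# Primary port, static. All values here are placeholders.
DEVICE=eth0
# HWADDR is meant to tie this profile to one physical NIC.
HWADDR=00:11:22:33:44:55
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10
PREFIX=24
GATEWAY=192.0.2.1

# /etc/sysconfig/network-scripts/ifcfg-eth1
# Secondary port, DHCP fallback. The DHCP lease supplies its own gateway.
DEVICE=eth1
HWADDR=00:11:22:33:44:66
ONBOOT=yes
BOOTPROTO=dhcp
```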


Les Mikesell

1 Dec 2014

9:48 p.m.

On Mon, Dec 1, 2014 at 3:35 PM, Warren Young wyml@etr-usa.com wrote:

...

> We ship servers to remote sites, which are rarely staffed with techs familiar with Linux. We have them tell us the static IP configuration for the box before we ship it, then we set it up for them here and ship it out to the site, where they just plug it in, turn it on, and walk away.
>
> That’s the ideal, anyway.

Is there anyone who has more than a few boxes at more than one location who _doesn't_ have this issue? I'd like to see a FAQ or something by whoever designed the network configuration system about how they planned for it to work (with and without GUI availability). Likewise for what is supposed to happen when you restore a backup onto different hardware.

-- Les Mikesell lesmikesell@gmail.com

Nathan Duehr

2 Dec 2014

12:56 a.m.

...

On Dec 1, 2014, at 14:48, Les Mikesell lesmikesell@gmail.com wrote:

> On Mon, Dec 1, 2014 at 3:35 PM, Warren Young wyml@etr-usa.com wrote:

...
>> We ship servers to remote sites, which are rarely staffed with techs familiar with Linux. We have them tell us the static IP configuration for the box before we ship it, then we set it up for them here and ship it out to the site, where they just plug it in, turn it on, and walk away.
>>
>> That’s the ideal, anyway.
>
> Is there anyone who has more than a few boxes at more than one location who _doesn't_ have this issue? I'd like to see a FAQ or something by whoever designed the network configuration system about how they planned for it to work (with and without GUI availability). Likewise for what is supposed to happen when you restore a backup onto different hardware.

Most of the time, I end up nuking HWADDR from orbit on most boxes. It just causes more trouble than it fixes.

-- Nate Duehr denverpilot@me.com

Les Mikesell

2:26 a.m.

On Mon, Dec 1, 2014 at 6:56 PM, Nathan Duehr denverpilot@me.com wrote:

...

...
...
>>> We ship servers to remote sites, which are rarely staffed with techs familiar with Linux. We have them tell us the static IP configuration for the box before we ship it, then we set it up for them here and ship it out to the site, where they just plug it in, turn it on, and walk away.
>>>
>>> That’s the ideal, anyway.
>>
>> Is there anyone who has more than a few boxes at more than one location who _doesn't_ have this issue? I'd like to see a FAQ or something by whoever designed the network configuration system about how they planned for it to work (with and without GUI availability). Likewise for what is supposed to happen when you restore a backup onto different hardware.
>
> Most of the time, I end up nuking HWADDR from orbit on most boxes. It just causes more trouble than it fixes.

Sure, but the interface names will be different in the 'restore backup case' - especially on servers that have several.

-- Les Mikesell lesmikesell@gmail.com
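
The restore-onto-different-hardware case raised above comes down to a mismatch between the HWADDR recorded in an ifcfg file and the MAC the kernel reports for that device name. A rough sketch of a check follows. It assumes the usual `/etc/sysconfig/network-scripts` and `/sys/class/net` locations; the function name `check_hwaddr` and the two directory variables are invented here so the logic can be pointed at a test tree.

```shell
# Directories are overridable so the check can run against a fake tree.
IFCFG_DIR="${IFCFG_DIR:-/etc/sysconfig/network-scripts}"
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print one line per ifcfg file whose HWADDR does not match the MAC
# that sysfs reports for the interface named in DEVICE.
# Files without both DEVICE and HWADDR are skipped.
check_hwaddr() {
    local f dev want have
    for f in "$IFCFG_DIR"/ifcfg-*; do
        [ -f "$f" ] || continue
        # Source the file in a subshell so its variables do not leak out.
        dev=$( unset DEVICE; . "$f" >/dev/null 2>&1; echo "$DEVICE" )
        want=$( unset HWADDR; . "$f" >/dev/null 2>&1; echo "$HWADDR" )
        [ -n "$dev" ] && [ -n "$want" ] || continue
        have=$(cat "$SYSFS_NET/$dev/address" 2>/dev/null)
        # MAC addresses compare case-insensitively.
        if [ "$(echo "$want" | tr 'A-F' 'a-f')" != "$(echo "$have" | tr 'A-F' 'a-f')" ]; then
            echo "$f: $dev expects $want, found ${have:-none}"
        fi
    done
}
```

Run with no arguments on a restored box, it would list the profiles whose recorded MAC no longer matches the hardware, which are the ones where HWADDR gets in the way.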

Rob Kampen

5:27 a.m.

On 12/02/2014 10:35 AM, Warren Young wrote:

...

> We ship servers to remote sites, which are rarely staffed with techs familiar with Linux. We have them tell us the static IP configuration for the box before we ship it, then we set it up for them here and ship it out to the site, where they just plug it in, turn it on, and walk away.
>
> That’s the ideal, anyway.
>
> What often happens in reality is either:
>
> 1. They give us incorrect static IP info, so the box arrives and won’t connect to the Internet, which means we often have to arrange to get someone clueful on-site to fix it.
>
> 2. The site is in the middle of some major deployment, a small piece of which is our server, so the LAN isn’t ready, but they demand the box be shipped early anyway for some handwavy business reason. "No, we can’t tell you what static IP to use," they say. "Just configure it on-site," they say. Sigh.
>
> Since these systems have 2+ Ethernet ports and we really only need one in normal operations, we’ve taken to configuring the second one for DHCP, so that they can just move the cable from the primary port to the secondary.
>
> This works fine in CentOS 5: DHCP comes up and takes over, giving us the access we need to fix/configure the static IP on the primary port.
>
> What happens in CentOS 7 depends on whether you plug in one cable or two:
>
> 1. If you plug in only one cable, NetworkManager sees that the static interface is unplugged, so it *helpfully* moves that IP to the secondary NIC, apparently on the assumption that static is always better than DHCP. This is of no use to us, since all it does is move the problem to the other NIC.
>
> 2. If you plug both cables in, both interfaces come up configured as you’d expect, but since both configurations provided a gateway address, you still can’t get out to the Internet since the static one came up first, and it’s pointing at an unreachable box.
>
> I think all we need to do to fix this is convince NetworkManager not to be clever about moving the static IP to the second NIC. Alas, there is no checkbox in the NM GUI labeled “This is a 4U server, dummy, not a laptop.”
>
> Anyone know how to convince NM to obey the MAC binding in the ifcfg-* file, to prevent NM from moving the broken static IP info to the second NIC?

Have you put NM_CONTROLLED="no" in the ifcfg-eth0 script?

...

> Yes, we know we can still disable NetworkManager and edit network-scripts/ifcfg-* directly. We’d just prefer not to fight the OS. Also, unlike EL6, disabling NM on EL7 breaks the network GUI, which we’ve occasionally found helpful, as when we have a semi-clueful tech at the remote site.
>
> _______________________________________________
> CentOS mailing list
> CentOS@centos.org
> http://lists.centos.org/mailman/listinfo/centos

Warren Young

8:26 p.m.

On Dec 1, 2014, at 10:27 PM, Rob Kampen rkampen@reaching-clients.com wrote:

...

> Have you put NM_CONTROLLED="no" in the ifcfg-eth0 script?

How is that better than

    systemctl stop NetworkManager
    systemctl disable NetworkManager

Again, I’m not really after a way to make this work without NetworkManager. We’ve already got that. What I want is a way to tell NM to obey the MAC binding. This configuration *here* goes with that MAC chip *there*.

Given that, we don’t need to disable NetworkManager.
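
For reference, the binding asked about here can also be inspected and set through `nmcli`, using the `802-3-ethernet.mac-address` and `connection.interface-name` properties. The sketch below is an assumption-laden illustration, not something anyone in the thread reports trying: the function names and the profile name, MAC and interface name passed to them are hypothetical, and whether this stops NM from activating the profile on the other NIC is untested here.

```shell
# NMCLI can be overridden (for example NMCLI=echo) for a dry run.
NMCLI="${NMCLI:-nmcli}"

# Usage: pin_profile PROFILE MAC IFNAME
# Restricts a connection profile to one NIC, by MAC address and by name.
pin_profile() {
    local profile mac ifname
    profile=$1
    mac=$2
    ifname=$3
    $NMCLI connection modify "$profile" 802-3-ethernet.mac-address "$mac" &&
    $NMCLI connection modify "$profile" connection.interface-name "$ifname"
}

# Show which profile is currently active on which device.
show_bindings() {
    $NMCLI -f NAME,UUID,DEVICE connection show
}
```

Something like `NMCLI=echo pin_profile static-primary 00:11:22:33:44:55 eth0` prints the two commands without touching the system, which is a cheap way to review them before running them for real.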

Les Mikesell

8:36 p.m.

On Tue, Dec 2, 2014 at 2:26 PM, Warren Young wyml@etr-usa.com wrote:

...

> On Dec 1, 2014, at 10:27 PM, Rob Kampen rkampen@reaching-clients.com wrote:

...
>> Have you put NM_CONTROLLED="no" in the ifcfg-eth0 script?
>
> How is that better than
>
>     systemctl stop NetworkManager
>     systemctl disable NetworkManager
>
> Again, I’m not really after a way to make this work without NetworkManager. We’ve already got that. What I want is a way to tell NM to obey the MAC binding. This configuration *here* goes with that MAC chip *there*.
>
> Given that, we don’t need to disable NetworkManager.

What part of the breakage that NetworkManager does is good for a wired, static-addressed server? But in your scenario, where both NICs are plugged in and your only problem is the non-working gateway IP, you should be able to ssh to some other box on the working network, then over to the new one's DHCP address. The gateway won't matter if both ends are on the same subnet.

-- Les Mikesell lesmikesell@gmail.com

Warren Young

9:14 p.m.

On Dec 2, 2014, at 1:36 PM, Les Mikesell lesmikesell@gmail.com wrote:

...

> On Tue, Dec 2, 2014 at 2:26 PM, Warren Young wyml@etr-usa.com wrote:

...
>> Again, I’m not really after a way to make this work without NetworkManager.
>
> What part of the breakage that NetworkManager does is good for a wired, static-addressed server?

If you disable NM, the network configuration GUI stops working in EL7. (I didn’t do much with EL6, but I thought its GUI had a fall-back for the non-NM case.)

We don’t need this GUI, but our semi-technical customers sometimes do. It can be the difference between rolling a truck to a remote site vs letting the on-site people take care of the problem.

...

> you should be able to ssh to some other box on the working network,

I did mention that these sites rarely have local staff who know Linux. You can correctly infer from that there *are* no other SSH servers, just ours.

These are K-12 schools, for the most part. They often don’t have technical staff on-site at all. We have to schedule time with overworked district-level staff who often only know Windows to get anything at this level done.

We’ve built up nasty hacks to solve this before; VPN -> RDP -> PuTTY -> Linux server, for instance. Getting protective network admins to allow all this can chew up weeks of time.

It’s far, far better if the Linux box just phones home with the info we need to fix it. It can cut a 4-week phone tag game down to 15 minutes.

Les Mikesell

9:28 p.m.

On Tue, Dec 2, 2014 at 3:14 PM, Warren Young wyml@etr-usa.com wrote:

...

...
>> What part of the breakage that NetworkManager does is good for a wired, static-addressed server?
>
> If you disable NM, the network configuration GUI stops working in EL7. (I didn’t do much with EL6, but I thought its GUI had a fall-back for the non-NM case.)
>
> We don’t need this GUI, but our semi-technical customers sometimes do. It can be the difference between rolling a truck to a remote site vs letting the on-site people take care of the problem.

But can't you still set NM_CONTROLLED=no on an interface?

...

...
>> you should be able to ssh to some other box on the working network,
>
> I did mention that these sites rarely have local staff who know Linux. You can correctly infer from that there *are* no other SSH servers, just ours.
>
> These are K-12 schools, for the most part. They often don’t have technical staff on-site at all. We have to schedule time with overworked district-level staff who often only know Windows to get anything at this level done.

...

> We’ve built up nasty hacks to solve this before; VPN -> RDP -> PuTTY -> Linux server, for instance. Getting protective network admins to allow all this can chew up weeks of time.

I'm way too familiar with the problem - but we usually have several boxes in one place.

...

> It’s far, far better if the Linux box just phones home with the info we need to fix it. It can cut a 4-week phone tag game down to 15 minutes.

I've done some weird stuff like scripts that bring up all the interfaces, look for link, apply one of the IPs that the box should have to one of the interfaces with link up, try to ping the gateway, lather, rinse, repeat, but I've never been happy with any of it. Maybe a USB wifi adapter could be set up to make an openvpn connection back to a home server if you know the location has wifi. That could give you a known private IP to connect to for the rest of the configuration.

-- Les Mikesell lesmikesell@gmail.com
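
The kind of probing script described above could be sketched along these lines. It is a guess at the general shape, not the actual script: the function names are invented, it needs root plus the iproute2 `ip` tool and `ping`, and on a box where NetworkManager is running, NM may well undo or fight the manual address changes.

```shell
# SYSFS_NET is overridable so link detection can be tried on a fake tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the names of non-loopback interfaces that report carrier.
links_up() {
    local dir name
    for dir in "$SYSFS_NET"/*; do
        name=${dir##*/}
        [ "$name" = "lo" ] && continue
        # Reading carrier fails while an interface is administratively
        # down, so errors are silenced and count as "no link".
        if [ "$(cat "$dir/carrier" 2>/dev/null)" = "1" ]; then
            echo "$name"
        fi
    done
}

# Usage: probe GATEWAY ADDR/PREFIX [ADDR/PREFIX ...]
# Prints "IFACE ADDR/PREFIX" for the first combination whose gateway answers.
probe() {
    local gw ifc addr
    gw=$1
    shift
    # Bring every interface up first so carrier can be read.
    for ifc in "$SYSFS_NET"/*; do
        ip link set dev "${ifc##*/}" up
    done
    sleep 5   # crude wait for link negotiation
    for ifc in $(links_up); do
        for addr in "$@"; do
            ip addr add "$addr" dev "$ifc" || continue
            if ping -c 2 -W 2 -I "$ifc" "$gw" >/dev/null 2>&1; then
                echo "$ifc $addr"
                return 0
            fi
            # No answer: remove the address and try the next candidate.
            ip addr del "$addr" dev "$ifc"
        done
    done
    return 1
}
```

A call such as `probe 192.0.2.1 192.0.2.10/24` (placeholder addresses) would leave the address on the first linked interface from which the gateway answers, and return non-zero if none does.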

Warren Young

10:11 p.m.

On Dec 2, 2014, at 2:28 PM, Les Mikesell lesmikesell@gmail.com wrote:

...

> On Tue, Dec 2, 2014 at 3:14 PM, Warren Young wyml@etr-usa.com wrote:

...
...
>>> What part of the breakage that NetworkManager does is good for a wired, static-addressed server?
>>
>> If you disable NM, the network configuration GUI stops working in EL7.
>
> But can't you still set NM_CONTROLLED=no on an interface?

That still effectively breaks the network settings GUI. Interfaces you mark that way show as “unmanaged” in the GUI, and you can’t modify any of their settings. You can’t change them back to “managed” via the GUI. You can’t even add an IP alias to them via the GUI.

If you’re suggesting that I do this only to the static interface and leave the DHCP one under NM’s control, the only improvement relative to disabling NM entirely is that it at least gives the semi-technical people on site the option of repurposing the DHCP interface as a secondary static interface.

That’s not useless, but it’s a far cry from the MAC binding I’m after.
