Every time I touch something, pieces fall off! It's a good thing this stuff isn't in production yet (for me I mean).
So I had an LVS, configured with Piranha, directing http test transactions across two servers. I used Piranha to add another realserver. It appeared in the lvs.cf file, but didn't appear in the ipvsadm output. So I stopped and restarted Pulse. And now *none* of the servers appear in the ipvsadm output. Pulse says it started clean, and nothing in the syslog. The gratuitous arp gets made, and the correct IPs are assigned to the correct interfaces.
    [ddb@prcapp02 ~]$ sudo ipvsadm
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  prcvmod01.pinerivercapital.l wlc
That's the right service name (the ".l" at the end is ".local" truncated). WLC is the right scheduling mode. But no remote addresses are listed.
In lvs.cf, there are multiple servers present:

    server vl31 {
        address = 172.17.3.1
        active = 1
        weight = 2
    }
    server vw32 {
        address = 172.17.3.2
        active = 1
        weight = 2
    }
    server vl41 {
        address = 172.17.4.1
        active = 1
        weight = 4
    }
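One way to make the mismatch described above concrete is to list which active servers from the configuration are absent from the ipvsadm listing. The following is only a minimal sketch: the regular expressions handle just the `server name { key = value ... }` shape quoted above, not the full lvs.cf grammar, and the function names and the `ipvs_addresses` set (addresses copied by hand from ipvsadm output) are hypothetical helpers, not part of Piranha or ipvsadm.

```python
import re

# Excerpt in the style quoted above; a real lvs.cf contains many more settings.
LVS_CF_EXCERPT = """
server vl31 { address = 172.17.3.1 active = 1 weight = 2 }
server vw32 { address = 172.17.3.2 active = 1 weight = 2 }
server vl41 { address = 172.17.4.1 active = 1 weight = 4 }
"""

# Assumption: each realserver is a "server NAME { ... }" block without nested braces.
SERVER_BLOCK = re.compile(r"server\s+(\S+)\s*\{([^}]*)\}")
SETTING = re.compile(r"(\w+)\s*=\s*(\S+)")


def configured_servers(text):
    """Return {name: {setting: value}} for each server block found in text."""
    servers = {}
    for name, body in SERVER_BLOCK.findall(text):
        servers[name] = dict(SETTING.findall(body))
    return servers


def missing_from_ipvs(text, ipvs_addresses):
    """Names of active servers whose address is not in ipvs_addresses."""
    return [
        name
        for name, settings in configured_servers(text).items()
        if settings.get("active") == "1"
        and settings.get("address") not in ipvs_addresses
    ]


if __name__ == "__main__":
    # An empty set models the listing above, which shows no realservers at all.
    print(missing_from_ipvs(LVS_CF_EXCERPT, set()))
```

With an empty address set, every active server in the excerpt is reported as missing, which matches the symptom in the post.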
Quoting David Dyer-Bennet dd-b@dd-b.net:
> In lvs.cf, there are multiple servers present:
>
>     server vl31 {
>         address = 172.17.3.1
>         active = 1
>         weight = 2
>     }
>     server vw32 {
>         address = 172.17.3.2
>         active = 1
>         weight = 2
>     }
>     server vl41 {
>         address = 172.17.4.1
>         active = 1
>         weight = 4
>     }
Is the service itself active?
Do you have a line above these that says something like:
    virtual example.com {
        active = 1
On Thu, September 25, 2008 14:13, Barry Brimer wrote:
> Is the service itself active?
> Do you have a line above these that says something like:
>
>     virtual example.com {
>         active = 1
Yes; and it shows as active in Piranha, too, and nannys got started for the three real servers. It just didn't tell ipvs to actually route to them.
Quoting David Dyer-Bennet dd-b@dd-b.net:
> Yes; and it shows as active in Piranha, too, and nannys got started for the three real servers. It just didn't tell ipvs to actually route to them.
What happens when you run the service check by hand? Do you have your IP addresses for different services on different devices, i.e. eth0:0, eth0:1, eth0:2?
On Thu, September 25, 2008 14:43, Barry Brimer wrote:
> What happens when you run the service check by hand?
Don't know what "service check" means (guessing you mean what nanny does to decide a service is working?). But raising the issue of whether something below the level of what I thought I had changed was changed has been somewhat productive.
While I can ping the realservers, turns out I can't access the services on them. Don't know why yet, but that's something I can investigate. (Still don't see why it changed when it did; but if I can't access the services from the lvs, then it can't route to them either, and the nanny checks will fail, etc., so that must be fixed before anything can work.) I will chase this down, and either fix it or have different questions :-). Thank you!
> Do you have your IP addresses for different services on different devices
Yes, they're on separate devices, and they're set up the same way as when it worked yesterday, so I don't think it's anything that basic that's wrong.
I think I've been mis-understanding the startup order. Is this what really happens:
1. pulse started
2. lvsd started by pulse
3. nanny for each (active) realserver started by lvsd
4. When a nanny gets a successful test, either it or lvsd *then* enables that realserver for receiving traffic
That would explain why I have nannys running, but no realservers listed by ipvsadm. I expected things to start out on, and only get turned off if the nannys failed; but in fact doing what I listed above makes more sense, it's better if you *have* a nanny to make sure the nanny reports ok *first*.
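The startup order proposed above can be written out as a toy model. This is only a sketch of the sequence as described in the message, not Piranha's actual code: `RealServer`, `RoutingTable`, `start_lvs` and the `check` callback are hypothetical names, and the model deliberately leaves open whether nanny or lvsd performs the enabling step.

```python
from dataclasses import dataclass, field


@dataclass
class RealServer:
    name: str
    active: bool = True


@dataclass
class RoutingTable:
    """Stand-in for the table of realservers that ipvsadm would list."""
    enabled: list = field(default_factory=list)


def start_lvs(servers, check):
    """Model of the described order: monitor first, enable only after success."""
    table = RoutingTable()  # starts empty: no realserver receives traffic yet
    monitored = [s for s in servers if s.active]  # one monitor per active realserver
    for server in monitored:
        if check(server):  # the monitor's test succeeded
            table.enabled.append(server.name)  # only now is the realserver enabled
    return table


if __name__ == "__main__":
    servers = [RealServer("vl31"), RealServer("vw32"), RealServer("vl41")]
    # If every check fails, monitors exist but nothing is listed as enabled.
    print(start_lvs(servers, lambda s: False).enabled)
```

Under this model, running monitors combined with an empty table is the expected state whenever every check fails, which is the situation reported in the thread.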
Quoting David Dyer-Bennet dd-b@dd-b.net:
> Don't know what "service check" means (guessing you mean what nanny does to decide a service is working?). But raising the issue of whether something below the level of what I thought I had changed was changed has been somewhat productive.
> I think I've been mis-understanding the startup order. Is this what really happens:
>
> 1. pulse started
> 2. lvsd started by pulse
> 3. nanny for each (active) realserver started by lvsd
> 4. When a nanny gets a successful test, either it or lvsd *then* enables that realserver for receiving traffic
>
> That would explain why I have nannys running, but no realservers listed by ipvsadm. I expected things to start out on, and only get turned off if the nannys failed; but in fact doing what I listed above makes more sense, it's better if you *have* a nanny to make sure the nanny reports ok *first*.
By service check, I mean the send or send program line which "expects" the result of the "expect" line to determine that the service is "up".
IME, ipvsadm does not show a host (even at startup) until it is successful from the send/send program / expect tests.
Barry
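Running a check of this kind by hand amounts to connecting to the realserver, sending a string, and comparing the reply against an expected string. The following is a minimal sketch of that idea, not nanny's implementation: `service_check` is a hypothetical helper, matching on the start of the reply is a simplification, and the address, port 80 and HTTP strings in the usage line are assumptions based on the HTTP test traffic mentioned in the thread.

```python
import socket


def service_check(host, port, send, expect, timeout=5.0):
    """Connect, send `send`, and report whether the reply starts with `expect`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(send)
            reply = b""
            # Read until enough bytes have arrived to compare, or the peer closes.
            while len(reply) < len(expect):
                chunk = conn.recv(4096)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Refused connections, timeouts and unreachable hosts all count as "down".
        return False
    return reply.startswith(expect)


if __name__ == "__main__":
    # Hypothetical values: a realserver address from the thread and a plain HTTP probe.
    print(service_check("172.17.3.1", 80, b"GET / HTTP/1.0\r\n\r\n", b"HTTP"))
```

A host that answers ping can still fail this check, since ping does not exercise the service port at all.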
On Thu, September 25, 2008 16:43, Barry Brimer wrote:
> By service check, I mean the send or send program line which "expects" the result of the "expect" line to determine that the service is "up".
Okay, I guessed about right on that.
> IME, ipvsadm does not show a host (even at startup) until it is successful from the send/send program / expect tests.
Good design, just not what I immediately guessed.
I have found and fixed the problem with getting to the realservers, and am now happily distributing hundreds of thousands of requests across 4 realservers on two physical hosts, using two different operating systems. We'll call this a success. (Which means I've about completed the first phase of testing; second phase will be with workloads actually related to our intended goal. And after that, implementation!)
Thanks for your help!