[CentOS] Keepalive...
John Doe
jdmls at yahoo.com
Wed Feb 25 15:40:52 UTC 2009
I went a bit further...
lvs1# service keepalived stop
lvs2# service keepalived stop
lvs1# service network restart
lvs2# service network restart
Clean start
lvs1# service keepalived start
Feb 25 15:03:18 lvs1 Keepalived: Starting Keepalived v1.1.16 (02/17,2009)
Feb 25 15:03:18 lvs1 Keepalived: Starting Healthcheck child process, pid=9511
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Using MII-BMSR NIC polling thread...
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.28.226 added
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Netlink reflector reports IP 10.0.0.1 added
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Registering Kernel netlink reflector
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Registering Kernel netlink command channel
Feb 25 15:03:18 lvs1 Keepalived: Starting VRRP child process, pid=9512
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Using MII-BMSR NIC polling thread...
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Netlink reflector reports IP 192.168.28.226 added
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Netlink reflector reports IP 10.0.0.1 added
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Registering Kernel netlink reflector
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Registering Kernel netlink command channel
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Registering gratutious ARP shared channel
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Configuration is using : 13235 Bytes
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.11:80]
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.12:80]
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Configuration is using : 34062 Bytes
Feb 25 15:03:18 lvs1 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
No VIP and no checks on the web servers...
lvs2# service keepalived start
Feb 25 15:05:23 lvs2 Keepalived: Starting Keepalived v1.1.16 (02/17,2009)
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Using MII-BMSR NIC polling thread...
Feb 25 15:05:23 lvs2 Keepalived: Starting Healthcheck child process, pid=8718
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Using MII-BMSR NIC polling thread...
Feb 25 15:05:23 lvs2 Keepalived: Starting VRRP child process, pid=8719
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.28.227 added
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Netlink reflector reports IP 10.0.0.2 added
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Registering Kernel netlink reflector
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Registering Kernel netlink command channel
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Netlink reflector reports IP 192.168.28.227 added
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Netlink reflector reports IP 10.0.0.2 added
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Registering Kernel netlink reflector
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Registering Kernel netlink command channel
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Registering gratutious ARP shared channel
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Configuration is using : 13233 Bytes
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.11:80]
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.12:80]
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Configuration is using : 34060 Bytes
Feb 25 15:05:23 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 25 15:05:23 lvs2 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
No VIP and only one check on the web servers...
lvs1# service keepalived stop
Feb 25 15:07:30 lvs1 Keepalived: Terminating on signal
Feb 25 15:07:30 lvs1 Keepalived: Stopping Keepalived v1.1.16 (02/17,2009)
Feb 25 15:07:30 lvs1 Keepalived_vrrp: Terminating VRRP child process on signal
Feb 25 15:07:30 lvs1 Keepalived_healthcheckers: Terminating Healthchecker child process on signal
And nothing else (lvs2 does not become MASTER)...
lvs1# service keepalived start
Nothing much...
lvs2# service keepalived stop
lvs2# service keepalived start
Nothing and no checks on the web servers...
lvs1# service keepalived stop
lvs1# service keepalived start
Nothing and no checks on the web servers...
lvs1# service keepalived stop
lvs1# service keepalived start
Nothing and only one check on the web servers...
Always stuck on "VRRP sockpool"
By the way, restarting (or stopping and then starting) too quickly or too often leads to a failed start with "daemon is already running".
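A minimal sketch of how that failed start could be avoided, assuming it happens only because the old keepalived processes have not finished exiting when the start is issued (I have not confirmed that this is the cause). The helper name wait_for_exit and the 10 second timeout are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: poll until no process with the exact given name
# remains, or give up after a timeout (in seconds, default 10).
wait_for_exit() {
    name=$1
    timeout=${2:-10}
    while pgrep -x "$name" >/dev/null 2>&1; do
        # Timeout reached while the process is still alive: report failure.
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}

# Usage (as root on either director):
#   service keepalived stop && wait_for_exit keepalived 10 && service keepalived start
```

This only separates the stop from the start; it does nothing about the "stuck on VRRP sockpool" behaviour itself.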
lvs1# service keepalived restart
Nothing and no checks on the web servers...
lvs1# service keepalived restart
Nothing and no checks on the web servers...
lvs1# service keepalived restart
Nothing and no checks on the web servers...
lvs1# service keepalived restart
Baam, suddenly many vrrp packets, and one web server check
Feb 25 15:15:11 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Feb 25 15:15:11 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.16.123
Feb 25 15:15:11 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.16.123
Feb 25 15:15:16 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Feb 25 15:15:16 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.16.123
Feb 25 15:14:50 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 25 15:14:50 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Feb 25 15:14:50 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 25 15:14:55 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 25 15:14:55 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Feb 25 15:14:55 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
The web servers are correctly accessed from outside in rr, but there are still no web checks from either keepalived...
lvs1# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.16.123:http rr
-> 10.0.0.12:http Route 1 0 28
-> 10.0.0.11:http Route 1 0 28
lvs2# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.16.123:http rr
-> 10.0.0.12:http Route 1 0 0
-> 10.0.0.11:http Route 1 0 0
lvs1# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:04:23:9e:f3:74 brd ff:ff:ff:ff:ff:ff
inet 192.168.28.226/20 brd 192.168.31.255 scope global eth0
inet 192.168.16.123/32 scope global eth0
inet6 fe80::204:23ff:fe9e:f374/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:04:23:9e:f3:75 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/8 brd 10.255.255.255 scope global eth1
inet6 fe80::204:23ff:fe9e:f375/64 scope link
valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
No VIP on lvs2 (BACKUP state)
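Since most of the steps below come down to "is the VIP up on this box or not", here is a small sketch of how that can be checked from the ip output instead of by eye. The function name has_vip is hypothetical; the address and interface are the ones from this setup:

```shell
#!/bin/sh
# Hypothetical helper: read "ip addr" output on stdin and succeed if the
# given IPv4 address is configured. -F makes grep treat the dots literally,
# and the trailing "/" stops 192.168.16.12 from matching 192.168.16.123.
has_vip() {
    grep -F -q "inet $1/"
}

# Usage on either director:
#   ip addr show dev eth0 | has_vip 192.168.16.123 && echo "VIP is up"
```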
lvs1# service keepalived stop
Feb 25 15:29:06 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
tcpdump => VRRP.MCAST.NET: VRRPv2, Advertisement, vrid 51, prio 0, authtype none, intvl 1s, length 20
No VIP on lvs1 and lvs2, ARP resolution for VIP incomplete...
lvs2# ip a add dev eth0 local 192.168.16.123/32 scope global
Baam, suddenly vrrp packets, and one round (only) of web server checks
15:33:18.639546 IP lvs2.iper > VRRP.MCAST.NET: VRRPv2, Advertisement, vrid 51, prio 99, authtype none, intvl 1s, length 20
15:33:19.641002 IP lvs2.iper > VRRP.MCAST.NET: VRRPv2, Advertisement, vrid 51, prio 99, authtype none, intvl 1s, length 20
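For reference, a sketch of how the adverts can be captured and reduced to the two fields that matter here. The filter uses IP protocol 112, the same number shown in the "VRRP sockpool" log line; the function name vrrp_adverts is made up. As far as I understand VRRPv2 (RFC 3768), a prio 0 advert like the one seen earlier is what a master sends when it stops participating, while prio 99 is the regular advert from lvs2:

```shell
#!/bin/sh
# Hypothetical helper: read tcpdump text output on stdin and print the
# virtual router id and priority of every VRRPv2 advertisement seen.
vrrp_adverts() {
    sed -n 's/^.*VRRPv2, Advertisement, vrid \([0-9]*\), prio \([0-9]*\),.*$/vrid=\1 prio=\2/p'
}

# Usage (as root; -l keeps tcpdump output line-buffered through the pipe):
#   tcpdump -l -n -i eth0 ip proto 112 | vrrp_adverts
```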
lvs1# service keepalived start
Nothing...
lvs2# service keepalived stop
Baam, suddenly vrrp packets, and one round (only) of web server checks
The web servers are correctly accessed from outside in rr...
lvs2# service keepalived start
Nothing, other than Entering BACKUP STATE
Both lvs have the VIP up...
lvs1# service keepalived stop
Same as above, except the VIP is up on lvs2 and down on lvs1, and no webchecks...
The web servers are correctly accessed from outside in rr...
lvs1# service keepalived start
Nothing...
lvs1 "stuck" on VRRP sockpool, while lvs2 is still MASTER
VIP down on lvs1 and up on lvs2
lvs2# service keepalived stop
Baam, suddenly vrrp packets, no web server checks at all
The web servers are correctly accessed from outside in rr...
Both lvs have the VIP up
lvs1# service keepalived stop
lvs1# service keepalived start
lvs2# service keepalived stop
Same as above except that there are webchecks from lvs1 now...
lvs2# service keepalived start
backup state, no webchecks from lvs2
lvs1# service keepalived stop
lvs2 => MASTER
VIP is up on lvs2, down on lvs1
Everything is stuck for like 30s... and then web servers are accessible.
lvs1# service keepalived start
Nothing...
lvs1 "stuck" on VRRP sockpool, while lvs2 is still MASTER
VIP down on lvs1 and up on lvs2
lvs2# service network restart
baam, vrrp packets, lvs1 transitions to MASTER and sends ARPs
And I get regular webchecks from both lvs...
And if I bring down one web server, it is correctly removed from the services.
2 minutes later, no more web checks...
lvs1# service keepalived stop
lvs2 => MASTER
VIP is down on both lvs... ARP is incomplete.
Everything is stuck forever...
lvs2# ip a add dev eth0 local 192.168.16.123/32 scope global
baam, vrrp packets, lvs1 enters MASTER state and sends ARPs
I caught this: Netlink: error: File exists, type=(20), seq=1235574458, pid=0
Looking for errors in the logs, I found:
Feb 23 16:20:20 lvs1 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:20:20 lvs1 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 16:42:58 lvs1 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:42:58 lvs1 Keepalived_healthcheckers: Netlink: filter function error
Feb 25 12:00:50 lvs1 kernel: IPVS: ip_vs_send_async error
Feb 25 12:12:04 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:04 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:10 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:10 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:11 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:11 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:12 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:12 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:13 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:13 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:14 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:14 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:15 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:16 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:16 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:17 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:33:39 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561506, pid=0
Feb 25 12:39:11 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561507, pid=0
Feb 25 12:40:10 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561508, pid=0
Feb 25 12:40:52 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561509, pid=0
Feb 23 16:20:16 lvs2 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:20:16 lvs2 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 16:42:46 lvs2 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:42:46 lvs2 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 17:35:36 lvs2 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 17:35:36 lvs2 Keepalived_vrrp: Netlink: filter function error
Feb 25 12:25:22 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235560956, pid=0
Feb 25 12:30:50 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561435, pid=0
Feb 25 15:33:18 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235570954, pid=0
Feb 25 16:12:02 lvs2 Keepalived_vrrp: Netlink: error: Cannot assign requested address, type=(21), seq=1235574457, pid=0
Feb 25 16:29:11 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235574458, pid=0
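A small sketch for reading the Netlink errors above. In linux/rtnetlink.h, message type 20 is RTM_NEWADDR and type 21 is RTM_DELADDR, so my tentative reading is that "File exists" means keepalived tried to add an address that was already configured (the 15:33:18 entry on lvs2 coincides with my manual "ip a add" of the VIP), and "Cannot assign requested address" means it tried to remove one that was not there. The function names are hypothetical:

```shell
#!/bin/sh
# Map the numeric netlink message type from a keepalived log line to its
# rtnetlink name (values from <linux/rtnetlink.h>).
nl_type_name() {
    case "$1" in
        20) echo RTM_NEWADDR ;;
        21) echo RTM_DELADDR ;;
        *)  echo "unknown($1)" ;;
    esac
}

# Read syslog lines on stdin; for each Keepalived_vrrp netlink error print
# the host, the decoded operation and the error text.
decode_netlink_errors() {
    sed -n 's/^.* \([^ ]*\) Keepalived_vrrp: Netlink: error: \(.*\), type=(\([0-9]*\)).*$/\1 \3 \2/p' |
    while read -r host type msg; do
        echo "$host $(nl_type_name "$type"): $msg"
    done
}

# Usage:
#   grep 'Netlink: error' /var/log/messages | decode_netlink_errors
```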
Do you have any idea about what could be causing these problems?
Thx,
JD