OpenSwan Drop Out Issue

John Cenile

9 Feb 2016

3:04 p.m.

Hello,

I'm cross posting this from the OpenSwan mailing list, in case someone here can help.

We have two sites connected via OpenSwan 2.6.32-9 on CentOS 5, sharing 6 /24 subnets each (so 12 in total).

The problem is that, completely at random, be it in the middle of the day or the middle of the night (so I don't believe it's traffic related), certain (and sometimes all) routes will drop. They usually recover after a few minutes, but that is still long enough for our monitoring to detect downtime.

The configuration we have on each device is:

conn site-a
    keyingtries=0
    keylife=1h
    ikelifetime=8h
    left=1.1.1.1
    right=2.2.2.2
    leftsubnets={x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24}
    rightsubnets={x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24}
    pfs=yes
    auto=start
    authby=secret
    dpddelay=30
    dpdtimeout=120
    dpdaction=hold
    phase2alg=aes256-sha1;modp1536
    phase2=esp
    ike=aes256-sha1;modp1536

It's mirrored exactly the same on the other side.
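For a sense of scale, here is a rough sketch of what this configuration implies. It assumes, as the "site-a/5x5" and "site-b/6x6" names in the logs further down suggest, that leftsubnets/rightsubnets are expanded into one tunnel per left/right subnet pair, named conn/LxR. The subnet values are placeholders (the real ones are redacted) and the function name is hypothetical:

```python
from itertools import product

def expand_subnet_pairs(conn, left_subnets, right_subnets):
    """Return one (name, left, right) tuple per left/right subnet pair.

    Naming follows the "conn/LxR" pattern seen in the pluto logs
    (e.g. "site-a/5x5"); the indices are assumed to be 1-based.
    """
    return [
        (f"{conn}/{li}x{ri}", left, right)
        for (li, left), (ri, right) in product(
            enumerate(left_subnets, start=1),
            enumerate(right_subnets, start=1),
        )
    ]

# Placeholder subnets: the real ones are redacted as x.x.x.x/24 in the post.
left = [f"10.1.{i}.0/24" for i in range(1, 7)]
right = [f"10.2.{i}.0/24" for i in range(1, 7)]

tunnels = expand_subnet_pairs("site-a", left, right)
print(len(tunnels))    # number of tunnels: 6 x 6
print(tunnels[0][0])   # first tunnel name
print(tunnels[-1][0])  # last tunnel name

# With keylife=1h every tunnel is renegotiated at least about once an hour,
# so on average a Quick Mode exchange is due every 3600 / 36 seconds or less.
print(3600 / len(tunnels))
```

If that assumption holds, 6 subnets per side means 36 tunnels rather than 12, and a rekey is in flight somewhere roughly every 100 seconds, so a single lost or delayed Quick Mode exchange can take out one subnet pair without touching the others.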

I have tried changing the dead peer detection timeout to something high (5 minutes), and removing it completely (which I believe defaults it to 30 seconds), neither of which made any difference.

I can't see any obvious errors in the logs; however, the most recent drop-out produced the following messages around the same time:

Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: max number of retransmissions (2) reached STATE_QUICK_I1
Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: starting keying attempt 2 of an unlimited number
Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #95: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW+SAREFTRACK to replace #39 {using isakmp#52 msgid:119495de proposal=AES(12)_256-SHA1(2)_160 pfsgroup=OAKLEY_GROUP_MODP1536}

and also

Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xde58eea3) not found (maybe expired)
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xa5298d7d) not found (maybe expired)
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message

Before we move to another solution, does anyone have any suggestions on what the problem might be? Running a constant ping between the two hosts doesn't drop *any* packets (even when the IPSec connection itself drops out).

Thanks in advance.

Eero Volotinen

9 Feb 2016

3:14 p.m.

Try setting a lower key expiry time on the other endpoint.

-- Eero

_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos

John Cenile

3:19 p.m.

Thanks, I've updated the config with the following:

    keylife=20m
    ikelifetime=2h

I'll see how that goes.

In the meantime, any other suggestions would be greatly appreciated.

Gordon Messmer

5:52 p.m.

On 02/09/2016 07:04 AM, John Cenile wrote:

does anyone have any suggestions on what the problem might be?

Not off the top of my head, but if I were you, I'd enable debugging of "control" and "dpd". See man ipsec.conf (/plutodebug) and man ipsec_pluto.

Eero Volotinen

5:58 p.m.

CentOS 5 is also a rather old OS. Is it possible to use a newer version (such as CentOS 6 or CentOS 7)?

Eero

John Cenile

10 Feb 2016

6:34 a.m.

So lowering the keylife / ikelifetime didn't solve the problem. I've enabled debugging and I'll see what it says.

Unfortunately we can't (easily) upgrade CentOS. Do you believe that would make a huge difference though? Are the newer versions of OpenSwan *that* much more reliable?

Eero Volotinen

6:48 a.m.

Well, CentOS 5 is very near its end of life, and there are not many updates to the kernel or OpenSwan. You should at least try the latest OpenSwan version.

Your issue looks a bit like a network problem.

-- Eero

John Cenile

11 Feb 2016

6:10 a.m.

As I said though, there are no lost ICMP packets, even when the IPSec tunnel drops out.

I do notice a lot of these errors in the secure log though. Would this be any indication of a problem? (I'm grepping for this specific error; these are not the only messages in there.)

Feb 11 14:18:10 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x01f90e1d) not found (maybe expired)
Feb 11 14:18:14 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xb3681486) not found (maybe expired)
Feb 11 14:18:14 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x6ad588f5) not found (maybe expired)
Feb 11 14:19:07 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xe05ced4d) not found (maybe expired)
Feb 11 14:19:08 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x7cd46e9e) not found (maybe expired)
Feb 11 14:19:38 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x07164936) not found (maybe expired)
Feb 11 14:19:55 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x9e68c142) not found (maybe expired)
Feb 11 14:19:58 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xcbb10063) not found (maybe expired)
Feb 11 14:20:16 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x7a160d48) not found (maybe expired)
Feb 11 14:20:26 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x18a63776) not found (maybe expired)
Feb 11 14:21:11 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x09eb87c4) not found (maybe expired)
Feb 11 14:21:11 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xb2438c9b) not found (maybe expired)
Feb 11 14:21:15 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x04236e6a) not found (maybe expired)
Feb 11 14:21:52 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x456f7468) not found (maybe expired)
Feb 11 14:21:57 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x8ee90acd) not found (maybe expired)
Feb 11 14:22:04 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xc6676973) not found (maybe expired)
Feb 11 14:22:04 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xc3b43142) not found (maybe expired)
Feb 11 14:22:30 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x37111e62) not found (maybe expired)
Feb 11 14:22:35 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xb6e63098) not found (maybe expired)
Feb 11 14:23:24 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xbd94fd66) not found (maybe expired)
Feb 11 14:24:05 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x36f47642) not found (maybe expired)
Feb 11 14:24:18 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xababea68) not found (maybe expired)
Feb 11 14:24:33 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x9088954e) not found (maybe expired)
Feb 11 14:24:46 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x5f1ba8d3) not found (maybe expired)
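One way to check whether these messages cluster around the drop-outs is to count them per minute and per connection. A minimal sketch, assuming syslog-style lines like the ones above; the regex, the function name and the three-line sample are illustrative and not part of OpenSwan:

```python
import re
from collections import Counter

# Matches lines such as:
# Feb 11 14:18:10 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: ...
DELETE_SA = re.compile(
    r'^(?P<minute>\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ pluto\[\d+\]: '
    r'"(?P<conn>[^"]+)" #\d+: ignoring Delete SA payload'
)

def delete_sa_per_minute(lines):
    """Count 'ignoring Delete SA payload' messages per (minute, connection)."""
    counts = Counter()
    for line in lines:
        match = DELETE_SA.match(line)
        if match:
            counts[(match["minute"], match["conn"])] += 1
    return counts

# Three of the lines quoted above, used as a small sample.
sample = [
    'Feb 11 14:18:10 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x01f90e1d) not found (maybe expired)',
    'Feb 11 14:18:14 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xb3681486) not found (maybe expired)',
    'Feb 11 14:19:07 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xe05ced4d) not found (maybe expired)',
]

for (minute, conn), count in sorted(delete_sa_per_minute(sample).items()):
    print(minute, conn, count)
```

The same function accepts an open file object, so it could be pointed at the secure log directly; comparing the busiest minutes with the monitoring alerts would show whether the two line up.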

Gordon Messmer

3:25 p.m.

On 02/10/2016 10:10 PM, John Cenile wrote:

I do notice a lot of these errors in the secure log though, would this be any indication of a problem? (I'm grepping for this specific error, they're not the only messages in there).

Feb 11 14:18:10 site-a pluto[10450]: "site-b/1x1" #803: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x01f90e1d) not found (maybe expired)

I think they indicate that both sides are restarting the tunnel, and that site-b is sending a "delete" command as it restarts the tunnel, while site-a has already removed the tunnel. But that doesn't tell us anything about why they're doing that. Control debugging from both sides *should* make that clear, but you'll have to either make sense of the complete logs or share them.
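Reading the two logs side by side is easier on a single timeline. A minimal sketch, assuming both gateways keep their clocks in sync, log in the syslog format quoted earlier in the thread, and use English month names; the function names and the year argument are hypothetical (syslog timestamps carry no year):

```python
import heapq
from datetime import datetime

def parse_timestamp(line, year):
    """Parse the leading 'Feb 10 00:53:09' syslog timestamp of a log line."""
    stamp = " ".join(line.split()[:3])
    # The year is supplied by the caller because syslog lines omit it.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def merge_logs(site_a_lines, site_b_lines, year=2016):
    """Merge two chronologically ordered logs into one ordered timeline."""
    keyed_a = ((parse_timestamp(line, year), line) for line in site_a_lines)
    keyed_b = ((parse_timestamp(line, year), line) for line in site_b_lines)
    return [line for _, line in heapq.merge(keyed_a, keyed_b)]

# One line from each side, taken from the first message in the thread.
site_a = [
    'Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message',
]
site_b = [
    'Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: max number of retransmissions (2) reached STATE_QUICK_I1',
]

for line in merge_logs(site_a, site_b):
    print(line)
```

With control debugging enabled on both ends, a merged view like this should show which side tears down or renegotiates a tunnel first and what the other side logged in the seconds before.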
