[Openswan Users] Unstable behavior with 2 tunnels connecting the same sites
Greg Scott
GregScott at Infrasupport.com
Wed Jul 14 11:41:49 EDT 2010
Something unhealthy is going on with configs that have multiple tunnels
connecting the same sites.
I know I always end up posting the weird problems, and here's another
one. I have a customer with 2 sites, called HQ and colo. HQ is on the
right, colo on the left. The HQ site has 2 LANs, 175.10/16 and
175.7/16. The colo site also has 2 LANs, 175.8/16 and 175.9/16. At the
colo site I supernetted the tunnels to 175.8/15, both as a
troubleshooting step and as a way to reduce the number of tunnels from
4 to 2. I know this setup is a little off the beaten path, but this
customer needs multiple tunnels connecting the same sites to make their
storage replication work properly.
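To double-check the supernet arithmetic, here is a small sketch using Python's standard ipaddress module (chosen only for illustration; Openswan does not use it). It confirms that 175.8.0.0/16 and 175.9.0.0/16 collapse into 175.8.0.0/15, and that the supernet does not overlap either HQ LAN. The names COLO_LANS, HQ_LANS, colo_supernet and overlaps_hq are hypothetical.

```python
import ipaddress

# LAN prefixes as described above (175.8/16 is shorthand for 175.8.0.0/16)
COLO_LANS = [ipaddress.ip_network("175.8.0.0/16"), ipaddress.ip_network("175.9.0.0/16")]
HQ_LANS = [ipaddress.ip_network("175.10.0.0/16"), ipaddress.ip_network("175.7.0.0/16")]


def colo_supernet():
    """Merge the colo LANs into the smallest list of covering networks."""
    return list(ipaddress.collapse_addresses(COLO_LANS))


def overlaps_hq(net):
    """Return the HQ LANs that overlap net (a tunnel's two subnets should not overlap)."""
    return [hq for hq in HQ_LANS if net.overlaps(hq)]


if __name__ == "__main__":
    supernet = colo_supernet()
    print(supernet)                  # [IPv4Network('175.8.0.0/15')]
    print(overlaps_hq(supernet[0]))  # []
```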
Every once in a while, one or more of these tunnels decides to go out to
lunch, usually when there's a telecom interruption. IPsec is supposed to
hook both sites back up after the telecom link comes back online, but
that doesn't always work here. The only solution is to manually restart
ipsec on one side or the other.
So this morning I had an outage and, sure enough, half the tunnels
weren't answering. So I tried service ipsec restart at the HQ site and
. . . it hung. Yup, it hung. I would love to prove that it hung, but
the PuTTY output has already scrolled off the top of the window. But I
was there, I saw it with my own eyes, it hung. Trust me, it hung.
FWIW, I've seen this hang before with multiple tunnels. It's been going
on for years in one form or another, and I've posted about it in this
forum before.
After pressing Ctrl-C, I tried sh -v /etc/rc.d/init.d/ipsec restart.
That worked properly, and now everyone can see everyone else.
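Since a hung restart otherwise needs someone at the console to press Ctrl-C, one option is to run the restart under a timeout. The sketch below is hypothetical (run_with_timeout and the timeout value are illustrative, not part of Openswan). It relies only on the timeout argument of Python's subprocess.run, which kills the direct child process when the limit expires.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it exited with status 0 within timeout_s seconds.

    Returns False if the command fails, cannot be started, or is still running
    when the timeout expires (subprocess.run kills the child in that case).
    """
    try:
        # DEVNULL instead of pipes: a daemon started by an init script could otherwise
        # keep a pipe open and make a finished script look hung.
        result = subprocess.run(cmd, timeout=timeout_s,
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return False
    except OSError:
        # e.g. the command does not exist on this host
        return False
    return result.returncode == 0
```

On the HQ firewall this might be called as run_with_timeout(["service", "ipsec", "restart"], 120). The 120-second limit is an arbitrary example, and because only the direct child is killed, any processes the init script already started may still need cleaning up.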
When the problem is happening, I see lots of messages coming into
/var/log/secure. Here is a sample:
[root at stylmark-fw1 ipsec.d]# more greg2.txt
Jul 14 08:00:00 localhost pluto[23465]: initiate on demand from 175.10.0.1:8 to 175.9.1.35:0 proto=1 state: fos_start because: acquire
Jul 14 08:00:00 localhost pluto[23465]: "colo-hqmain" #212624: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW {using isakmp#212615 msgid:d98e9c48 proposal=defaults pfsgroup=OAKLEY_GROUP_MODP2048}
Jul 14 08:00:00 localhost pluto[23465]: "colo-hqmain" #212624: transition from state STATE_QUICK_I1 to state STATE_QUICK_I2
Jul 14 08:00:00 localhost pluto[23465]: "colo-hqmain" #212624: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0x86d6e4be <0x68544fa4 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jul 14 08:00:03 localhost pluto[23465]: initiate on demand from 175.10.0.1:8 to 175.8.1.101:0 proto=1 state: fos_start because: acquire
Jul 14 08:00:03 localhost pluto[23465]: "colo-hqmain" #212625: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW {using isakmp#212615 msgid:d31345ba proposal=defaults pfsgroup=OAKLEY_GROUP_MODP2048}
Jul 14 08:00:03 localhost pluto[23465]: "colo-hqmain" #212625: transition from state STATE_QUICK_I1 to state STATE_QUICK_I2
Jul 14 08:00:03 localhost pluto[23465]: "colo-hqmain" #212625: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0xb35a6fc7 <0xac2386d4 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jul 14 08:00:09 localhost pluto[23465]: initiate on demand from 175.10.0.35:8 to 175.9.1.35:0 proto=1 state: fos_start because: acquire
Jul 14 08:00:09 localhost pluto[23465]: "colo-hqmain" #212626: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW {using isakmp#212615 msgid:b005937f proposal=defaults pfsgroup=OAKLEY_GROUP_MODP2048}
Jul 14 08:00:09 localhost pluto[23465]: "colo-hqmain" #212626: transition from state STATE_QUICK_I1 to state STATE_QUICK_I2
Jul 14 08:00:09 localhost pluto[23465]: "colo-hqmain" #212626: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0x364780e1 <0x58c0d1e0 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jul 14 08:00:28 localhost pluto[23465]: "colo-hqmain" #212615: received Delete SA(0x7c705344) payload: deleting IPSEC State #209204
Jul 14 08:00:28 localhost pluto[23465]: "colo-hqmain" #212615: received and ignored informational message
Jul 14 08:00:31 localhost pluto[23465]: "colo-hqmain" #212615: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0x8b2781f0) not found (maybe expired)
Jul 14 08:00:31 localhost pluto[23465]: "colo-hqmain" #212615: received and ignored informational message
Jul 14 08:00:34 localhost pluto[23465]: "colo-hqmain" #212615: received Delete SA(0xf8a2d8fb) payload: deleting IPSEC State #209206
Jul 14 08:00:34 localhost pluto[23465]: "colo-hqmain" #212615: received and ignored informational message
Jul 14 08:00:37 localhost pluto[23465]: "colo-hqmain" #212615: received Delete SA(0x14029340) payload: deleting IPSEC State #209207
Jul 14 08:00:37 localhost pluto[23465]: "colo-hqmain" #212615: received and ignored informational message
Jul 14 08:00:38 localhost pluto[23465]: initiate on demand from 175.10.0.1:8 to 175.9.1.1:0 proto=1 state: fos_start because: acquire
Jul 14 08:00:38 localhost pluto[23465]: "colo-hqmain" #212627: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW {using isakmp#212615 msgid:3e7351ff proposal=defaults pfsgroup=OAKLEY_GROUP_MODP2048}
Jul 14 08:00:39 localhost pluto[23465]: "colo-hqmain" #212627: transition from state STATE_QUICK_I1 to state STATE_QUICK_I2
Jul 14 08:00:39 localhost pluto[23465]: "colo-hqmain" #212627: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0x19427699 <0x043fa1d4 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jul 14 08:00:41 localhost pluto[23465]: initiate on demand from 175.10.0.1:8 to 175.8.1.254:0 proto=1 state: fos_start because: acquire
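To get a rough picture of how often this pattern repeats, the messages above can be tallied. This is a sketch that assumes the real log has one message per line. The regular expressions are written against these particular pluto messages only (other Openswan versions may word them differently), and PATTERNS and summarize are hypothetical names.

```python
import re
from collections import Counter

# Patterns for the pluto messages in the failure log above, one message per line.
PATTERNS = {
    "acquire": re.compile(r"initiate on demand from ([\d.]+):\d+ to ([\d.]+):\d+"),
    "established": re.compile(r"IPsec SA established tunnel mode"),
    "peer_delete": re.compile(r"received Delete SA\((0x[0-9a-f]+)\) payload: deleting IPSEC State #(\d+)"),
    "delete_not_found": re.compile(r"ignoring Delete SA payload: .* not found"),
}


def summarize(lines):
    """Count each event type and tally the destinations of on-demand (acquire) initiations."""
    counts = Counter()
    destinations = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "acquire":
                    destinations[match.group(2)] += 1
    return counts, destinations
```

For example: with open("/var/log/secure") as f: counts, destinations = summarize(f).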
And here is a sample from /var/log/secure when things are working
properly. I dummied up the public IP addresses:
[root at stylmark-fw1 ipsec.d]# tail /var/log/secure -f
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #1: the peer proposed: 175.10.0.0/16:0/0 -> 175.8.0.0/15:0/0
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: responding to Quick Mode proposal {msgid:6a8b3c68}
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: us: 175.10.0.0/16===1.2.42.85<1.2.42.85>[@hqmain,+S=C]---1.2.42.86
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: them: 3.4.64.174---3.4.64.169<3.4.64.169>[@colo,+S=C]===175.8.0.0/15
Jul 14 10:33:34 localhost pluto[3993]: | NAT-OA: 0 tunnel: 0
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: keeping refhim=4294901761 during rekey
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: STATE_QUICK_R1: sent QR1, inbound IPsec SA installed, expecting QI2
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
Jul 14 10:33:34 localhost pluto[3993]: "colo-hqmain" #31: STATE_QUICK_R2: IPsec SA established tunnel mode {ESP=>0x8fd8f76b <0xaf448d32 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #1: the peer proposed: 175.10.0.0/16:0/0 -> 175.8.0.0/15:0/0
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: responding to Quick Mode proposal {msgid:bcf600d5}
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: us: 175.10.0.0/16===1.2.42.85<1.2.42.85>[@hqmain,+S=C]---1.2.42.86
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: them: 3.4.64.174---3.4.64.169<3.4.64.169>[@colo,+S=C]===175.8.0.0/15
Jul 14 10:35:34 localhost pluto[3993]: | NAT-OA: 0 tunnel: 0
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: keeping refhim=4294901761 during rekey
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: STATE_QUICK_R1: sent QR1, inbound IPsec SA installed, expecting QI2
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
Jul 14 10:35:34 localhost pluto[3993]: "colo-hqmain" #32: STATE_QUICK_R2: IPsec SA established tunnel mode {ESP=>0x0c7f39bf <0x2e95afcb xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #1: the peer proposed: 175.10.0.0/16:0/0 -> 175.8.0.0/15:0/0
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: responding to Quick Mode proposal {msgid:521ce545}
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: us: 175.10.0.0/16===1.2.42.85<1.2.42.85>[@hqmain,+S=C]---1.2.42.86
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: them: 3.4.64.174---3.4.64.169<3.4.64.169>[@colo,+S=C]===175.8.0.0/15
Jul 14 10:35:47 localhost pluto[3993]: | NAT-OA: 0 tunnel: 0
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: keeping refhim=4294901761 during rekey
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: STATE_QUICK_R1: sent QR1, inbound IPsec SA installed, expecting QI2
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
Jul 14 10:35:47 localhost pluto[3993]: "colo-hqmain" #33: STATE_QUICK_R2: IPsec SA established tunnel mode {ESP=>0xc0136c4b <0x32bf7674 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
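Both conns use the same gateways, so when reading the responder log it can help to tag each Quick Mode proposal with the conn whose subnets it matches. A sketch, assuming the subnets from sites.conf and the "the peer proposed" wording seen above; CONNS and match_conn are hypothetical names.

```python
import ipaddress
import re

# leftsubnet/rightsubnet for each conn, copied from sites.conf (colo is left, HQ is right)
CONNS = {
    "colo-hqmain": ("175.8.0.0/15", "175.10.0.0/16"),
    "colo-hqmirror": ("175.8.0.0/15", "175.7.0.0/16"),
}

# Matches e.g. "the peer proposed: 175.10.0.0/16:0/0 -> 175.8.0.0/15:0/0"
PROPOSED = re.compile(r"the peer proposed: ([\d./]+):\d+/\d+ -> ([\d./]+):\d+/\d+")


def match_conn(line):
    """Return the conn whose subnet pair equals the proposal.

    Returns None if the line is not a proposal, "no matching conn" if no conn fits.
    """
    match = PROPOSED.search(line)
    if match is None:
        return None
    # Compare as an unordered pair so the check does not depend on which side is printed first
    proposed = {ipaddress.ip_network(match.group(1)), ipaddress.ip_network(match.group(2))}
    for name, (left, right) in CONNS.items():
        if proposed == {ipaddress.ip_network(left), ipaddress.ip_network(right)}:
            return name
    return "no matching conn"
```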
This is the version of Openswan running at the HQ site:
[root at stylmark-fw1 firewall-scripts]# ipsec version
Linux Openswan U2.6.25/K2.6.32.12-115.fc12.i686.PAE (netkey)
See `ipsec --copyright' for copyright information.
[root at stylmark-fw1 firewall-scripts]#
And this is the version running at the colo site:
[root at colo-fw firewall-scripts]# ipsec version
Linux Openswan U2.6.25/K2.6.17.2fw21 (netkey)
See `ipsec --copyright' for copyright information.
[root at colo-fw firewall-scripts]#
As you can see, the colo site has an older kernel (2.6.17 vs. 2.6.32 at
HQ) but the same Openswan version, 2.6.25.
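When comparing more than two firewalls, the ipsec version banner can be split into its parts. In these banners the U prefix marks the Openswan userland version and the K prefix the kernel. The regular expression below is an assumption based only on the two banners shown, and parse_banner is a hypothetical helper.

```python
import re

# Matches banners like "Linux Openswan U2.6.25/K2.6.17.2fw21 (netkey)"
BANNER = re.compile(r"Openswan U([\d.]+)/K(\S+) \((\w+)\)")


def parse_banner(line):
    """Split an `ipsec version` banner into (openswan, kernel, stack), or None if it doesn't match."""
    match = BANNER.search(line)
    return match.groups() if match else None


if __name__ == "__main__":
    hq = parse_banner("Linux Openswan U2.6.25/K2.6.32.12-115.fc12.i686.PAE (netkey)")
    colo = parse_banner("Linux Openswan U2.6.25/K2.6.17.2fw21 (netkey)")
    print(hq[0] == colo[0])  # True: same Openswan version
    print(hq[1], colo[1])    # the kernels differ
```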
Here are the conn definitions. First, colo-ipsec.conf at the colo site.
Note the commented-out additional tunnels at the bottom. I supernetted
the conn definitions at the colo site as a troubleshooting step:
conn colo-hqmain
    type=tunnel
    #
    # Left security gateway, subnet behind it, next hop toward right.
    #
    also=colo
    #
    # Right security gateway, subnet behind it, next hop toward left.
    #
    also=hqmain
    auto=start

conn colo-hqmirror
    type=tunnel
    #
    # Left security gateway, subnet behind it, next hop toward right.
    #
    also=colo
    #
    # Right security gateway, subnet behind it, next hop toward left.
    #
    also=hqmirror
    auto=start

##conn colomirror-hqmirror
##    type=tunnel
##    #
##    # Left security gateway, subnet behind it, next hop toward right.
##    #
##    also=colomirror
##    #
##    # Right security gateway, subnet behind it, next hop toward left.
##    #
##    also=hqmirror
##    auto=start

##conn colomirror-hqmain
##    type=tunnel
##    #
##    # Left security gateway, subnet behind it, next hop toward right.
##    #
##    also=colomirror
##    #
##    # Right security gateway, subnet behind it, next hop toward left.
##    #
##    also=hqmain
##    auto=start

include /etc/ipsec.d/sites.conf
Next are the conn definitions from hq-ipsec.conf:
conn colo-hqmain
    type=tunnel
    #
    # Left security gateway, subnet behind it, next hop toward right.
    #
    also=colo
    #
    # Right security gateway, subnet behind it, next hop toward left.
    #
    also=hqmain
    auto=start

conn colo-hqmirror
    type=tunnel
    #
    # Left security gateway, subnet behind it, next hop toward right.
    #
    also=colo
    #
    # Right security gateway, subnet behind it, next hop toward left.
    #
    also=hqmirror
    auto=start

##conn colomirror-hqmirror
##    type=tunnel
##    #
##    # Left security gateway, subnet behind it, next hop toward right.
##    #
##    also=colomirror
##    #
##    # Right security gateway, subnet behind it, next hop toward left.
##    #
##    also=hqmirror
##    auto=start

##conn colomirror-hqmain
##    type=tunnel
##    #
##    # Left security gateway, subnet behind it, next hop toward right.
##    #
##    also=colomirror
##    #
##    # Right security gateway, subnet behind it, next hop toward left.
##    #
##    also=hqmain
##    auto=start

include /etc/ipsec.d/sites.conf
And finally, sites.conf, which contains the IP addresses of all sites.
Each site has an identical copy of sites.conf. Public IP addresses are
dummied up and RSA keys truncated.
conn hqmain
    right=1.2.42.85
    rightsubnet=175.10.0.0/16
    rightnexthop=1.2.42.86
    rightsourceip=175.10.0.1
    rightid=@hqmain
    ### rightupdown=/etc/ipsec.d/hq-updown.sh
    # rsakey AQOkh1tMU
    rightrsasigkey=0sAQOkh...

conn hqmirror
    right=1.2.42.85
    rightsubnet=175.7.0.0/16
    rightnexthop=1.2.42.86
    rightsourceip=175.7.0.1
    rightid=@hqmirror
    # rsakey AQOkh1tMU
    rightrsasigkey=0sAQOkh1t...

conn colo
    left=3.4.64.169
    leftsubnet=175.8.0.0/15
    leftnexthop=3.4.64.174
    leftsourceip=175.9.1.1
    leftid=@colo
    # RSA 2192 bits colo-fw Wed Nov 29 19:08:25 2006
    leftrsasigkey=0sAQOSwRcj...

##conn colomirror
##    left=3.4.64.169
##    leftsubnet=175.8.0.0/16
##    leftnexthop=3.4.64.174
##    ##leftid=@colomirror
##    # RSA 2192 bits colo-fw Wed Nov 29 19:08:25 2006
##    leftrsasigkey=0sAQOSwR...
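Because each conn is assembled from sites.conf with also=, it can help to see the effective parameters side by side. The sketch below is a simplified model of also=: it just merges dictionaries and assumes no key is set twice, which holds for this config. SECTIONS, CONNS, expand and differing_keys are hypothetical names, and the RSA key lines are left out. Under that model, colo-hqmain and colo-hqmirror differ only in rightsubnet, rightsourceip and rightid.

```python
# Values from sites.conf (RSA keys and commented-out lines omitted)
SECTIONS = {
    "hqmain": {"right": "1.2.42.85", "rightsubnet": "175.10.0.0/16", "rightnexthop": "1.2.42.86",
               "rightsourceip": "175.10.0.1", "rightid": "@hqmain"},
    "hqmirror": {"right": "1.2.42.85", "rightsubnet": "175.7.0.0/16", "rightnexthop": "1.2.42.86",
                 "rightsourceip": "175.7.0.1", "rightid": "@hqmirror"},
    "colo": {"left": "3.4.64.169", "leftsubnet": "175.8.0.0/15", "leftnexthop": "3.4.64.174",
             "leftsourceip": "175.9.1.1", "leftid": "@colo"},
}

# The two active conns, identical in colo-ipsec.conf and hq-ipsec.conf
CONNS = {
    "colo-hqmain": {"type": "tunnel", "also": ["colo", "hqmain"], "auto": "start"},
    "colo-hqmirror": {"type": "tunnel", "also": ["colo", "hqmirror"], "auto": "start"},
}


def expand(conn_name):
    """Simplified also= expansion: merge the named sections, then the conn's own keys."""
    conn = CONNS[conn_name]
    effective = {}
    for section in conn["also"]:
        effective.update(SECTIONS[section])
    effective.update({key: value for key, value in conn.items() if key != "also"})
    return effective


def differing_keys(a, b):
    """Sorted keys whose values differ between two expanded conns."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


if __name__ == "__main__":
    print(differing_keys(expand("colo-hqmain"), expand("colo-hqmirror")))
    # ['rightid', 'rightsourceip', 'rightsubnet']
```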