[Openswan Users] Two tunnels between the same hosts; one works, the other works sometimes

Thu Jun 10 01:44:21 EDT 2010

OK - I built 2.6.25 from source on the colo site and did service ipsec
restart on both sites.  Now everyone can ping everyone.  Go figure.

From: users-bounces at openswan.org [mailto:users-bounces at openswan.org] On
Behalf Of Greg Scott
Sent: Thursday, June 10, 2010 12:27 AM
To: users at openswan.org
Subject: Re: [Openswan Users] Two tunnels between the same hosts; one
works, the other works sometimes

Oh yes - and watching tail -f /var/log/secure, I see lots of SA Established
messages, generally followed by a bunch of other messages.  Here is a
sample:

Jun 10 00:23:39 localhost pluto[32341]: "colo-hqmain" #94:
STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode
{ESP=>0xbc7fa7b2 <0x8960d6a9 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none
DPD=none}

Jun 10 00:23:40 localhost pluto[32341]: initiate on demand from
175.7.0.254:8 to 175.8.1.254:0 proto=1 state: fos_start because: acquire

Jun 10 00:23:40 localhost pluto[32341]: "colo-hqmirror" #95: initiating
Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW {using isakmp#19
msgid:43388045 proposal=defaults pfsgroup=OAKLEY_GROUP_MODP2048}

Jun 10 00:23:40 localhost pluto[32341]: "colo-hqmirror" #95: transition
from state STATE_QUICK_I1 to state STATE_QUICK_I2

Jun 10 00:23:40 localhost pluto[32341]: "colo-hqmirror" #95:
STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode
{ESP=>0xe1de3dcb <0xe9e58a27 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none
DPD=none}

Jun 10 00:24:18 localhost pluto[32341]: initiate on demand from
175.10.0.1:8 to 175.9.1.1:0 proto=1 state: fos_start because: acquire

Jun 10 00:24:18 localhost pluto[32341]: "colo-hqmain" #96: initiating
Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW {using isakmp#10
msgid:8b21c369 proposal=defaults pfsgroup=OAKLEY_GROUP_MODP2048}

Jun 10 00:24:18 localhost pluto[32341]: "colo-hqmain" #96: transition
from state STATE_QUICK_I1 to state STATE_QUICK_I2

Jun 10 00:24:18 localhost pluto[32341]: "colo-hqmain" #96:
STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode
{ESP=>0x624f259e <0x0a8b7ec8 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none
DPD=none}
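
To see how often each tunnel renegotiates, the "IPsec SA established" lines can be tallied per connection name. This is only a rough sketch: the sample text is the three such lines from the excerpt above with the mail wrapping undone, and the regular expression and the count_established helper are my own illustration, not anything shipped with Openswan.

```python
import re
from collections import Counter

# The three "SA established" lines from the excerpt above, unwrapped.
LOG = """\
Jun 10 00:23:39 localhost pluto[32341]: "colo-hqmain" #94: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0xbc7fa7b2 <0x8960d6a9 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jun 10 00:23:40 localhost pluto[32341]: "colo-hqmirror" #95: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0xe1de3dcb <0xe9e58a27 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
Jun 10 00:24:18 localhost pluto[32341]: "colo-hqmain" #96: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0x624f259e <0x0a8b7ec8 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=none DPD=none}
"""

# Hypothetical pattern: connection name in quotes, then the SA number.
SA_RE = re.compile(
    r'pluto\[\d+\]: "(?P<conn>[^"]+)" #(?P<sa>\d+): .*IPsec SA established'
)

def count_established(text):
    """Count 'IPsec SA established' messages per connection name."""
    counts = Counter()
    for line in text.splitlines():
        match = SA_RE.search(line)
        if match:
            counts[match.group("conn")] += 1
    return counts

counts = count_established(LOG)
for conn, n in sorted(counts.items()):
    print(conn, n)
```

In this sample, colo-hqmain sets up a second IPsec SA about 40 seconds after the first one, which is the kind of repeated setup I am seeing.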

From: users-bounces at openswan.org [mailto:users-bounces at openswan.org] On
Behalf Of Greg Scott
Sent: Thursday, June 10, 2010 12:17 AM
To: users at openswan.org
Subject: [Openswan Users] Two tunnels between the same hosts; one works,
the other works sometimes

Here we go again....

I have two sites named HQ and colo.  HQ is on the right, colo is on the
left.  The HQ site has two LANs: 175.10.0.0/16 and 175.7.0.0/16.  The
colo site also has two LANs, 175.8.0.0/16 and 175.9.0.0/16.  To simplify
the tunnel setup, I supernetted the colo site, so now it's 175.8.0.0/15.
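
As a quick sanity check on the supernet arithmetic, here is a small sketch using Python's standard ipaddress module. It only confirms that the two colo /16s merge into 175.8.0.0/15 and that the two HQ /16s do not merge into a single prefix; the variable names are just for illustration.

```python
import ipaddress

colo_lans = [
    ipaddress.ip_network("175.8.0.0/16"),
    ipaddress.ip_network("175.9.0.0/16"),
]
hq_lans = [
    ipaddress.ip_network("175.10.0.0/16"),
    ipaddress.ip_network("175.7.0.0/16"),
]
supernet = ipaddress.ip_network("175.8.0.0/15")

# Both colo LANs must fall inside the supernet.
covered = all(lan.subnet_of(supernet) for lan in colo_lans)

# collapse_addresses merges adjacent networks where possible.
colo_collapsed = list(ipaddress.collapse_addresses(colo_lans))
hq_collapsed = list(ipaddress.collapse_addresses(hq_lans))

print("colo covered by /15:", covered)
print("colo collapsed:", colo_collapsed)
print("HQ collapsed:", hq_collapsed)
```

The colo side collapses to one network, while the HQ side stays as two separate /16s.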

So by my count, I need 2 tunnels:

colo-hqmain

colo-hqmirror

The colo-hqmain tunnel generally comes up and works reliably; colo-hqmirror
has problems.  Sometimes both tunnels come up; other times only one or the
other works.  Sometimes, after 10-15 minutes, both come up together.

I tested all this in a simulated environment and naturally it worked
well here.  Of course, now it's flaky in production.  The HQ site is
using Openswan 2.6.25 with Fedora 12.  The colo site is older and uses
Openswan 2.4.4 with Fedora Core 5.  

Why two tunnels to the same sites?  Well, some StorageTek devices that
mirror each other need NICs in different subnets.

Here are some more bizarre symptoms.  All colo subnets can ping all HQ
subnets.  However, only some subnets from HQ can ping some colo subnets,
and this seems to change with the passage of time.  For example, a few
minutes ago, the 175.10 subnet could ping everything in the colo site.
But when the 175.7 subnet tried to ping anything in the colo site, the
pings returned "Operation not permitted".   Now 175.10 can ping 175.8
and 175.7 can ping 175.9.  But 175.7 cannot ping 175.8 and 175.10
cannot ping 175.9.  That's from the HQ site.  When pings come from the
colo site, all pings work.  Try keeping that straight.  

One more complicating factor.  The HQ site has 2 nodes that act together
in an active/standby pair.  Both nodes have identical configurations
right down to the MAC addresses on all the NICs.  I ran through several
failovers in my testing here and all worked fine.  I used the real HQ
nodes and a simulated Internet and simulated colo site.  But now in
production, this flaky behavior shows itself.

I guess maybe I'll try to build an openswan-2.6.25 from the .tar file on
the colo site and maybe it will behave a little better.  Any other
thoughts?

Thanks

-          Greg Scott
