[Openswan Users] Ipsec auto --up {tunnelname} hangs

Greg Scott GregScott at InfraSupportEtc.com
Thu Jun 19 00:22:46 EDT 2008


Ya know, this thought just hit me.  My tunnel scenario is a little bit
complex.  This particular branch site has 2 tunnels back to the right
side.  One is up all the time; the other is a backup for the MPLS cloud.
There are two tunnels because there are two different sets of users at
this remote site.

I am pasting in the log from my script below.  Hopefully, the emailer
won't butcher it.  Anyway, this site has two tunnels, named
JanesvilleCheetah-Everywhere and JanesvillePNT-Everywhere.  The
JanesvillePNT-Everywhere is the failover tunnel.  This is the log from
the copy of the route monitor script running on the right side - the
main site.  

Check out this sequence of events - 

Startup on June 12.

Monday June 16 at 10:50 AM, the MPLS router went offline and a failover
occurred.  This hung and the failover never finished.

At 11:56 AM on Wednesday, June 18 - more than two days later - I killed
the hung ipsec whack process and my script continued.
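One way to keep a hang like this from stalling the monitor indefinitely is to run the initiate command under a time limit, so the kill happens automatically instead of by hand. Here is a minimal Python sketch of that idea, assuming Python 3.7+ is available on the firewall. `run_with_timeout` is a hypothetical helper (not part of Openswan or of the original route-monitor.sh), and any particular limit you pick is a placeholder. It only bounds how long the script waits; it does not fix whatever makes whack hang.

```python
import subprocess
from typing import List, Optional, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[str, Optional[int]]:
    """Run cmd, giving up after timeout_s seconds.

    Returns ("ok", 0), ("failed", returncode), or ("timeout", None).
    """
    try:
        # stdout/stderr are inherited, so whack's messages still land in
        # whatever log the calling script writes to.
        # On timeout, subprocess.run kills the child, waits for it to exit,
        # and then re-raises TimeoutExpired.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", None
    if result.returncode == 0:
        return "ok", 0
    return "failed", result.returncode
```

Called as `run_with_timeout(["ipsec", "whack", "--name", "JanesvillePNT-Everywhere", "--initiate"], 120)`, a `"timeout"` result would let the script log the problem and carry on rather than block until someone steps in.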

This is where it gets interesting - take a look at the output lines from
ipsec whack in my log below.  I noticed these always show up when I
bring up a tunnel.  Notice the left side (remote branch site) is giving
the right side the wrong tunnel peer ID.  Those ipsec whack messages
don't have timestamps, so I don't know if they were buffered up before
I killed the ipsec whack process, or if they came after I killed it,
when the JanesvillePNT tunnel was long gone.  In other words, that peer
ID mismatch could be a consequence of my killing the hung ipsec whack
process, or it could be a symptom of the problem.

What if the left side responds to queries from the right side, but only
because it already has a tunnel connecting to the right side - just the
wrong tunnel?  The right side, thinking the left side's tunnel is
available, then waits forever for the next answer.  But the next answer
never comes, because the PNT tunnel (the failover tunnel) went away.

This is just a working hypothesis - does it make sense?
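Since the whack messages carry no timestamps, it may help to at least flag the mismatch automatically whenever it appears. The Python sketch below pulls the connection name, the expected ID, and the declared ID out of the "we require peer to have ID" line in the log, then checks whether the declared ID belongs to a different configured tunnel. The `EXPECTED_IDS` table is hypothetical: the PNT entry comes from the log, but the Cheetah entry is only inferred from the connection naming. A hit is consistent with the wrong-tunnel hypothesis; it does not prove it.

```python
import re
from typing import Dict, Optional, Tuple

# Matches Openswan messages of the form seen in the log, e.g.
# 003 "Conn" #530: we require peer to have ID '@a', but peer declares '@b'
ID_MISMATCH = re.compile(
    r'"(?P<conn>[^"]+)" #\d+: we require peer to have ID '
    r"'(?P<expected>[^']*)', but peer declares '(?P<declared>[^']*)'"
)

# Hypothetical table of connection name -> peer ID that connection expects.
EXPECTED_IDS: Dict[str, str] = {
    "JanesvilleCheetah-Everywhere": "@janesvillecheetah.local",  # assumed
    "JanesvillePNT-Everywhere": "@janesvillepnt.local",  # from the log
}


def diagnose(
    line: str, expected_ids: Dict[str, str] = EXPECTED_IDS
) -> Optional[Tuple[str, str, str, Optional[str]]]:
    """Return (conn, expected_id, declared_id, other_conn) or None.

    other_conn is the name of another configured connection whose expected
    ID equals the ID the peer declared, or None if there is no such entry.
    """
    m = ID_MISMATCH.search(line)
    if m is None:
        return None
    other = next(
        (name for name, peer_id in expected_ids.items()
         if peer_id == m["declared"] and name != m["conn"]),
        None,
    )
    return m["conn"], m["expected"], m["declared"], other
```

Run against the 003 line from the log, this would report that the peer answered the PNT connection with the ID the Cheetah connection expects.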

The really bizarre thing is that the left side behaves as expected while
the right side hangs.  It's the exact same script on both sides.  Both
sides have the same 2 tunnels, and both notice when the MPLS router on
the other end goes offline.

[root@lme-fw log]# cd /var/log
[root@lme-fw log]# cat routemon.log.JanesvillePNT-Everywhere
Thu Jun 12 16:20:46 CDT 2008 starting up route-monitor.sh on lme-fw.
Pinging every 20 seconds.
Setting up the primary path via 192.168.3.97 to 12.115.128.14
Killing any other /firewall-scripts/route-monitor.sh processes for JanesvillePNT-Everywhere
        Ignore "No such process" errors.
/firewall-scripts/route-monitor.sh: line 133: kill: (3422) - No such process
The primary routing destination at 12.115.128.14 answers.  Therefore . . .
        Making sure the tunnel JanesvillePNT-Everywhere is down.  Ignore errors.
021 no connection named "JanesvillePNT-Everywhere"
021 no connection named "JanesvillePNT-Everywhere"
Initialization complete - starting loop.
Mon Jun 16 10:50:28 CDT 2008 Primary path 12.115.128.14 is offline.
Calling assume_primary
Mon Jun 16 10:50:28 CDT 2008 lme-fw 12.115.128.14 is offline.  Bringing up tunnel JanesvillePNT-Everywhere.
You must specify direct recipients with -s, -c, or -b.
sh: line 4: 23750 Killed                  ipsec whack --name JanesvillePNT-Everywhere --initiate
104 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I1: initiate
003 "JanesvillePNT-Everywhere" #530: ignoring unknown Vendor ID payload [4f455f5d7b764b67436f4f49]
003 "JanesvillePNT-Everywhere" #530: received Vendor ID payload [Dead Peer Detection]
003 "JanesvillePNT-Everywhere" #530: received Vendor ID payload [RFC 3947] method set to=110
106 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I2: sent MI2, expecting MR2
003 "JanesvillePNT-Everywhere" #530: NAT-Traversal: Result using 3: no NAT detected
108 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I3: sent MI3, expecting MR3
003 "JanesvillePNT-Everywhere" #530: we require peer to have ID '@janesvillepnt.local', but peer declares '@janesvillecheetah.local'
218 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I3: INVALID_ID_INFORMATION
Wed Jun 18 11:56:44 CDT 2008 lme-fw primary path 12.115.128.14 is now answering; taking down tunnel JanesvillePNT-Everywhere.
You must specify direct recipients with -s, -c, or -b.
Taking down the tunnel JanesvillePNT-Everywhere
[root@lme-fw log]#
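Going only by the messages in this log, the monitor loop seems to boil down to: ping the primary path every 20 seconds, bring the backup tunnel up when the primary stops answering, and take it down once the primary answers again. Below is a rough Python reconstruction under that assumption. It is not the actual route-monitor.sh, and `monitor`, `ping`, `bring_up`, and `take_down` are hypothetical names. It illustrates why a single blocked `bring_up` call matters: the checks run one after another, so nothing else happens until that call returns, which is consistent with the gap in the log between June 16 and June 18.

```python
import time
from typing import Callable, List, Optional


def monitor(
    ping: Callable[[], bool],
    bring_up: Callable[[], None],
    take_down: Callable[[], None],
    interval_s: float = 20.0,  # "Pinging every 20 seconds."
    max_checks: Optional[int] = None,  # None means loop forever
) -> List[str]:
    """Hypothetical failover loop inferred from the route-monitor log.

    Returns the action taken on each check, for inspection or testing.
    """
    backup_up = False
    actions: List[str] = []
    checks = 0
    while max_checks is None or checks < max_checks:
        primary_ok = ping()
        if not primary_ok and not backup_up:
            # If this call never returns, no further pings happen at all.
            bring_up()
            backup_up = True
            actions.append("bring_up")
        elif primary_ok and backup_up:
            take_down()
            backup_up = False
            actions.append("take_down")
        else:
            actions.append("none")
        checks += 1
        time.sleep(interval_s)
    return actions
```

Feeding it the ping sequence answering, offline, offline, answering yields one `bring_up` followed later by one `take_down`, mirroring the two timestamped events above.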


- Greg