<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7638.1">
<TITLE>RE: [Openswan Users] Ipsec auto --up {tunnelname} hangs</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Ya know, this thought just hit me. My tunnel scenario is a little bit complex. This particular</FONT> <FONT SIZE=2 FACE="Arial">branch site</FONT> <FONT SIZE=2 FACE="Arial">has 2 tunnels</FONT><FONT SIZE=2 FACE="Arial"> back to the right side</FONT><FONT SIZE=2 FACE="Arial">. One is up all the time, the other is a backup for the MPLS cloud. There are two tunnels because there are two different sets of users at this remote site.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">I am pasting in the log from my script below. Hopefully</FONT><FONT SIZE=2 FACE="Arial">, the emailer won't butcher it. Anyway, this site has two tunnels, named JanesvilleCheetah-Everywhere and JanesvillePNT-Everywhere. The JanesvillePNT-Everywhere is the failover tunnel. This is the log from the copy of the route monitor script running on the right side - the main site. </FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Check out this sequence of events - </FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Startup on June 12.</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Monday June 16 at 10:50 AM, the MPLS router went offline and a failover occurred. This hung and the failover never finished.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">At 11:56AM - more than an hour later - I killed the hung ipsec whack process and my script continued. </FONT></SPAN>
</P>
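<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">One way to keep the script from stalling like this would be to put a deadline on the initiate call. This is only a rough sketch, not what route-monitor.sh does today. It assumes the GNU coreutils timeout command is installed on the firewall; the function name run_with_deadline and the 60 second value in the usage comment are made up for illustration.</FONT></SPAN></P>
<PRE>
```shell
# Hypothetical helper: run a command, but give up after a deadline
# instead of waiting forever. Assumes GNU coreutils "timeout" is installed.
run_with_deadline() {
    deadline_seconds="$1"
    shift
    # Send TERM when the deadline passes; if the command is still
    # alive 5 seconds after that, follow up with KILL.
    timeout -k 5 "$deadline_seconds" "$@"
}

# Possible use in the failover path (60 seconds is an arbitrary choice):
#   run_with_deadline 60 ipsec whack --name JanesvillePNT-Everywhere --initiate
# timeout exits with status 124 when the deadline expired and TERM was enough,
# so the script could log that case and carry on rather than hang.
```
</PRE>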
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">This is where it gets interesting - take a look at the output lines from ipsec whack in my log below. I noticed these always show up when I bring up a tunnel. Notice the left side (remote branch site) is giving the right side the wrong tunnel peer ID. Those ipsec whack messages don't have time stamps, so I don't know if they were buffered up before I killed the ipsec whack process, or if they came after I killed it and the JanesvillePNT tunnel was long gone. In other words, that tunnel ID mismatch could be a consequence of me killing the hung ipsec whack process, or it could be a symptom of the problem. </FONT> </SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">What if the left side responds to queries from the right side - but it responds because it already has a tunnel connecting to the right side, it's just the wrong tunnel? So the right side, thinking the left side tunnel is available, now waits forever for the next answer. But the next answer never comes because the PNT tunnel (the failover tunnel) went away.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">This is just a working hypothesis - does it make sense?</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">The really bizarre thing is, the left side behaves as expected and the right side hangs. It's the exact same script on both sides. They both have the same 2 tunnels, they both notice the MPLS router on the other end goes offline. </FONT> </SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">[root@lme-fw log]# cd /var/log</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">[root@lme-fw log]# cat routemon.log.JanesvillePNT-Everywhere</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Thu Jun 12 16:20:46 CDT 2008 starting up route-monitor.sh on lme-fw. Pinging every 20 seconds.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Setting up the primary path via 192.168.3.97 to 12.115.128.14</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Killing any other /firewall-scripts/route-monitor.sh processes for JanesvillePNT-Everywhere</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial"> Ignore "No such process" errors.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">/firewall-scripts/route-monitor.sh: line 133: kill: (3422) - No such process</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">The primary routing destination at 12.115.128.14 answers. Therefore . . .</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial"> Making sure the tunnel JanesvillePNT-Everywhere is down. Ignore errors.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">021 no connection named "JanesvillePNT-Everywhere"</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">021 no connection named "JanesvillePNT-Everywhere"</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Initialization complete - starting loop.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Mon Jun 16 10:50:28 CDT 2008 Primary path 12.115.128.14 is offline. Calling assume_primary</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Mon Jun 16 10:50:28 CDT 2008 lme-fw 12.115.128.14 is offline. Bringing up tunnel JanesvillePNT-Everywhere.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">You must specify direct recipients with -s, -c, or -b.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">sh: line 4: 23750 Killed ipsec whack --name JanesvillePNT-Everywhere --initiate</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">104 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I1: initiate</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">003 "JanesvillePNT-Everywhere" #530: ignoring unknown Vendor ID payload [4f455f5d7b764b67436f4f49]</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">003 "JanesvillePNT-Everywhere" #530: received Vendor ID payload [Dead Peer Detection]</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">003 "JanesvillePNT-Everywhere" #530: received Vendor ID payload [RFC 3947] method set to=110</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">106 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I2: sent MI2, expecting MR2</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">003 "JanesvillePNT-Everywhere" #530: NAT-Traversal: Result using 3: no NAT detected</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">108 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I3: sent MI3, expecting MR3</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">003 "JanesvillePNT-Everywhere" #530: we require peer to have ID '@janesvillepnt.local', but peer declares '@janesvillecheetah.local'</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">218 "JanesvillePNT-Everywhere" #530: STATE_MAIN_I3: INVALID_ID_INFORMATION</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Wed Jun 18 11:56:44 CDT 2008 lme-fw primary path 12.115.128.14 is now answering; taking down tunnel JanesvillePNT-Everywhere.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">You must specify direct recipients with -s, -c, or -b.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Taking down the tunnel JanesvillePNT-Everywhere</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">[root@lme-fw log]#</FONT></SPAN>
</P>
<BR>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">- Greg</FONT></SPAN>
</P>
</BODY>
</HTML>