[Openswan Users] Lots of %hold connections

Michael Smith msmith at cbnco.com
Tue Sep 20 03:25:41 CEST 2005


Hi,

I have a test setup with 19 clients and a central location all running 
kernel 2.6.11.11 with Openswan 2.4.0. They are all just 486-class Soekris 
net4501s. There's a server in a /24 subnet behind the central router, and 
all the clients have their own /32 subnets on the VPN.

On the clients I run a pretty pathological test application: if it can't 
connect, it retries once per second. While the central router is down, 
this adds a new bare shunt on each client every thirty seconds or so:

000 w.x.y.16/32:0 -6-> a.b.c.55/32:0 => %hold 0    %acquire-netlink

The "6" is the transport protocol (TCP). These pile up on the clients - 
I've seen up to 500. Each one seems to trigger a quick mode initiation 
once the client is able to complete main mode. 19 clients times 500 quick 
mode initiations really bogs down the central router :)

The narrow bare shunts are supposed to be replaced with broader subnet 
shunts from the IPsec policy, e.g. w.x.y.16/32:0 -0-> a.b.c.0/24. The 
trouble is that record_and_initiate_opportunistic() puts the transport 
protocol - 6 - in the bare shunt, but initiate_opportunistic() sets the 
transport protocol to 0 when it creates the broad %hold, so the broad 
%hold doesn't replace the narrow one. A workaround is to set 
transport_proto to 0 at the top of record_and_initiate_opportunistic():

--- programs/pluto/kernel.c	15 Sep 2005 18:17:21 -0000	1.1.1.2
+++ programs/pluto/kernel.c	20 Sep 2005 06:22:11 -0000
@@ -176,6 +176,13 @@
                                   , int transport_proto
                                   , const char *why)
 {
+    /*
+     * initiate_opportunistic() sets its transport proto to 0, so we
+     * must do the same when creating the bare shunt; otherwise the narrow
+     * shunt won't be deleted when a broad hold pops up.
+     */
+    transport_proto = 0;
+
     passert(samesubnettype(ours, his));
 
     /* Add to bare shunt list.
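
To illustrate the mismatch, here is a minimal sketch in C of a shunt table 
where a broad hold only replaces narrow shunts that carry the same 
transport protocol. This is not pluto's code: struct shunt, net_contains() 
and replace_covered_shunts() are hypothetical names made up for the 
example, and the matching rule is an assumption based on the behaviour 
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shunt record (not pluto's struct bare_shunt). */
struct shunt {
    uint32_t src, dst;       /* IPv4 network addresses, host byte order */
    int src_bits, dst_bits;  /* prefix lengths, 0..32 */
    int transport_proto;     /* 0 = any protocol, 6 = TCP */
    bool live;               /* false once the entry has been replaced */
};

/* Netmask for a prefix length; a shift by 32 is avoided for bits == 0. */
static uint32_t mask(int bits)
{
    return bits == 0 ? 0 : (uint32_t)(~(uint32_t)0 << (32 - bits));
}

/* True if the narrow network lies entirely inside the broad one. */
static bool net_contains(uint32_t broad, int broad_bits,
                         uint32_t narrow, int narrow_bits)
{
    return narrow_bits >= broad_bits
        && ((narrow ^ broad) & mask(broad_bits)) == 0;
}

/*
 * Mark as dead every shunt that the broad hold covers AND that has the
 * same transport protocol. Returns the number of shunts replaced.
 */
size_t replace_covered_shunts(struct shunt *list, size_t n,
                              const struct shunt *broad)
{
    size_t removed = 0;

    assert(list != NULL && broad != NULL);
    for (size_t i = 0; i < n; i++) {
        struct shunt *s = &list[i];

        if (s->live
            && s->transport_proto == broad->transport_proto
            && net_contains(broad->src, broad->src_bits, s->src, s->src_bits)
            && net_contains(broad->dst, broad->dst_bits, s->dst, s->dst_bits)) {
            s->live = false;
            removed++;
        }
    }
    return removed;
}
```

In this model a /32 shunt recorded with protocol 6 survives a /24 hold 
recorded with protocol 0, while the same shunt recorded with protocol 0 is 
replaced, which is the effect the patch above relies on.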

 

But then I get a lot of "Queuing pending Quick Mode with ...". These pile 
up just like the %holds and trigger a quick mode flood whenever the 
central router comes back to life. So in add_pending(), I had to check for 
existing pending entries that match the new one and delete them:

--- programs/pluto/pending.c	30 May 2005 15:19:14 -0000	1.1.1.1
+++ programs/pluto/pending.c	20 Sep 2005 06:22:11 -0000
@@ -69,6 +69,37 @@
     struct pending *next;
 };
 
+static void
+delete_pending(struct pending **pp);
+
+static void delete_old_pending(const struct connection *c,
+			       const struct pending *match)
+{
+    struct pending *p, **pp;
+
+    pp = host_pair_first_pending(c);
+    if(pp == NULL) return;
+
+    while ((p = *pp) != NULL)
+    {
+    	if (p->isakmp_sa == match->isakmp_sa
+	    && p->connection == match->connection
+	    && p->policy == match->policy)
+	{
+	    DBG(DBG_CONTROL, DBG_log("Deleting existing pending state from %ld."
+	        , (long)p->pend_time));
+
+	    p->connection = NULL;
+	    delete_pending(pp);
+	}
+    	else
+	{
+    	    pp = &p->next;
+	}
+    }
+}
+
+
 /* queue a Quick Mode negotiation pending completion of a suitable Main Mode */
 void
 add_pending(int whack_sock
@@ -91,6 +122,8 @@
     p->replacing = replacing;
     p->pend_time = time(NULL);
 
+    delete_old_pending(c, p);
+
     host_pair_enqueue_pending(c, p, &p->next);
 }


I guess I should really be able to break after that call to 
delete_pending(), if there is no other way of adding multiple identical 
pending entries to the queue.
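
As a sketch of why the early exit should be safe: if every insert first 
removes a matching entry, the queue can never hold more than one match, so 
the scan can stop at the first hit. The code below is a stand-alone model 
in C, not pluto's code; struct pend, drop_first_duplicate(), 
enqueue_unique() and queue_length() are hypothetical names, and plain ints 
stand in for the ISAKMP SA and connection pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, cut-down stand-in for pluto's struct pending. */
struct pend {
    int isakmp_sa;      /* stand-in for the ISAKMP SA pointer */
    int connection;     /* stand-in for the connection pointer */
    unsigned policy;
    time_t pend_time;
    struct pend *next;
};

/*
 * Unlink and free the first entry equal to `match`, using the same
 * pointer-to-pointer walk as the patch. Returns 1 if an entry was
 * removed, 0 otherwise. Returning at the first hit is the "break".
 */
int drop_first_duplicate(struct pend **pp, const struct pend *match)
{
    struct pend *p;

    assert(pp != NULL && match != NULL);
    while ((p = *pp) != NULL) {
        if (p->isakmp_sa == match->isakmp_sa
            && p->connection == match->connection
            && p->policy == match->policy) {
            *pp = p->next;      /* unlink without tracking a "previous" */
            free(p);
            return 1;
        }
        pp = &p->next;
    }
    return 0;
}

/* Queue a new entry, replacing any older equal one. NULL on allocation failure. */
struct pend *enqueue_unique(struct pend **head, int sa, int conn,
                            unsigned policy)
{
    struct pend *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->isakmp_sa = sa;
    p->connection = conn;
    p->policy = policy;
    p->pend_time = time(NULL);
    p->next = NULL;

    drop_first_duplicate(head, p);  /* at most one older match can exist */
    p->next = *head;
    *head = p;
    return p;
}

/* Count the entries in the queue. */
size_t queue_length(const struct pend *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

The invariant only holds if enqueue_unique() is the sole way entries get 
onto the queue, which is the same caveat as above.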

With those patches applied, my clients aren't hammering the central router 
anymore, and they can successfully build SAs.

I am including /etc/no_oe.conf, so I don't think this is related to 
opportunistic encryption, although all the functions involved have 
"opportunistic" in their names.

Mike

