[Openswan dev] Patch for review

Thu Apr 22 09:40:39 EDT 2010

David McCullough wrote:
> Jivin Tuomo Soini lays it down ...
>> Does this break klips?
> 
> Almost certainly,  but I haven't tried it yet :-)
> 
>> It seem to work perfectly on netkey.
>>
>> This is not complete fix yet, Real fixing requires removing checking of
>> initiate_ondemand_body return codes, This just hardcodes return code so
>> it works and returns code to the state it was before all my experiments.
> 
> Just restating that I haven't tried this patch,  but it seems unlikely to
> work for KLIPS.  The bug we had with klips was that if we didn't do anything
> with the tunnel in initiate_ondemand,  then the kernel installed bare_shunt
> would stay and cause problems.  Always removing the bare_shnut is just as
> likely to cause problems later when we expect it to be there.  The klips ops
> are very unforgiving,  if you call update,  it has to exist,  if you call
> create it must not exist,  if you call delete,  it must exist.

That's the netkey problem. Bare shunt is always left behind on same
situations and that doesn't work.

> Would it be possible for you to explain the bug you are trying to fix
> when running with netkey ?  Is it for demand tunnels or not ? What is going
> wrong and when ?

On netkey every run where initiate_ondemand_body leave eroute into xfrm
policy causing packets never flow for the conn causing on-demand tunnel
negotiation.

This happens for both CK_INSTANCE and CK_PERMANENT state tunnels if
there is acquire by kernel.

Even with my fixes there usually is tunnel negotiation and immidiate
rekey but that can't be avoided without code properly checking if there
already is ongoing negotiation and ignoring it.

> Comments below on how the patch can easily be made klips safe if that is the
> right thing to do.
> 
>> commit 42e9ba4cd79187d0fe32ae6c1e29cf7e2ef7f916
>> Author: Tuomo Soini <tis at foobar.fi>
>> Date:   Thu Apr 22 12:42:43 2010 +0300
>>
>>     This commit reverts changes which did fix spurious tunnel rekeyings
>>     because of netkey acquires. But unlike those work-arounds this
>>     change removes reason for those commits by always returning with 0 from
>>     initiate_ondemand_body. And those reverted changes broke on-demand tunneling
>>     by %hold state.
>>     
>>     If we return with non-0 value bare shunts are left behind and never
>>     cleaned away causing invalid hold eroutes on netkey.
>>
>> diff --git a/programs/pluto/initiate.c b/programs/pluto/initiate.c
>> index e2e1c96..9d828ac 100644
>> --- a/programs/pluto/initiate.c
>> +++ b/programs/pluto/initiate.c
>> @@ -717,7 +717,7 @@ initiate_ondemand_body(struct find_oppo_bundle *b
>>      int hisport;
>>      char demandbuf[256];
>>      bool loggedit = FALSE;
>> -    int work = 1; /* assume we did some */
>> +    int work = 0; /* This is ugly fix, we just always return 0 - Tuomo */
> 
> How about:
> 
> 	/* on klips/mast assume we will do something something */
> 	work = (kern_interface == USE_KLIPS || kern_interface == USE_MASTKLIPS);

That's ok for me.

>>      
>>      /* What connection shall we use?
>>       * First try for one that explicitly handles the clients.
>> @@ -780,15 +780,19 @@ initiate_ondemand_body(struct find_oppo_bundle *b
>>  
>>  	if(c->kind == CK_INSTANCE)
>>  	{
>> +	    char cib[CONN_INST_BUF];
>>  	    /* there is already an instance being negotiated, do nothing */
>> -	    return 0;
>> +	    openswan_log("rekeying existing instance \"%s\"%s, due to acquire"
>> +			 , c->name
>> +			 , (fmt_conn_instance(c, cib), cib));
>> +
>> +	    /*
>> +	     * we used to return here, but rekeying is a better choice. If we
>> +	     * got the acquire, it is because something turned stuff into a
>> +	     * %trap, or something got deleted, perhaps due to an expiry.
>> +	     */
> 
> 		work = 0; /* klips/mast need to know we are doing nothing */
> 
>>  	}
>>  
>> -	if(c->kind == CK_PERMANENT)
>> -	{
>> -	    /* there is already a tunnel, do nothing */
>> -	    return 0;
> 
> Restore this, with the work = 0 and for netkey ot will be a NOOP.

We can't restore this. If we return here we never initiate on-demand on
netkey.

> 		work = 0; /* klips/mast need to know we are doing nothing */

You didn't check end of the function. Using return 0 is same as setting
work = 0;

> 
>> -	}
>>  	/* otherwise, there is some kind of static conn that can handle
>>  	 * this connection, so we initiate it */

If we ever return with non-zero status eg work != 0 netkey leaves hold
eroute behind and those are not cleaned away.

-- 
Tuomo Soini <tis at foobar.fi>
Foobar Linux services
+358 40 5240030
Foobar Oy <http://foobar.fi/>