[Openswan Users] OpenSWAN, KLIPS, and dead tunnels

Wed Oct 7 11:50:50 EDT 2009

Hello, all!

I have two sites connected by a VPN that's very stable.  Each node (a
Linux server) runs a DNS server that points to the one on the other
node (this structure is unavoidable due to how we have our domains set
up).  This all works perfectly when the tunnel is up, and even when it
isn't: the only consequence when the tunnel is down is that queries for
names from the other side come up empty.  Everything else works fine,
including name resolution for the internet.

However, if the tunnel is up and one of the nodes disappears (due to an
outage, a machine crash, whatever), then I have a problem that I can't
find a solution for: the tunnel is dead, but the KLIPS policies remain
in place.  This means that any attempt to cross the tunnel hangs until
the packets time out, since the policy says the traffic must be
encrypted and sent through the tunnel.  Sadly, as I mentioned, bind
(DNS) is configured to do just that: reach out over the tunnel to the
other DNS server.  So when one of the nodes has just disappeared and
the stale policies are still in place, any DNS query hangs, given how
bind appears to be built: it can neither reach the other side nor have
the attempt rejected promptly (e.g. with "no route to host").

The only workaround I have found is to log into the box and restart the
IPsec service by hand.  That forces the policies to be taken down; they
are re-added when the tunnel comes back up.  This is manageable, but
less than ideal.

My understanding of how this should work is that when the peer is found
to be down (we do have DPD configured on both ends, so this should
already be detected), the policies for KLIPS should be removed
automatically, just as they were added automatically when the tunnel
was first initiated, and stay removed until the tunnel can be brought
back up.  This would eliminate the problem described.

However, this isn't happening and I'm not sure if it's due to
misconfiguration (perhaps I should use dpdaction=clear instead of
restart_by_peer?), or due to a software defect in OpenSWAN.  Any
insights/comments on the matter?  Below is the config from one of the
nodes - the other node's config is a mirror image of this:

----- BEGIN CONF -----
version 2.0
config setup
        interfaces="%none"
        virtual_private=%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12
        oe=off
        protostack=netkey
conn vpn-peer
        type=tunnel
        leftupdown="/etc/openswan/updown"
        left=<left-public-ip>
        leftsourceip=<left-private-ip>
        leftsubnets={<left-subnets>}
        leftrsasigkey=<LEFT-RSA-PUB-KEY>
        right=<right-public-ip>
        rightsubnets={<right-subnets>}
        rightrsasigkey=<RIGHT-RSA-PUB-KEY>
        dpdaction=restart_by_peer
        dpddelay=30
        dpdtimeout=60
        pfs=yes
        ike=3des-sha1
        esp=3des-sha1
        auto=start
        salifetime=1h
        ikelifetime=24h
        rekeymargin=2m
------ END CONF ------

As I mentioned, the tunnel works perfectly except when one side fails
suddenly (machine crash, network outage, whatever).
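
For reference, the dpdaction=clear variant I'm asking about would look
roughly like the sketch below.  It is untested, and it rests on my
reading of the ipsec.conf man page: that "clear" drops both the eroute
and the SA once DPD declares the peer dead.  I also assume that nothing
on this side would then re-initiate the tunnel, so the returning peer
(or someone by hand) would have to bring it back up.

----- BEGIN CONF -----
conn vpn-peer
        # (all other settings unchanged from the config above)
        # Assumption: clear removes the eroute and SA when DPD declares the peer dead
        dpdaction=clear
        dpddelay=30
        dpdtimeout=60
------ END CONF ------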

Cheers!
