[Openswan Users] disconnect causes failure
Bob Miller
bob at computerisms.ca
Fri Aug 13 20:21:38 EDT 2010
Hi,
Please excuse the reply to myself.
I manually ripped out everything to do with openswan, and reinstalled
clean, just in case I had a bad something. I also received a suggestion
to try reky=no in my config, that did not help either.
I have still not solved this problem, but I have narrowed the issue down
some more. The problem appears to be with the xfrm policy. It seems
that when the connection is taken down the corresponding xfrm policy
remains intact, and so long as that policy remains, I cannot make any
more connections to openswan. I can now resolve the problem without
`ipsec setup reload` by using `ip xfrm policy deleteall` instead.
I came to discover this after chasing this line in my logs:
Aug 13 16:26:09 yq-firewall pluto[14187]: "l2tp-yq"[10] 76.9.58.207 #10:
ERROR: netlink XFRM_MSG_DELPOLICY response for flow eroute_connection
delete included errno 2: No such file or directory
Interestingly, I grepped my log for XFRM_MSG_DELPOLICY and it was not
there yesterday, even though the symptoms today are identical to that of
yesterday. Perhaps my removal and reinstallation of openswan did have
an effect.
I found this bug, but it doesn't quite match my situation:
https://bugs.xelerance.com/issues/1120
This thread seems to talk about a different situation, but has (what
seems to me) a common element of "not cleaning up":
http://archives.free.net.ph/message/20100221.103049.3c868bb7.ca.html
Following that example I increased debugging and got a snippet from what
happens when things lock up after a disconnect. Am I correct in
interpreting this to mean that openswan finds the policy and tries to
create a tunnel to match it?
==> /var/log/auth.log <==
Aug 13 11:25:18 yq-firewall pluto[14280]: |
Aug 13 11:25:18 yq-firewall pluto[14280]: | *received kernel message
Aug 13 11:25:18 yq-firewall pluto[14280]: | netlink_get:
XFRM_MSG_ACQUIRE message
Aug 13 11:25:18 yq-firewall pluto[14280]: | add bare shunt 0x9924ba0
199.247.238.35/32:1701 -17-> 76.9.58.207/32:1701 => %hold 0 %
acquire-netlink
Aug 13 11:25:18 yq-firewall pluto[14280]: | find_connection: looking for
policy for connection: 199.247.238.35:17/1701 -> 76.9.58.207:17/1701
Aug 13 11:25:18 yq-firewall pluto[14280]: | find_connection: concluding
with empty
Aug 13 11:25:18 yq-firewall pluto[14280]: initiate on demand from
199.247.238.35:1701 to 76.9.58.207:1701 proto=17 state: fos_start
because: acquire
Aug 13 11:25:18 yq-firewall pluto[14280]: Can not opportunistically
initiate for 199.247.238.35 to 76.9.58.207: no routed template covers
this pair
Aug 13 11:25:18 yq-firewall pluto[14280]: | no explicit failure shunt
for 199.247.238.35 to 76.9.58.207; installing %pass
Aug 13 11:25:18 yq-firewall pluto[14280]: | replacing broad host-to-host
bare shunt
Aug 13 11:25:18 yq-firewall pluto[14280]: | no routed template covers
this pair eroute 199.247.238.35/32:0 --17-> 76.9.58.207/32:0 => %pass
(raw_eroute)
Aug 13 11:25:18 yq-firewall pluto[14280]: | raw_eroute result=1
Aug 13 11:25:18 yq-firewall pluto[14280]: | add bare shunt 0x9924c20
199.247.238.35/32:0 -0-> 76.9.58.207/32:0 => %pass 0 no routed
template covers this pair
Aug 13 11:25:18 yq-firewall pluto[14280]: | adding specific host-to-host
bare shunt
Aug 13 11:25:18 yq-firewall pluto[14280]: | no routed template covers
this pair eroute 199.247.238.35/32:1701 --17-> 76.9.58.207/32:1701 => %
hold (raw_eroute)
Aug 13 11:25:18 yq-firewall pluto[14280]: | raw_eroute result=1
Aug 13 11:25:18 yq-firewall pluto[14280]: | delete bare shunt 0x9924ba0
199.247.238.35/32:1701 -17-> 76.9.58.207/32:1701 => %hold 0 %
acquire-netlink
Aug 13 11:25:18 yq-firewall pluto[14280]: | * processed 0 messages from
cryptographic helpers
Aug 13 11:25:18 yq-firewall pluto[14280]: | next event
EVENT_PENDING_DDNS in 28 seconds
Aug 13 11:25:18 yq-firewall pluto[14280]: | next event
EVENT_PENDING_DDNS in 28 seconds
While the connection is active and working, I have the following:
yq-firewall:~# ip xfrm state show
src 199.247.238.35 dst 76.9.58.207
proto esp spi 0x345f4a17 reqid 16401 mode transport
replay-window 32
auth hmac(md5) 0xce999843b7ceffac447ad262c2bb5361
enc cbc(des3_ede) 0xdf6968fa6f6a0c447188b6ac6768bc28e4a970bd0be0a446
encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
sel src 0.0.0.0/0 dst 0.0.0.0/0
src 76.9.58.207 dst 199.247.238.35
proto esp spi 0x6dc0948f reqid 16401 mode transport
replay-window 32
auth hmac(md5) 0xc8c0cd092baa9f1568b4adbc4712d458
enc cbc(des3_ede) 0xababbad6a67af43fe5314ae84d5c87060b9a389e2c532151
encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
sel src 0.0.0.0/0 dst 0.0.0.0/0
yq-firewall:~# ip xfrm policy show
src 199.247.238.35/32 dst 76.9.58.207/32 proto udp
dir out priority 2080
src 192.168.25.171/32 dst 199.247.238.35/32 proto udp dport 1701
dir in priority 2080
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16401 mode transport
src 199.247.238.35/32 dst 76.9.58.207/32 proto udp sport 1701
dir out priority 2080
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16401 mode transport
Once I disconnect, it looks like so:
yq-firewall:~# ip xfrm state show
yq-firewall:~# ip xfrm policy show
src 199.247.238.35/32 dst 76.9.58.207/32 proto udp
dir out priority 2080
src 199.247.238.35/32 dst 76.9.58.207/32 proto udp sport 1701
dir out priority 2080
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16401 mode transport
I think that the policy after disconnecting should be removed entirely.
I checked on another box, where the connection is used almost daily, and
there are no policies listed on there (and no active connections), and
nothing has been restarted or otherwise touched on that box for weeks.
I found another person who looks to have the same or similar problem,
but nothing was suggested to his request:
http://lists.openswan.org/pipermail/users/2010-June/018926.html
I expect I could work around this with an updown script of some flavour,
but I think I should also spend some time seeing if I can figure which
file is missing when the XFRM_MSG_DELPOLICY thing is attempted.
However, I am curious, is this something I did - a configuration error
somewhere, or is this a problem I have found? Or is this just a glitch
in the kernel or something and I should nuke the box and start over?
Now that I have done, undone, redone, and done over so many things, a
fresh box might not be a bad idea anyway....
Either way, I am a little smarter about IPSec and VPNs than I was, and
it can only get better :)
On Thu, 2010-08-12 at 18:25 -0700, Bob Miller wrote:
> Hello gurus;
>
> First, I want to express my appreciation to those of you who write the
> openswan software and help support it through this list and other
> documentation. Over the last days, it has been more than apparent to me
> all the good that you do for me and for others. I am inspired by your
> knowledge and your willingness to share it.
>
> Thank you.
>
> I am uncertain what I did to make this happen. Prior to my current
> stint with this box, I was using openswan 2.4.12 from debian repos on
> the server, and everything worked with XP without flaw for years. I now
> want to add a macbook and some vista laptops to the mix now and that
> spurred all the upgrading. As of now, I can reproduce this using XP and
> Vista, and I get almost the same thing with the Mac.
>
> In short, disconnecting an openswan tunnel seems to be locking something
> up, I am not quite certain what, yet. I can however reproduce reliably.
> Observe:
>
> Setup:
> openswan 2.6.28 using netkey, compiled from source on a stock debian
> lenny box, with l2tpns/freeradius handling things after the tunnel is
> up. XP client runs on a virtual machine (bridged adaptor so that it
> gets its own IP address) on my ubuntu workstation, which is behind an
> iptables firewall doing NAT.
>
> Procedure:
> On the server, start ipsec, l2tpns, and freeradius from fresh:
> /etc/init.d/l2tpns restart
> /etc/init.d/freeradius restart
> ipsec setup reload
>
> When in this fresh state, l2tpns will send a heartbeat twice per second,
> which I can log. So long as that hearbeat is going, things work as
> expected.
>
> next, connect using XP, no 3rd party client and no registry hack.
> Connection is established immediately, I can immediately ping a host on
> the remote LAN.
>
> All the logs at this point say everything is good and working. IPSec SA
> is established, and the l2tpns heartbeat carries on like clockwork.
>
> Once the connection disconnects, ~ 15 seconds later the l2tpns heartbeat
> stops and my auth.log shows the following every 30 seconds:
>
> Aug 12 13:10:59 yq-firewall pluto[10248]: initiate on demand from
> 199.247.238.35:1701 to 76.9.58.207:1701 proto=17 state: fos_start
> because: acquire
>
> 199.247.238.35 is the openswan box, 76.9.58.207 is the firewall I am
> behind.
>
> If I check ipsec auto --status immediately after I connect, everything
> seems normal. After a few minutes, I see the following line:
>
> 000 199.247.238.35/32:0 -0-> 76.9.58.207/32:0 => %pass 0 no routed
> template covers this pair
>
> after the connection has been dropped, that line shows up if it wasn't
> there before, or it does not go away if it was.
>
> >From here, if I do:
> ipsec setup reload
> the fos_start message stops, the "no routed template" message goes away,
> and my l2tpns hearbeat starts up again, and I can now make a connection
> again, to start the whole procedure over again.
>
> Conclusion:
> At first I thought this was an l2tpns problem, so I spent the first day
> trying to prove that to be the case. In that investigation, though,
> everything led me back to openswan. Now that I can reliably produce the
> issue, and reliably "fix" it by reloading ipsec, I am of the opinion
> that openswan (or rather my implementation of it) is the source of the
> problem and not l2tpns. If there is error in my conclusion, I would
> very much appreciate some correction.
>
> With the Mac, I get my IPSec SA established fine, then there are
> problems with authenticating against l2tpns, so it fails. Immediately
> after it fails, I get the fos_start message in my auth.log, which can
> again be fixed by reloading ipsec. It appears to be the same issue,
> except in this particular case, I never actually get a fully successful
> connection like I do with xp and vista. However, openswan behaves the
> same, as far as I can tell.
>
> For grins and giggles, I installed 2.6.27 over top of 2.6.28 and
> reloaded ispec again. It does not appear to have changed anything, the
> symptoms remain the same. Not sure what I expected to happen anyway.
>
> I have also spent considerable time looking for solutions via google,
> the closest thing I found was a thread on the openswan dev mailing list
> mentioning the "no routed template covers this pair" thing, but there
> was no solution.
>
> I would think that since I connect okay and can pass traffic across the
> tunnel, that my ipsec.conf and iptables are in order. Nonetheless I
> have tried using trial-and-error in the ipsec.conf file to correct the
> problem, but a few hours of that only provided me more problems. Here
> is my current config:
>
> version 2.0
>
> config setup
> interfaces="%defaultroute"
> plutodebug=none
> klipsdebug=none
> virtual_private=%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:172.16.0.0/12,%
> v4:!192.168.138.0/24
> nat_traversal=yes
> protostack=netkey
> oe=off
>
> conn %default
> keyingtries=5
> leftcert=/etc/ipsec.d/certs/yq-firewall.yukonquest.com.pem
> leftid=@yq-firewall.yukonquest.com
> leftrsasigkey=%cert
> rightrsasigkey=%cert
> auto=add
>
> conn l2tp-yq
> left=199.247.238.35
> leftnexthop=199.247.238.1
> leftprotoport=17/1701
> right=%any
> rightprotoport=17/%any
> rightsubnet=vhost:%no,%priv
> pfs=no
>
> Might anyone be able to add some clues to my (hopefully not futile)
> investigation?
>
> Bob Miller
> 334-7117/660-5315
> http://computerisms.ca
> bob at computerisms.ca
> Network, Internet, Server,
> and Open Source Solutions
>
> _______________________________________________
> Users at openswan.org
> http://lists.openswan.org/mailman/listinfo/users
> Micropayments: https://flattr.com/thing/38387/IPsec-for-Linux-made-easy
> Building and Integrating Virtual Private Networks with Openswan:
> http://www.amazon.com/gp/product/1904811256/104-3099591-2946327?n=283155
Bob Miller
334-7117/660-5315
http://computerisms.ca
bob at computerisms.ca
Network, Internet, Server,
and Open Source Solutions
More information about the Users
mailing list