[Openswan Users] Corner cases with DPD
Michael Smith
msmith at cbnco.com
Mon Jul 12 12:57:32 CEST 2004
Hi,
I'm running Openswan 1.0.6 on a central site with a number of road warrior
clients. I'm using dead peer detection mainly so the clients can figure
out if the central site has rebooted and start new tunnels. With about a
dozen test clients, I'm seeing about one site per day that goes down and
doesn't come back up because of something related to DPD not doing its
job. I think I've narrowed it down to a couple of cases.
The clients are behind ADSL routers doing NAT, so I am using NAT-T. I saw
a post from March saying DPD + NAT-T may cause problems, but I think
the problems I'm seeing are not related to that. Most of the clients are
actually running Openswan 1.0.1, but according to CHANGES nothing
DPD-related has changed since then.
Case #1: client has invalid ISAKMP SA but doesn't know it; tries to
negotiate IPsec SA. Here's how it happened:
08:29:40: client negotiates main mode ISAKMP SA #173
08:29:47: client negotiates IPsec SA #174
09:17:57: client negotiates ISAKMP #175 to replace #173
10:01:45: client negotiates ISAKMP #176 to replace #175
At this point the client has two ISAKMP SAs (the old #175 and the new #176)
and one IPsec SA (#174). I restarted ipsec on the central site. On the way
down it deleted all three SAs and sent notifications:
Jul 12 10:03:49 pluto[15412]: "test-adsl-1"[36] w.x.y.z:4500: deleting connection "test-adsl-1" instance with peer w.x.y.z
Jul 12 10:03:49 pluto[15412]: "test-adsl-1" #2052: deleting state (STATE_QUICK_R2)
Jul 12 10:03:49 pluto[15412]: "test-adsl-1" #2068: deleting state (STATE_MAIN_R3)
Jul 12 10:03:49 pluto[15412]: "test-adsl-1" #2083: deleting state (STATE_MAIN_R3)
But it looks like the client only got the delete notifications for the
IPsec SA and the old ISAKMP SA:
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #176: received Delete SA payload: deleting IPSEC State #174
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #176: received and ignored informational message
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #175: received Delete SA payload: deleting ISAKMP State #175
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: packet from z.y.x.w:4500: received and ignored informational message
That left the new ISAKMP SA, #176, alive, and the client began using it to
try to make a new IPsec SA. Meanwhile the central site restarted and
started receiving the client's quick mode proposals on an ISAKMP SA it
knew nothing about.
Jul 12 10:03:57 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #177: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS
...
Jul 12 10:39:16 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #206: max number of retransmissions (2) reached STATE_QUICK_I1. No acceptable response to our first Quick Mode message: perhaps peer likes no proposal
Jul 12 10:39:16 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #206: starting keying attempt 31 of an unlimited number
Jul 12 10:39:16 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #207: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS to replace #206
I have keyingtries=0, so this continued until the ISAKMP SA was replaced an
hour or so later. I think DPD would have figured out the ISAKMP SA was
invalid, but I'm guessing DPD doesn't run until there is an IPsec SA. This
case shouldn't happen very often, but I don't think there's a workaround.
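The symptom of Case #1 is at least easy to spot in the client log: the same
connection keeps hitting the Quick Mode retransmission limit. Below is a
minimal sketch of a scanner for that pattern. It assumes each syslog message
arrives as one complete line with the wording shown in the excerpt above; the
function names and the threshold of 3 are made up for illustration. It only
detects the condition and does nothing to repair it.

```python
import re
from collections import Counter

# Matches the pluto message quoted above. Assumption: the wording is the
# same as in the excerpt; other versions may phrase it differently.
QUICK_FAIL = re.compile(
    r'pluto\[\d+\]: "(?P<conn>[^"]+)" #(?P<state>\d+): '
    r'max number of retransmissions \(\d+\) reached STATE_QUICK_I1'
)


def count_quick_mode_failures(lines):
    """Map each connection name to its number of failed Quick Mode attempts."""
    failures = Counter()
    for line in lines:
        match = QUICK_FAIL.search(line)
        if match:
            failures[match.group("conn")] += 1
    return failures


def stuck_connections(lines, threshold=3):
    """Return connections with at least `threshold` failures (arbitrary cutoff)."""
    counts = count_quick_mode_failures(lines)
    return sorted(name for name, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    import sys

    # Usage: python scan.py < /path/to/client/log
    for name in stuck_connections(sys.stdin):
        print(name)
```

A connection that shows up here while an ISAKMP SA is still listed as
established would be a candidate for the situation described in Case #1.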
Case #2: client has valid (new) ISAKMP SA but invalid IPsec SA
04:28:07: client negotiates main mode ISAKMP SA #73
04:28:17: client negotiates IPsec SA #74
05:12:50: client's ADSL connection goes down
05:14:58: central site DPD declares the client dead and deletes its ISAKMP
          and IPsec SAs. Client doesn't get delete notification because it's
          still down. Client DPD doesn't declare peer dead.
05:15:11: client's ADSL line comes back up
05:15:35: client initiates main mode (#75) to replace #73 which is about
          to expire
The client now has a valid ISAKMP SA and thinks its IPsec SA is still
valid for another seven hours. The central site agrees about the ISAKMP SA
but not the IPsec SA. This happens fairly often, but I think I can work
around it by setting a very high dpdtimeout (86400 seconds) on the central
site.
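To make the arithmetic behind that workaround concrete, here is a rough
sketch using the timestamps from the Case #2 timeline. It assumes a
deliberately simplified model in which a peer is declared dead once it has
been silent for dpdtimeout seconds; the real DPD state machine also depends
on when probes are sent. The helper names are hypothetical, and the
120-second value is only an example, not the timeout actually configured.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Seconds elapsed between two same-day timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def declared_dead(silence_seconds, dpdtimeout):
    """Simplified model: dead once the peer has been silent for dpdtimeout."""
    return silence_seconds >= dpdtimeout


# Timestamps taken from the Case #2 timeline.
outage = seconds_between("05:12:50", "05:15:11")            # 141 seconds
silence_at_delete = seconds_between("05:12:50", "05:14:58")  # 128 seconds

print(outage, silence_at_delete)
print(declared_dead(outage, 120))    # hypothetical short timeout
print(declared_dead(outage, 86400))  # the proposed workaround value
```

Under this model, any dpdtimeout longer than the 141-second outage would
have left the central site's SAs in place, so the client's existing IPsec SA
would still have matched when the line came back.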
Mike