[Openswan Users] Corner cases with DPD

Michael Smith msmith at cbnco.com
Mon Jul 12 12:57:32 CEST 2004


Hi,

I'm running Openswan 1.0.6 on a central site with a number of road warrior
clients. I'm using dead peer detection (DPD) mainly so the clients can tell
when the central site has rebooted and start new tunnels. With about a
dozen test clients, I'm seeing roughly one site per day go down and not
come back up, apparently because DPD isn't doing its job. I think I've
narrowed it down to a couple of cases.

The clients are behind ADSL routers doing NAT, so I am using NAT-T. I saw
a post from March saying DPD + NAT-T may cause problems, but I don't think
the problems I'm seeing are related to that. Most of the clients are
actually running Openswan 1.0.1, but according to CHANGES nothing
DPD-related has changed since then.

Case #1: client has invalid ISAKMP SA but doesn't know it; tries to
negotiate IPsec SA. Here's how it happened:

08:29:40: client negotiates main mode ISAKMP SA #173
08:29:47: client negotiates IPsec SA #174
09:17:57: client negotiates ISAKMP #175 to replace #173
10:01:45: client negotiates ISAKMP #176 to replace #175

At this point there are two ISAKMP SAs (one old, one new) and one IPsec
SA. I restarted ipsec on the central site. On the way down it deleted all
three SAs and sent delete notifications:

Jul 12 10:03:49 pluto[15412]: "test-adsl-1"[36]
   w.x.y.z:4500: deleting connection "test-adsl-1" instance with peer w.x.y.z
Jul 12 10:03:49 pluto[15412]: "test-adsl-1" #2052: deleting
   state (STATE_QUICK_R2)
Jul 12 10:03:49 pluto[15412]: "test-adsl-1" #2068: deleting
   state (STATE_MAIN_R3)
Jul 12 10:03:49 pluto[15412]: "test-adsl-1" #2083: deleting
   state (STATE_MAIN_R3)

But it looks like the client only got the delete notifications for the
IPsec SA and the old ISAKMP SA:

Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #176:
   received Delete SA payload: deleting IPSEC State #174
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #176:
   received and ignored informational message
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #175:
   received Delete SA payload: deleting ISAKMP State #175
Jul 12 14:03:55 test-adsl-1 authpriv.warn pluto[311]: packet from
   z.y.x.w:4500: received and ignored informational message

That left the new ISAKMP SA, #176, alive, and the client began using it to
try to make a new IPsec SA. Meanwhile the central site restarted and
started receiving the client's quick mode proposals on an ISAKMP SA it
knew nothing about.

Jul 12 10:03:57 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #177:
   initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS
...
Jul 12 10:39:16 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #206:
   max number of retransmissions (2) reached STATE_QUICK_I1.  No
   acceptable response to our first Quick Mode message: perhaps peer likes
   no proposal
Jul 12 10:39:16 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #206:
   starting keying attempt 31 of an unlimited number
Jul 12 10:39:16 test-adsl-1 authpriv.warn pluto[311]: "test-adsl-1" #207:
   initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS to replace #206

I have keyingtries=0, so this continued until the ISAKMP SA was replaced an
hour or so later. I think DPD would have figured out that the ISAKMP SA was
invalid, but I'm guessing DPD doesn't run until there is an IPsec SA. This
case shouldn't happen very often, but I don't think there's a workaround.
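To make the sequence concrete, here is a rough Python model of what the
client is left with after the restart. It uses the client's own serial
numbers (#174 to #176), and the rule in dpd_would_run() is only my guess
from above (DPD needs an IPsec SA); I haven't checked it against the pluto
source. The function names are made up for illustration.

```python
# Hypothetical model of Case #1, not pluto code. Serial numbers are the
# client's; the central site's deletes are mapped onto them.

def surviving_sas(isakmp, ipsec, delivered_deletes):
    """Remove every SA whose Delete notification reached the client."""
    return isakmp - delivered_deletes, ipsec - delivered_deletes


def dpd_would_run(ipsec):
    # ASSUMPTION: mirrors the guess that DPD doesn't run until there is
    # an IPsec SA; not verified against the pluto source.
    return len(ipsec) > 0


def main():
    isakmp = {175, 176}        # old and new ISAKMP SAs
    ipsec = {174}              # the single IPsec SA
    sent = {174, 175, 176}     # central site deleted all three
    delivered = {174, 175}     # the delete for #176 never showed up

    isakmp_left, ipsec_left = surviving_sas(isakmp, ipsec, delivered)
    print("ISAKMP left:", sorted(isakmp_left))
    print("IPsec left:", sorted(ipsec_left))
    print("lost deletes:", sorted(sent - delivered))
    print("DPD runs:", dpd_would_run(ipsec_left))


if __name__ == "__main__":
    main()
```

If the guess is right, the client ends up holding ISAKMP SA #176 with no
IPsec SA under it, so nothing ever probes the central site and quick mode
just keeps retrying.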


Case #2: client has valid (new) ISAKMP SA but invalid IPsec SA

04:28:07: client negotiates main mode ISAKMP SA #73
04:28:17: client negotiates IPsec SA #74
05:12:50: client's ADSL connection goes down
05:14:58: central site's DPD declares the client dead and deletes its ISAKMP
   and IPsec SAs. The client doesn't get the delete notifications because its
   line is still down. The client's DPD doesn't declare the peer dead.
05:15:11: client's ADSL line comes back up
05:15:35: client initiates main mode (#75) to replace #73, which is about
   to expire

The client now has a valid ISAKMP SA and thinks its IPsec SA is still
valid for another seven hours. The central site agrees about the ISAKMP SA
but has already deleted the IPsec SA. This happens fairly often, but I think
I can work around it by setting a very high dpdtimeout (86400 seconds) on
the central site.
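Here's the back-of-the-envelope reasoning behind that workaround, as a
small Python sketch using the timestamps above. It assumes a deliberately
simplified model: the central site declares the client dead once it has
heard nothing for dpdtimeout seconds, counted from when the ADSL line
dropped, ignoring the probe interval and exactly when probes go out. Using
the observed 128 seconds as the "effective" timeout is also just an
approximation on my part.

```python
from datetime import datetime


def seconds_between(start, end):
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def central_declares_dead(outage_s, dpdtimeout_s):
    # SIMPLIFIED MODEL (assumption): the peer is declared dead only if the
    # timeout expires before the line comes back up.
    return dpdtimeout_s < outage_s


def main():
    down, dead, up = "05:12:50", "05:14:58", "05:15:11"
    outage = seconds_between(down, up)
    observed = seconds_between(down, dead)
    print(f"outage: {outage} s, declared dead after {observed} s")
    for timeout in (observed, 86400):
        print(f"dpdtimeout={timeout}: declared dead = "
              f"{central_declares_dead(outage, timeout)}")


if __name__ == "__main__":
    main()
```

Under that model the line was down for 141 s, longer than the roughly
128 s the central site waited, so it gave up; with dpdtimeout=86400 it
would have ridden out the outage. The cost, under the same model, is that
a client that really has gone away isn't declared dead for up to a day.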

Mike

