[Openswan dev] SPI generation by netlink_get_spi()

Thu Jul 29 16:04:53 CEST 2004

Hi Herbert,

one of my customers has a problem with Openswan/strongSwan running
on a 2.6.6 kernel connecting to freeswan-2.04 with X.509 patch 1.6.3.
I was able to re-enact the scenario in question on my 2.6.7 test platform.

The problem can occur if a connection is started by auto=start as
in the following example:

conn uwe
     keyingtries=0
     authby=rsasig
     left=%defaultroute
     leftcert=mypubkey.pem
     leftrsasigkey=%cert
     rightid="C=..."
     right=194.76.232.140
     rightsubnet=10.0.0.0/8
     auto=start

auto=start automatically installs a %trap eroute:

: | *received whack message
: | route owner of "uwe" unrouted: NULL; eroute owner: NULL
: | route owner of "uwe" unrouted: NULL; eroute owner: NULL
: | route_and_eroute with c: uwe (next: none) ero:null esr:{(nil)} ro:null
     rosr:{(nil)} and state: 0
: | add eroute 10.0.0.0/8:0 -> 145.254.54.68/32:0 => int.104 at 145.254.54.68:0
: | eroute_connection add eroute 145.254.54.68/32:0 -> 10.0.0.0/8:0 => %trap:0

Next, conn uwe is initiated. A Quick Mode is queued as pending

: | Queuing pending Quick Mode with 194.76.232.140 "uwe"

and the negotiation starts with Main Mode:

: "uwe" #1: initiating Main Mode

Due to a stray ICMP message occurring during the Main Mode negotiation,
the %trap eroute gets triggered and a narrow %hold eroute is installed:

: | *received kernel message
: | netlink_get: XFRM_MSG_ACQUIRE message
: | add bare shunt 0x8d854b0
      145.254.54.68/32:0 -> 10.128.9.1/32:0 => %hold:1 0    %acquire-netlink
: | initiate on demand from 145.254.54.68:0 to 10.128.9.1:0 proto=1 state:
     fos_start because: whack
: | find_connection: looking for policy for connection: 145.254.54.68:1/0 ->
      10.128.9.1:1/0

Next, a search for a matching connection is started and conn uwe is found:

: | find_connection: conn "uwe" has compatible peers:
      145.254.54.68/32->10.0.0.0/8 [pri: 16793612]
: | find_connection: comparing best "uwe" [pri:16793612]{0x8d82b68} (child none)
      to "uwe" [pri:16793612]{0x8d82b68} (child none)
: | find_connection: concluding with "uwe" [pri:16793612]{0x8d82b68}
     kind=CK_PERMANENT
: | eroute_connection replace %trap with broad %hold eroute 145.254.54.68/32:0
      -> 10.0.0.0/8:0 => %hold:0
: | delete narrow %hold eroute 145.254.54.68/32:0 -> 10.128.9.1/32:0 => %hold:1
: | delete bare shunt 0x8d854b0 145.254.54.68/32:0 -> 10.128.9.1/32:0 => %hold:1
     0    %acquire-netlink

Since a Main Mode negotiation for conn uwe is already under way, a second
Quick Mode negotiation is queued in the pending queue:

: | Queuing pending Quick Mode with 194.76.232.140 "uwe"

After the successful establishment of the phase 1 ISAKMP SA, the first
pending Quick Mode is started:

: "uwe" #1: ISAKMP SA established

: | unqueuing pending Quick Mode with 194.76.232.140 "uwe"
: | creating state object #2 at 0x8d86220
: "uwe" #2: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP {using isakmp#1}
: |    message ID:  65 7e 1a 0f
: | netlink_get_spi: allocated 0x9f4c9788 for esp.0 at 145.254.54.68
: | SPI  9f 4c 97 88

The netlink interface of the 2.6 kernel is used to request an SPI for
the IPsec SA.

Immediately after the first Quick Mode message, the second pending Quick Mode
is initiated:

: | unqueuing pending Quick Mode with 194.76.232.140 "uwe"
: | creating state object #3 at 0x8d876d0
: "uwe" #3: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP {using isakmp#1}
: |    message ID:  a1 01 a2 b2
: | netlink_get_spi: allocated 0x9f4c9788 for esp.0 at 145.254.54.68
: | SPI  9f 4c 97 88

And here the error happens. The two Quick Mode negotiations have different
Message IDs (65 7e 1a 0f versus a1 01 a2 b2), which causes two phase 2
state objects to be created on the peer side, but the generated SPI 9f 4c 97 88
is the same. This triggers the assertion passert(0) in
kernel_pfkey.c:finish_pfkey_msg() in freeswan-2.0x because the same
SADB_ADD command is executed twice for the outbound esp. Removing the assertion,
as Openswan does, does not help: several retries still fail to set
up the IPsec SA.
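
To make the collision concrete, here is a minimal sketch of why the second
SADB_ADD has to fail. It is a toy model only, not FreeS/WAN or kernel code:
the names toy_sa, toy_sad and toy_sad_add() are hypothetical, and the SA
table is assumed to be keyed by (destination, protocol, SPI) with the
Message ID playing no part in the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SA database keyed by (dst, proto, spi).
 * Illustration only; this is not the FreeS/WAN or kernel data structure. */
#define TOY_SAD_MAX 16

struct toy_sa {
    uint32_t dst;    /* destination address, host byte order */
    uint8_t  proto;  /* 50 = ESP */
    uint32_t spi;    /* SPI chosen by the receiving side */
    uint32_t msgid;  /* Quick Mode Message ID that negotiated this SA */
};

struct toy_sad {
    struct toy_sa entry[TOY_SAD_MAX];
    size_t count;
};

/* Returns true if the SA was added. Returns false if an entry with the
 * same (dst, proto, spi) already exists, which is the analogue of the
 * second SADB_ADD failing, or if the table is full. */
bool toy_sad_add(struct toy_sad *sad, const struct toy_sa *sa)
{
    assert(sad != NULL && sa != NULL);

    for (size_t i = 0; i < sad->count; i++) {
        const struct toy_sa *e = &sad->entry[i];
        /* msgid is deliberately not compared: it is not part of the key */
        if (e->dst == sa->dst && e->proto == sa->proto && e->spi == sa->spi)
            return false;
    }
    if (sad->count == TOY_SAD_MAX)
        return false;

    sad->entry[sad->count++] = *sa;
    return true;
}
```

In this model, adding an ESP entry for 145.254.54.68 with SPI 0x9f4c9788
succeeds for the state with Message ID 65 7e 1a 0f, while the entry for the
state with Message ID a1 01 a2 b2 is rejected, because the key is identical.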

Looking at kernel.c:get_spi() I see that if KLIPS is used, each call
increases the SPI by one (spi++), so that a unique SPI is always generated
and the problem therefore never occurs. But using the native IPsec stack
of the 2.6 kernel causes netlink_get_spi() to be called instead:

static ipsec_spi_t
netlink_get_spi(const ip_address *src
               , const ip_address *dst
               , int proto
               , bool tunnel_mode
               , unsigned reqid
               , ipsec_spi_t min
               , ipsec_spi_t max
               , const char *text_said)

Because all input parameters are the same (I suspect that both Quick
Modes also use the same reqid, although I couldn't document this yet)
I assume this causes netlink to return the same SPI.
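
The difference between the two allocation styles can be sketched as follows.
This is only a toy model of my suspicion, not the kernel's implementation:
all toy_* names are hypothetical, and the assumption that a repeated request
with identical parameters is answered from an existing pending entry is
exactly the part I have not verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Style 1: counter-based, as described for kernel.c:get_spi() with KLIPS.
 * Every call hands out the next value, so two calls never agree. */
uint32_t toy_counter_spi(uint32_t *next)
{
    assert(next != NULL);
    return (*next)++;
}

/* Style 2 (assumption): allocation keyed on the request parameters. */
struct toy_req {
    uint32_t src, dst;  /* addresses, host byte order */
    uint8_t  proto;     /* 50 = ESP */
    uint32_t reqid;
};

struct toy_pending {
    struct toy_req req;
    uint32_t spi;
    bool used;
};

static bool toy_same_req(const struct toy_req *a, const struct toy_req *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->proto == b->proto && a->reqid == b->reqid;
}

/* Returns the SPI of an existing pending entry with identical parameters,
 * otherwise allocates a fresh one. Returns 0 if the table is full. */
uint32_t toy_keyed_spi(struct toy_pending *tab, size_t n,
                       const struct toy_req *req, uint32_t *next)
{
    assert(tab != NULL && req != NULL && next != NULL);

    for (size_t i = 0; i < n; i++)
        if (tab[i].used && toy_same_req(&tab[i].req, req))
            return tab[i].spi;          /* same inputs, same SPI */

    for (size_t i = 0; i < n; i++) {
        if (!tab[i].used) {
            tab[i].used = true;
            tab[i].req = *req;
            tab[i].spi = (*next)++;     /* fresh SPI for a new key */
            return tab[i].spi;
        }
    }
    return 0;
}
```

In this model, two calls to toy_keyed_spi() with identical parameters return
the same SPI, while changing only the reqid yields a new one. Whether the
real netlink path behaves this way is the open question.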

How can this be fixed? Can netlink be forced to generate unique
SPIs by some other means or must the reqids be different?

Regards

Andreas

=======================================================================
Andreas Steffen                   e-mail: andreas.steffen at strongsec.com
strongSec GmbH                    home:   http://www.strongsec.com
Alter Zürichweg 20                phone:  +41 1 730 80 64
CH-8952 Schlieren (Switzerland)   fax:    +41 1 730 80 65
==========================================[strong internet security]===