[Openswan Users] Big packets from OAKLEY don't get through (probably MTU issue?)

Fri Aug 26 16:09:34 CEST 2005

Hi folks,

a strange thing happened here with my first L2TP setup. Here's my 
current setup:

--- ipsec.conf

version 2.0
config setup
     nat_traversal=yes
     virtual_private=%v4:10.0.0.0/8,%v4:172.16.0.0/12,%v4:192.168.0.0/16
     plutodebug=all
     overridemtu=1500

conn %default
    keyingtries=1
    disablearrivalcheck=no

[...]
conn block
   auto=ignore
conn private
   auto=ignore
conn private-or-clear
   auto=ignore
conn clear-or-private
   auto=ignore
conn clear
   auto=ignore
conn packetdefault
   auto=ignore

[...]

conn L2TP-conn-old
  keyingtries=3
  authby=rsasig
  left=%defaultroute
  leftcert=mycert.pem
  leftprotoport=17/0
  right=%any
  rightprotoport=17/1701
  rightsubnet=vhost:%no,%priv
  rightca=%same
  pfs=no
  auto=add
  compress=yes  

--- l2tpd.conf available upon request (it doesn't seem relevant to this 
issue from my side)

I'm running FreeS/WAN 2.04 with the x509-1.7.0 patch. Just for reference: 
eth1 is the interface facing the internet. The Linux box (the VPN 
server) is directly connected to the internet without any NAT in front 
of it; it does perform NAT itself, but the VPN connection isn't involved 
in that. There is an iptables firewall on this machine, which includes 
the following rules:

VYPEETH=eth1
/usr/local/sbin/iptables -A INPUT -j ACCEPT -p 50
/usr/local/sbin/iptables -A FORWARD -j ACCEPT -p 50
/usr/local/sbin/iptables -A OUTPUT -j ACCEPT -p 50
/usr/local/sbin/iptables -t nat -A POSTROUTING -j ACCEPT -p 50

/usr/local/sbin/iptables -A INPUT -j ACCEPT -p 51
/usr/local/sbin/iptables -A FORWARD -j ACCEPT -p 51
/usr/local/sbin/iptables -A OUTPUT -j ACCEPT -p 51
/usr/local/sbin/iptables -t nat -A POSTROUTING -j ACCEPT -p 51

/usr/local/sbin/iptables -A INPUT -j ACCEPT -s ! 192.168.0.0/16 \
    -i $VYPEETH -p udp --dport 500
/usr/local/sbin/iptables -A INPUT -j ACCEPT -s ! 192.168.0.0/16 \
    -i $VYPEETH -p udp --dport 4500

The Windows 2000 client isn't NATed either (it is directly connected to 
the local ISP via a 28.8k modem link; this is just for testing purposes!).
So much for the environment in which the problem occurs.

What happened then?

At first, everything went fine. I got the VPN connection, the L2TP 
server responded correctly, authentication succeeded and I got an 
internal IP. The connection then worked as expected: I transferred 
several test files without problems.
However, a day later (I shut down both systems and restarted them after 
approx. 8 hours) no VPN connection could be established anymore. Now, the 
"last normal words" of the VPN server are:

Aug 26 14:07:11 boss pluto[13526]: packet from 217.247.167.90:500: 
ignoring Vendor ID payload [MS NT5 ISAKMPOAKLEY 00000002]
Aug 26 14:07:11 boss pluto[13526]: "L2TP-conn-old"[1] 217.247.167.90 #1: 
responding to Main Mode from unknown peer 217.247.167.90

(217.247.167.90 is the IP address of the dial-up modem line of the 
Windows box). After a while, pluto times out and logs:

Aug 26 14:10:04 boss pluto[14133]: "L2TP-conn-old"[1] 217.247.167.90 #1: 
next payload type of ISAKMP Hash Payload has an unknown value: 138
Aug 26 14:10:04 boss pluto[14133]: "L2TP-conn-old"[1] 217.247.167.90 #1: 
malformed payload in packet
Aug 26 14:10:21 boss pluto[14133]: "L2TP-conn-old"[1] 217.247.167.90 #1: 
Informational Exchange message must be encrypted
Aug 26 14:11:01 boss pluto[14133]: "L2TP-conn-old"[1] 217.247.167.90 #1: 
max number of retransmissions (2) reached STATE_MAIN_R2
Aug 26 14:11:01 boss pluto[14133]: "L2TP-conn-old"[1] 217.247.167.90: 
deleting connection "L2TP-conn-old" instance with peer 217.247.167.90 
{isakmp=#0/ipsec=#0}

With plutodebug=all enabled, the log looks as follows:

Aug 26 14:19:01 boss pluto[14690]: | sending 316 bytes for STATE_MAIN_R1 
through eth1 to 217.247.179.103:500:
Aug 26 14:19:01 boss pluto[14690]: |   4b ec f2 5a  66 a6 6b 8b  04 e8 
65 52  a6 89 93 bf
[...]
Aug 26 14:19:01 boss pluto[14690]: |   6f 92 9d e7  00 00 00 05  04 00 00 00
Aug 26 14:19:01 boss pluto[14690]: | inserting event EVENT_RETRANSMIT, 
timeout in 10 seconds for #1
Aug 26 14:19:01 boss pluto[14690]: | next event EVENT_RETRANSMIT in 10 
seconds for #1
Aug 26 14:19:11 boss pluto[14690]: | 
Aug 26 14:19:11 boss pluto[14690]: | *time to handle event
Aug 26 14:19:11 boss pluto[14690]: | event after this is 
EVENT_SHUNT_SCAN in 106 seconds
Aug 26 14:19:11 boss pluto[14690]: | handling event EVENT_RETRANSMIT for 
217.247.179.103 "L2TP-conn-old" #1
Aug 26 14:19:11 boss pluto[14690]: | sending 316 bytes for 
EVENT_RETRANSMIT through eth1 to 217.247.179.103:500:
Aug 26 14:19:11 boss pluto[14690]: |   4b ec f2 5a  66 a6 6b 8b  04 e8 
65 52  a6 89 93 bf
[... same as above ...]
Aug 26 14:19:11 boss pluto[14690]: |   6f 92 9d e7  00 00 00 05  04 00 00 00
Aug 26 14:19:11 boss pluto[14690]: | inserting event EVENT_RETRANSMIT, 
timeout in 20 seconds for #1
Aug 26 14:19:11 boss pluto[14690]: | next event EVENT_RETRANSMIT in 20 
seconds for #1
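
The retransmit timeouts in the log go from 10 to 20 seconds, which looks like a doubling backoff. As a rough sketch, assuming the delay keeps doubling and that pluto gives up after the wait following the second retransmission (the "max number of retransmissions (2)" message above), the schedule would look like this. The function name and the 40-second final wait are assumptions, not something taken from the log:

```python
def retransmit_schedule(initial_delay=10, max_retransmits=2):
    """Return (retransmit_times, give_up_time) in seconds after the first send.

    Assumes the delay doubles after every retransmission. The 10 s and 20 s
    steps match the log; the final wait before giving up is extrapolated.
    """
    times = []
    elapsed = 0
    delay = initial_delay
    for _ in range(max_retransmits):
        elapsed += delay          # wait, then retransmit
        times.append(elapsed)
        delay *= 2                # next wait is twice as long
    give_up = elapsed + delay     # last wait expires without an answer
    return times, give_up


if __name__ == "__main__":
    times, give_up = retransmit_schedule()
    print("retransmits at (s):", times, "give up at (s):", give_up)
```

Under these assumptions the server would stop waiting roughly 70 seconds after the first STATE_MAIN_R1 packet.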

As you can see, the STATE_MAIN_R1 packet does get sent (I could also 
observe this with a packet sniffer on the outbound side of the box), 
but no answer from the Windows client ever arrives. The corresponding 
oakley.log on the Windows client looks as follows:

[BTW: Don't be confused by the time values: the clock of the Windows 
client is a bit off!]

At first it receives the encryption parameters just fine:
 8-26: 12:07:41:84 processing payload SA 
 8-26: 12:07:41:84 Received Phase 1 Transform 1
 8-26: 12:07:41:84      Encryption Alg Dreifach-DES CBC(5)
 8-26: 12:07:41:84      Hash Alg SHA(2)
 8-26: 12:07:41:84      Oakley Group 14
 8-26: 12:07:41:84      Auth Method RSA-Signatur mit Zertifikaten(3)
 8-26: 12:07:41:84      Life type in Seconds
 8-26: 12:07:41:84      Life duration of 28800
 8-26: 12:07:41:84 Phase 1 SA accepted: transform=1

and later on it reads:

 8-26: 14:21:52:694 Resume: (get) SA = 0x00238d78 from 217.247.167.90
 8-26: 14:21:52:694 ISAKMP Header: (V1.0), len = 316
 8-26: 14:21:52:694   I-COOKIE 4becf25a66a66b8b
 8-26: 14:21:52:694   R-COOKIE 04e86552a68993bf
 8-26: 14:21:52:694   exchange: Oakley Main Mode
 8-26: 14:21:52:694   flags: 0
 8-26: 14:21:52:694   next payload: KE
 8-26: 14:21:52:694   message ID: 00000000

As you can see, the Windows box does receive the message from the Linux 
server. And after some processing ...

 8-26: 14:21:52:694 Stopping RetransTimer sa:00238D78 centry:00000000 
handle:0011E850
 8-26: 14:21:52:694 processing payload KE 
 8-26: 14:21:52:694 Generated 256 byte Shared Secret
[...]
 8-26: 14:21:52:694 constructing CERT
 8-26: 14:21:52:694 constructing SIG
 8-26: 14:21:52:694 Construct SIG
[...]
 8-26: 14:21:52:694 Sending: SA = 0x00238D78 to 217.247.167.90
 8-26: 14:21:52:694 ISAKMP Header: (V1.0), len = 1956
 8-26: 14:21:52:694   I-COOKIE 4becf25a66a66b8b
 8-26: 14:21:52:694   R-COOKIE 04e86552a68993bf
 8-26: 14:21:52:694   exchange: Oakley Main Mode
 8-26: 14:21:52:694   flags: 1 ( encrypted )
 8-26: 14:21:52:694   next payload: ID
 8-26: 14:21:52:694   message ID: 00000000
 8-26: 14:21:53:694 Handling Retransmit: sa 238d78 handle 11e850 context 
2394a8 arg 2394a8

... it tries to send a huge ISAKMP packet (len=1956). Please note that 
the Linux box never receives this packet! I already used the ethereal 
packet sniffer to see whether at least *something* comes in (perhaps on 
another firewalled port or so), but there simply is no reply at all!
I searched the web and found several requests on various mailing lists 
that deal with much the same issue as described above. Each thread 
either petered out or ended with the advice "that's too much for your 
MTU, the IP packets get fragmented, therefore decrease your certificate 
size, look at your overridemtu= setting". However, at this point, two 
major questions arise:

   1. Why did it work the first time, then? Please note that I did not
      change anything regarding the certificates (oh, BTW: I already
      checked the validity of the certificates; they were created two
      days ago and are valid until 2010).
   2. How can I either decrease the size of my certificate by more than
      400 bytes or deal with fragmented IP packets on the Linux box (the
      latter solution would be the preferred one for me)?
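
To put numbers on the fragmentation argument, here is a small sketch of the arithmetic. It assumes plain IKE over UDP port 500 (an 8-byte UDP header, a 20-byte IP header without options, no NAT-T marker). The 576-byte MTU is only a hypothetical dial-up value used for comparison and is not taken from either machine:

```python
IP_HEADER = 20    # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_unfragmented_isakmp(mtu):
    """Largest ISAKMP message that fits into a single IP packet."""
    return mtu - IP_HEADER - UDP_HEADER


def fragment_sizes(isakmp_len, mtu):
    """Return the IP payload size of each fragment for one UDP datagram."""
    total = isakmp_len + UDP_HEADER             # the UDP header is part of the IP payload
    per_fragment = (mtu - IP_HEADER) // 8 * 8   # non-final fragments carry a multiple of 8 bytes
    sizes = []
    while total > per_fragment:
        sizes.append(per_fragment)
        total -= per_fragment
    sizes.append(total)
    return sizes


if __name__ == "__main__":
    print("limit at MTU 1500:", max_unfragmented_isakmp(1500))
    print("excess of the 1956-byte message:", 1956 - max_unfragmented_isakmp(1500))
    print("fragments at MTU 1500:", fragment_sizes(1956, 1500))
    print("fragments at MTU 576 (hypothetical):", fragment_sizes(1956, 576))
```

With an MTU of 1500 the limit is 1472 bytes, so the 1956-byte message is 484 bytes too large and is split into two fragments, while the 316-byte STATE_MAIN_R1 packet fits easily. That is consistent with the "more than 400 bytes" estimate above.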

Regarding my idea of handling fragmented packets, I also searched the 
web and only found the ancient CONFIG_ALWAYS_DEFRAGMENT config switch of 
the Linux kernel from mid 2.2.x. However, my current 2.4.31 Linux kernel 
does not provide this switch anymore. There isn't a 
/proc/sys/net/ipv4/ip_frag file, either. So, what the heck is going on? 
Normally, fragmented packets should at least show up in ethereal, 
shouldn't they?
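
For checking a capture by hand, a fragment can be recognised from bytes 6 and 7 of the IPv4 header (RFC 791): either the "more fragments" bit is set or the fragment offset is non-zero. The sketch below decodes that field from a raw header; the function name is made up for illustration. In a sniffer, the equivalent capture filter should be `ip[6:2] & 0x3fff != 0`:

```python
import struct


def fragment_info(ip_header):
    """Decode the flags / fragment-offset field of a raw IPv4 header."""
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    dont_fragment = bool(flags_and_offset & 0x4000)
    more_fragments = bool(flags_and_offset & 0x2000)
    offset_bytes = (flags_and_offset & 0x1FFF) * 8   # offset is counted in 8-byte units
    return {
        "dont_fragment": dont_fragment,
        "more_fragments": more_fragments,
        "offset": offset_bytes,
        "is_fragment": more_fragments or offset_bytes > 0,
    }


if __name__ == "__main__":
    # Synthetic 20-byte header with only the fragment field filled in:
    # second fragment of a datagram, starting at byte 1480.
    header = bytearray(20)
    struct.pack_into("!H", header, 6, 1480 // 8)
    print(fragment_info(header))
```

If the client's large packet were fragmented on the way, the trailing fragment would carry no UDP header, so a port-based capture filter or firewall rule would not match it. Filtering on the fragment field instead would show whether anything arrives at all.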

Before someone asks: Yes, there is a firewall on the Windows box, but 
no, it's not related to this issue. If you switch it off entirely, you 
get the same result as described above.

So, can anyone help me figure out what's going wrong here?
Thanks in advance!

Greetings from Germany
    Nico

Further related postings on this issue can be found at:
* http://lists.virus.org/users-openswan-0411/msg00166.html
* http://lists.openswan.org/pipermail/users/2004-November/002937.html
* http://www2.frell.ambush.de/archives/freeswan-users/6329.html
* http://www.jacco2.dds.nl/networking/freeswan-l2tp.html (section 16.3)
* http://www.sandelman.ottawa.on.ca/ipsec/1999/03/msg00074.html
* http://www.wlug.org.nz/IPSecConfiguration (yes, I checked that I 
imported the key to the "local computer" and not to the "current user")