[Openswan dev] ARM unaligned bug

Tue Jan 19 13:03:28 EST 2010

Hi!

I'm running openswan-2.6.24 and ran into a problem which I think may
be a corner case bug.
The bug only appears if I have nat-t enabled but there is no NAT
between the initiator and the responder.

What happens is that the responder gets stuck in an infinite loop
using 100% CPU. The last lines from the log file are:

pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [Dead Peer Detection]
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [RFC 3947] method set to=109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-03] meth=108, but already using method 109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-02_n] meth=106, but already using method 109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-02] meth=107, but already using method 109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-00]
pluto[561]: "ipsec1"[1] 88.88.88.88 #1: Aggressive mode peer ID is ID_FQDN: '@init'
pluto[561]: "ipsec1"[1] 88.88.88.88 #1: responding to Aggressive Mode, state #1, connection "ipsec1" from 88.88.88.88
pluto[561]: "ipsec1"[1] 88.88.88.88 #1: enabling possible NAT-traversal with method 4

Then it gets stuck.

Here is my ipsec.conf on the responder:

version 2
config setup
   nat_traversal=yes
   overridemtu=1419
   plutorestartoncrash=yes
   protostack=netkey
   plutodebug=all

conn ipsec1
   left=88.88.88.1
   leftsubnet=192.168.0.0/24
   leftsourceip=192.168.0.200
   leftid=@arg
   right=%any
   rightsubnet=192.168.2.0/24
   rightid=@init
   aggrmode=yes
   ike=AES128-SHA1-modp1024
   esp=AES128-SHA1
   type=tunnel
   pfs=yes
   authby=secret
   keyingtries=%forever
   auto=add
   dpdaction=clear
   dpddelay=30
   dpdtimeout=120


Now what is more interesting is that the exact same source code
compiled on my Intel Linux PC works, but it hangs when I run it on
an ARM Linux target. I have two ARM targets, one big endian (IXP4xx)
and one little endian (i.MX27). Openswan hangs on both, but not on
the Intel PC. On ARM it spins calling ntohs() over and over without
any delay.

So I compiled in debug information and traced it with GDB to the
function out_modify_previous_np() in lib/libpluto/packet.c. The inner
loop (begins on line 1675) looks like this:

...
struct isakmp_generic *hdr;
for (offset = sizeof(struct isakmp_hdr); offset < len ;
	offset += ntohs(hdr->isag_length)) {
	if ((len - offset) < sizeof(struct isakmp_generic))
		return FALSE;
	hdr = (struct isakmp_generic *)(outs->start+offset);
	if ((len - offset) < ntohs(hdr->isag_length))
		return FALSE;
	if ((len - offset) == ntohs(hdr->isag_length)) {
		hdr->isag_np = np;
		return TRUE;
	}
}
...

I couldn't believe my eyes, but the problem was hdr->isag_length. When
(outs->start+offset) is an unaligned address, ARM runs into trouble
reading the 16-bit field and for some reason returns 0 instead of the
isag_length value. With a length of 0 the offset never advances, so
the for loop never terminates.
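
To illustrate the failure mode, here is a minimal sketch of the same
kind of walk, written so that it never issues a 16-bit load on an odd
address and cannot spin on a zero length. This is not the Openswan
code: read_be16() and set_last_np() are hypothetical names, and I am
assuming the 4-byte generic header layout (next payload, reserved,
16-bit network-order length) of struct isakmp_generic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a network-order (big-endian) 16-bit value
 * one byte at a time, so alignment of p does not matter. */
static uint16_t read_be16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Hypothetical stand-in for the loop in out_modify_previous_np():
 * walk the generic payload headers starting at 'offset' and store 'np'
 * in the next-payload byte of the last payload.
 * Returns 1 on success, 0 if the chain is malformed. */
static int set_last_np(uint8_t *buf, size_t len, size_t offset, uint8_t np)
{
    while (offset < len) {
        size_t left = len - offset;
        uint16_t plen;

        if (left < 4)                 /* no room for a generic header */
            return 0;
        plen = read_be16(buf + offset + 2);
        if (plen < 4 || plen > left)  /* a zero length is rejected, not looped on */
            return 0;
        if (plen == left) {           /* this is the last payload */
            buf[offset] = np;
            return 1;
        }
        offset += plen;
    }
    return 0;
}
```

Calling set_last_np() on a buffer whose first payload starts at an odd
offset behaves the same on any architecture, and a payload that claims
a length below 4 makes the function return 0 instead of looping.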

After reading this:

http://infocenter.arm.com/help/topic/com.arm.doc.faqs/ka3544.html

My fix was to tell the compiler that a struct isakmp_generic might be
located at an unaligned address:

diff -urN openswan-2.6.24-orig/include/packet.h openswan-2.6.24/include/packet.h

--- openswan-2.6.24-orig/include/packet.h	2010-01-10 02:34:38.000000000 +0100
+++ openswan-2.6.24/include/packet.h	2010-01-19 11:30:18.000000000 +0100
@@ -195,7 +195,7 @@
     u_int8_t    isag_np;
     u_int8_t    isag_reserved;
     u_int16_t   isag_length;
-};
+} __attribute__((__packed__));

 extern struct_desc isakmp_generic_desc;

And that solved the problem.
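
As a rough illustration of what the attribute changes, here is a small
sketch assuming GCC (or a compatible compiler) on Linux. The struct
names generic_plain and generic_packed and the function
packed_length() are made up for this example; only the field layout
mirrors struct isakmp_generic.

```c
#include <arpa/inet.h>   /* ntohs() */
#include <stdint.h>

/* Hypothetical copy of the header without the attribute: the compiler
 * may assume a pointer to it is aligned for the 16-bit member. */
struct generic_plain {
    uint8_t  np;
    uint8_t  reserved;
    uint16_t length;
};

/* Same layout with the attribute: the type's alignment becomes 1, so
 * the compiler has to emit loads that are safe at any address. */
struct generic_packed {
    uint8_t  np;
    uint8_t  reserved;
    uint16_t length;
} __attribute__((__packed__));

/* Read the length field through the packed type; p may be odd. */
static uint16_t packed_length(const void *p)
{
    const struct generic_packed *hdr = p;
    return ntohs(hdr->length);
}
```

With GCC, _Alignof(struct generic_packed) is 1 while sizeof stays 4,
since the plain struct had no padding to begin with. The cost is that
accesses to the 16-bit member may compile to byte loads on targets
without hardware support for unaligned access.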

I don't know if this is the proper solution, but at least it solved my
problem. If you find it useful you can put a proper fix into the
upstream sources.

Note that the compiler described on the ARM Infocenter page uses the
__packed syntax, while GCC uses __attribute__((__packed__)).


Best regards,

Albert