[Openswan dev] ARM unaligned bug
Albert Veli
albert.veli at gmail.com
Tue Jan 19 13:03:28 EST 2010
Hi!
I'm running openswan-2.6.24 and ran into a problem which I think may
be a corner case bug.
The bug only appears if NAT-T is enabled but there is no NAT
between the initiator and the responder.
What happens is that the responder gets stuck in an infinite loop,
eating 100% CPU. The last lines from the log file are:
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [Dead Peer Detection]
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [RFC 3947] method set to=109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-03] meth=108, but already using method 109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-02_n] meth=106, but already using method 109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-02] meth=107, but already using method 109
pluto[561]: packet from 88.88.88.88:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-00]
pluto[561]: "ipsec1"[1] 88.88.88.88 #1: Aggressive mode peer ID is ID_FQDN: '@init'
pluto[561]: "ipsec1"[1] 88.88.88.88 #1: responding to Aggressive Mode, state #1, connection "ipsec1" from 88.88.88.88
pluto[561]: "ipsec1"[1] 88.88.88.88 #1: enabling possible NAT-traversal with method 4
Then it gets stuck.
Here is my ipsec.conf on the responder:
version 2

config setup
    nat_traversal=yes
    overridemtu=1419
    plutorestartoncrash=yes
    protostack=netkey
    plutodebug=all

conn ipsec1
    left=88.88.88.1
    leftsubnet=192.168.0.0/24
    leftsourceip=192.168.0.200
    leftid=@arg
    right=%any
    rightsubnet=192.168.2.0/24
    rightid=@init
    aggrmode=yes
    ike=AES128-SHA1-modp1024
    esp=AES128-SHA1
    type=tunnel
    pfs=yes
    authby=secret
    keyingtries=%forever
    auto=add
    dpdaction=clear
    dpddelay=30
    dpdtimeout=120
Now what is more interesting is that the exact same source code
compiled on my Intel Linux PC works, but it hangs when I run it on
an ARM Linux target. I have two ARM targets, one big endian (IXP4xx)
and one little endian (i.MX27). Openswan hangs on both, but not on
the Intel PC. It gets stuck making call after call to ntohs() without
pause.
So I compiled in debug information and traced it with GDB to the
function out_modify_previous_np() in lib/libpluto/packet.c. The inner
loop (begins on line 1675) looks like this:
...
    struct isakmp_generic *hdr;
    for (offset = sizeof(struct isakmp_hdr); offset < len ;
         offset += ntohs(hdr->isag_length)) {
        if ((len - offset) < sizeof(struct isakmp_generic))
            return FALSE;
        hdr = (struct isakmp_generic *)(outs->start+offset);
        if ((len - offset) < ntohs(hdr->isag_length))
            return FALSE;
        if ((len - offset) == ntohs(hdr->isag_length)) {
            hdr->isag_np = np;
            return TRUE;
        }
    }
...
I couldn't believe my eyes, but the problem was hdr->isag_length. When
(outs->start+offset) is a non-aligned address, ARM runs into trouble
reading the 16-bit field and for some reason returns 0 instead of the
isag_length value. The loop then adds 0 to offset on every pass, so
the for loop never terminates.
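
To make the failure mode concrete, here is a minimal sketch of a payload walk that cannot spin on a zero length and never issues a 16-bit load from an odd address. It is not the Openswan code: read_be16() and set_last_np() are hypothetical names, the 4-byte header layout (np, reserved, 16-bit length) is taken from struct isakmp_generic as shown in this mail, and the starting offset is passed in rather than being sizeof(struct isakmp_hdr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit network-order (big-endian) value
 * one byte at a time, so the address may have any alignment. */
static uint16_t read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical stand-in for the loop in out_modify_previous_np().
 * Walks generic payload headers starting at `offset` and stores `np`
 * in the payload whose length ends exactly at `len`.
 * Returns 1 on success, 0 if the chain is malformed. */
static int set_last_np(unsigned char *buf, size_t len, size_t offset,
                       uint8_t np)
{
    while (offset < len) {
        size_t remaining = len - offset;
        uint16_t plen;

        if (remaining < 4)              /* no room for a generic header */
            return 0;
        plen = read_be16(buf + offset + 2);
        if (plen < 4 || plen > remaining)
            return 0;                   /* a zero length can no longer loop */
        if (plen == remaining) {
            buf[offset] = np;           /* first byte is the np field */
            return 1;
        }
        offset += plen;
    }
    return 0;
}
```

The extra plen < 4 check is a defensive choice for this sketch: even if a length were misread as 0, the function would return instead of looping forever.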
After reading this:
http://infocenter.arm.com/help/topic/com.arm.doc.faqs/ka3544.html
my fix was to tell the compiler that a pointer to struct isakmp_generic
might hold a non-aligned address:
diff -urN openswan-2.6.24-orig/include/packet.h openswan-2.6.24/include/packet.h
--- openswan-2.6.24-orig/include/packet.h 2010-01-10 02:34:38.000000000 +0100
+++ openswan-2.6.24/include/packet.h 2010-01-19 11:30:18.000000000 +0100
@@ -195,7 +195,7 @@
     u_int8_t isag_np;
     u_int8_t isag_reserved;
     u_int16_t isag_length;
-};
+} __attribute__((__packed__));
 extern struct_desc isakmp_generic_desc;
And that solved the problem.
I don't know if this is the proper solution, but at least it solved my
problem. If you find it useful you can put a proper fix into the
upstream sources.
Note that the compiler mentioned on the ARM Infocenter page uses the
__packed syntax, while GCC uses the syntax __attribute__((__packed__)).
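
As a rough illustration of what the GCC attribute changes, the sketch below declares a struct with the same three fields as the patched one and marks it packed. The names generic_packed, check_layout() and packed_length() are made up for this example, and it assumes GCC or Clang with C11 (_Alignof). With the attribute the struct's required alignment drops to 1, so the compiler has to generate code for the length field that is safe at any address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copy of the generic payload header layout, packed. */
struct generic_packed {
    uint8_t  np;
    uint8_t  reserved;
    uint16_t length;
} __attribute__((__packed__));

/* Layout facts that the packed attribute gives us. */
static void check_layout(void)
{
    assert(sizeof(struct generic_packed) == 4);
    assert(_Alignof(struct generic_packed) == 1);
    assert(offsetof(struct generic_packed, length) == 2);
}

/* Read the raw (still network-order) length through a pointer that
 * may hold any address, aligned or not. */
static uint16_t packed_length(const void *p)
{
    const struct generic_packed *hdr = p;
    return hdr->length;
}
```

The value returned by packed_length() is not byte-swapped; a caller would still pass it through ntohs() as the original loop does.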
Best regards,
Albert