[Openswan Users] Pluto segfault on openswan-2.6.23

Giovani Moda giovani at mrinformatica.com.br
Tue Oct 6 18:50:11 EDT 2009


> That might require a restart of daemons that inherit the ulimit. Your
> current shell prob also had it set. If you want to be safe, reboot.

Rebooted several times. Still no go.
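
One way to check whether the core limit actually reached a running
process, and not just the login shell, is to read it from /proc for that
pid. A minimal sketch, assuming a kernel that exposes /proc/<pid>/limits
(2.6.24 or later); core_soft_limit is only an illustrative helper name:

```shell
# Print the soft "Max core file size" limit from a /proc/<pid>/limits file.
# core_soft_limit is a hypothetical helper, not part of Openswan.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Example: show the limit of the current shell, if /proc provides it.
if [ -r "/proc/$$/limits" ]; then
    core_soft_limit "/proc/$$/limits"
fi
```

Pointing it at the limits file of the pluto pid instead would show what
the daemon inherited; a value of 0 means no core file gets written.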

> Yes, you can attach to pluto as well, though I can't see why it would
> crash but not core dump if kill -6 works.

Could it be because it's not the "main" process that gets a segfault?
When I search for pluto pids, I get three processes running:

1) /bin/sh /usr/local/lib/ipsec/_plutorun --debug all raw crypt parsing
emitting control lifecycle klips dns oppo controlmore x509 pfkey
nattraversal --uniqueids yes --force_busy no --nocrsend no
--strictcrlpolicy no --nat_traversal yes --keep_alive  --protostack
klips --force_keepalive no --disable_port_floating no --virtual_private
%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12 --crlcheckinterval 0
--ocspuri  --nhelpers  --dump /var/run/pluto --opts  --stderrlog  --wait
no --pre  --post  --log daemon.error --plutorestartoncrash false --pid
/var/run/pluto/pluto.pid

2) /bin/sh /usr/local/lib/ipsec/_plutorun --debug all raw crypt parsing
emitting control lifecycle klips dns oppo controlmore x509 pfkey
nattraversal --uniqueids yes --force_busy no --nocrsend no
--strictcrlpolicy no --nat_traversal yes --keep_alive  --protostack
klips --force_keepalive no --disable_port_floating no --virtual_private
%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12 --crlcheckinterval 0
--ocspuri  --nhelpers  --dump /var/run/pluto --opts  --stderrlog  --wait
no --pre  --post  --log daemon.error --plutorestartoncrash false --pid
/var/run/pluto/pluto.pid

3) /usr/local/libexec/ipsec/pluto --nofork --secretsfile
/etc/ipsec.secrets --ipsecdir /etc/ipsec.d --debug-all --debug-raw
--debug-crypt --debug-parsing --debug-emitting --debug-control
--debug-lifecycle --debug-klips --debug-dns --debug-oppo
--debug-controlmore --debug-x509 --debug-pfkey --debug-nattraversal
--use-klips --uniqueids --nat_traversal --virtual_private
%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12

The one crashing is always number 3. Also, I've noticed that this only
happens when NAT-T is involved. An L2TP/IPsec tunnel on the same subnet
does not cause a crash.
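
Entries 1 and 2 are /bin/sh running the _plutorun wrapper, and only
entry 3 is the pluto binary. A small sketch of how the daemon pid could
be separated from the wrapper shells, assuming input in the form produced
by "ps -eo pid=,args=" (pid followed by the command line);
pluto_daemon_pids is an illustrative name:

```shell
# Read "PID COMMAND ARGS..." lines on stdin and print the PIDs whose
# executable (the first word of the command) has the basename "pluto".
# The _plutorun wrappers start with /bin/sh, so they are skipped.
pluto_daemon_pids() {
    awk '{ n = split($2, parts, "/"); if (parts[n] == "pluto") print $1 }'
}

# Typical use on a live system:
#   ps -eo pid=,args= | pluto_daemon_pids
```

The printed pid is the one to hand to "gdb -p".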

> then crash it, and you should be in gdb for the "bt full" backtrace.

I can't get any backtrace at all. I've installed debug libraries for
glibc, ncurses, gmp, gcc and kernel so that gdb could read all the
necessary symbols, attached to the pid of process 3 (as listed above),
crashed it, and got "No stack." on "bt full". Here is the output:

[root@inet pluto]# gdb -p 12661
GNU gdb Fedora (6.8-24.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show
copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
Attaching to process 12661
Reading symbols from /usr/local/libexec/ipsec/pluto...done.
Reading symbols from /lib/libcrypt.so.1...Reading symbols from
/usr/lib/debug/lib/libcrypt-2.8.so.debug...done.
done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/sse2/libgmp.so.3...Reading symbols from
/usr/lib/debug/usr/lib/sse2/libgmp.so.3.4.2.debug...done.
done.
Loaded symbols for /usr/lib/sse2/libgmp.so.3
Reading symbols from /lib/libc.so.6...Reading symbols from
/usr/lib/debug/lib/libc-2.8.so.debug...done.
done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...Reading symbols from
/usr/lib/debug/lib/ld-2.8.so.debug...done.
done.
Loaded symbols for /lib/ld-linux.so.2
__kernel_vsyscall () at arch/x86/vdso/vdso32/int80.S:16
16              ret
(gdb) cont
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
Current language:  auto; currently asm
(gdb) bt full
No stack.
(gdb)

This happens with any of the three pids I get for pluto. I've tried the
same on CentOS 5.3 and Fedora 7, but both of them hang when ipsec
crashes the kernel, so I can't even get a core dump or attach gdb to the
pluto process. Whatever the reason for the crash, Fedora 9 can recover
from it, maybe because of its newer kernel, but CentOS and FC7 cannot.

I don't know if it helps, but here is the Oops:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<f917b32d>] :ipsec:aes_32+0x3/0x496
*pde = 00000000
Oops: 0002 [#1] SMP
Modules linked in: ipsec ccm aead serpent blowfish twofish
twofish_common ecb xcbc cbc crypto_blkcipher sha256_generic
sha512_generic des_generic aes_i586 aes_generic bridge stp bnep rfcomm
l2cap bluetooth sunrpc ipv6 dm_mirror dm_log dm_multipath scsi_dh dm_mod
i915 usb_storage 8139too drm skge 8139cp mii sr_mod i2c_i801 iTCO_wdt
i2c_algo_bit cdrom iTCO_vendor_support pcspkr i2c_core sg pata_acpi
ata_generic ata_piix libata sd_mod scsi_mod crc_t10dif ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: ipsec]

Pid: 11691, comm: pluto Not tainted (2.6.27.35-79.2.56.fc9_mr.i686 #1)
EMAX 945GC-M2
EIP: 0060:[<f917b32d>] EFLAGS: 00010202 CPU: 1
EIP is at aes_32+0x3/0x496 [ipsec]
EAX: f5677400 EBX: 00000208 ECX: 00000004 EDX: 00000000
ESI: f5674400 EDI: f5677608 EBP: f5689b28 ESP: f5689b14
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process pluto (pid: 11691, ti=f5689000 task=f572bf70 task.ti=f5689000)
Stack: f5677608 f5674400 00000208 fffffff4 f5689b44 f5689b38 00000202
f9179bbf
       00000000 f5689b40 f91798f6 f5689b64 f9176968 00000010 f5689b60
f5677400
       f91a5144 00000003 f5674400 f916db75 f5689c5c f915bd3e 00002000
00000000
Call Trace:
 [<f9179bbf>] ? AES_set_key+0xa/0x2b [ipsec]
 [<f91798f6>] ? _aes_set_key+0xf/0x19 [ipsec]
 [<f9176968>] ? ipsec_alg_enc_key_create+0x1cf/0x284 [ipsec]
 [<f916db75>] ? pfkey_key_process+0x0/0x19f [ipsec]
 [<f915bd3e>] ? ipsec_sa_init+0x4ee/0x8c5 [ipsec]
 [<c049c211>] ? do_select+0x492/0x4bb
 [<c041fa1f>] ? update_curr+0x94/0xdc
 [<c061c4c5>] ? fn_hash_lookup+0x38/0x87
 [<c06186ba>] ? __inet_dev_addr_type+0x70/0xa7
 [<f916db75>] ? pfkey_key_process+0x0/0x19f [ipsec]
 [<f916aec1>] ? pfkey_add_parse+0x1c2/0x6eb [ipsec]
 [<c043d4c0>] ? prepare_to_wait_exclusive+0x51/0x58
 [<f9170884>] ? pfkey_msg_parse+0x466/0x5fe [ipsec]
 [<f916dc82>] ? pfkey_key_process+0x10d/0x19f [ipsec]
 [<f916db75>] ? pfkey_key_process+0x0/0x19f [ipsec]
 [<f9168dda>] ? pfkey_msg_interp+0x236/0x29c [ipsec]
 [<f916895e>] ? pfkey_sendmsg+0x2b1/0x3bf [ipsec]
 [<c05cd968>] ? __sock_sendmsg+0x45/0x4e
 [<c05cda3b>] ? sock_aio_write+0xca/0xde
 [<c05cdb93>] ? sockfd_lookup_light+0x16/0x46
 [<c0491075>] ? do_sync_write+0xab/0xe9
 [<c04448bb>] ? clockevents_program_event+0xe1/0xf0
 [<c043d342>] ? autoremove_wake_function+0x0/0x33
 [<c04de4d6>] ? security_file_permission+0xf/0x11
 [<c049188b>] ? vfs_write+0x95/0xdf
 [<c049196e>] ? sys_write+0x3b/0x60
 [<c0404c8a>] ? syscall_call+0x7/0xb
 =======================
Code: 89 e5 83 ec 08 53 56 57 8b 55 0c 8b 4d 14 81 f9 80 00 00 00 72 03
c1 e9 03 83 f9 20 74 0a 83 f9 18 74 05 b9 10 00 00 00 c1 e9 02 <89> 0a
8d 41 06 89 42 04 8b 75 10 8d 7a 08 fc 55 89 c8 f3 a5 8b
EIP: [<f917b32d>] aes_32+0x3/0x496 [ipsec] SS:ESP 0068:f5689b14
---[ end trace 0cfb5e82ec5307e9 ]---
IPSEC EVENT: KLIPS device ipsec0 shut down.
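
For reading the "Oops: 0002" line: on x86 that number is the page-fault
error code. A minimal sketch that decodes its three low bits (bit 0:
protection fault or page not present, bit 1: write or read, bit 2: user
or kernel mode); decode_oops is just an illustrative name:

```shell
# Decode the low three bits of an x86 page-fault error code given in hex,
# as printed on the "Oops:" line. decode_oops is a hypothetical helper.
decode_oops() (
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then cause="protection-fault"; else cause="page-not-present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    echo "$cause $access $mode"
)

decode_oops 0002
```

For the oops above this reports a write to a not-present page in kernel
mode, which matches the "NULL pointer dereference at 00000000" line.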


I've never done this kind of debugging before, so what am I doing wrong?

Thanks again,

Giovani

