[Openswan dev] Crash in ipsec_xmit_state_delete()

Mon Feb 21 04:10:02 EST 2011

Am 21.02.2011 02:04, schrieb David McCullough:
>
> Jivin David McCullough lays it down ...
>>
>> Jivin Holger Kummert lays it down ...
>>>
>>> Hello,
>>>
>>> I'm new to this list and I have a question regarding running the Openswan 2.6.32
>>> on Linux kernel 2.6.32, architecture is ARM IXP425. My goal is to use the hw-acceleration
>>> of the ixp-chip.
>>> In a first step I'm trying to use the sw-kernel-algorithms of the kernel, which worked
>>> pretty well with Openswan 2.6.32 (with klips) and linux-kernel 2.6.27.
>>> The ocf-module is in use.
>>> With the newer kernel I get a crash when only one ping-packet is transported back and forth
>>> through the tunnel:
>>>
>>> :~# Unable to handle kernel paging request at virtual address 6b6b6cf7
>>> pgd = c4b3c000
>>> [6b6b6cf7] *pgd=00000000
>>> Internal error: Oops: f3 [#1]
>>> last sysfs file: /sys/devices/platform/i2c-gpio.0/i2c-0/0-0014/name
>>> Modules linked in: ipsec serocco fwnet(P) fwcodec(P) trace(P) cryptosoft cryptodev(P) ocf(P) e100
>>> ixp4xx_hss ixp4xx_eth libphy max66xx at24 pca954x i2c_mux
>>> CPU: 0    Tainted: P            (2.6.32.28 #8)
>>> PC is at ipsec_xmit_state_delete+0x48/0x90 [ipsec]
>>> LR is at poison_obj+0x40/0x50
>>> pc : [<bf0df048>]    lr : [<c0098890>]    psr: 20000013
>>> sp : c7821ef4  ip : 6b6b6b6b  fp : c7821f04
>>> r10: 00000018  r9 : 0000000a  r8 : 00000006
>>> r7 : 00000001  r6 : 00000000  r5 : c4facb38  r4 : c4facb38
>>> r3 : 00000004  r2 : c4d8586c  r1 : 6b6b6b6b  r0 : 6b6b6b6b
>>> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
>>> Control: 000039ff  Table: 04b3c000  DAC: 00000017
>>> Process ksoftirqd/0 (pid: 3, stack limit = 0xc7820260)
>>> Stack: (0xc7821ef4 to 0xc7822000)
>>> 1ee0:                                              c4facb98 c7821f1c c7821f08
>>> 1f00: bf0dd790 bf0df00c c4facb38 000003e7 c7821f38 c7821f20 bf0e29f0 bf0dd618
>>> 1f20: 00000000 c03bde60 c03e71d8 c7821f48 c7821f3c bf0f5188 bf0e2898 c7821f60
>>> 1f40: c7821f4c c004aa70 bf0f5168 c7820000 00000100 c7821f94 c7821f64 c004a6b8
>>> 1f60: c004a9f0 c0040bfc c0040bb8 60000013 00000000 c03e71a0 00000000 00000000
>>> 1f80: 00000000 00000000 c7821fa8 c7821f98 c004a788 c004a640 c7820000 c7821fc8
>>> 1fa0: c7821fac c004aea0 c004a750 c781bf20 00000000 c004ae40 00000000 c7821ff4
>>> 1fc0: c7821fcc c0059f50 c004ae4c 00000000 00000000 c7821fd4 c7821fd4 00000000
>>> 1fe0: 00000000 00000000 00000000 c7821ff8 c0047ed8 c0059edc afed8800 a7b50300
>>> Backtrace:
>>> [<bf0df000>] (ipsec_xmit_state_delete+0x0/0x90 [ipsec]) from [<bf0dd790>]
>>> (ipsec_tunnel_xsm_complete+0x184/0x19c [ipsec])
>>>    r4:c4facb98
>>> [<bf0dd60c>] (ipsec_tunnel_xsm_complete+0x0/0x19c [ipsec]) from [<bf0e29f0>] (ipsec_xsm+0x164/0x190
>>> [ipsec])
>>>    r5:000003e7 r4:c4facb38
>>> [<bf0e288c>] (ipsec_xsm+0x0/0x190 [ipsec]) from [<bf0f5188>] (ipsec_ocf_skbq_process+0x2c/0x60 [ipsec])
>>>    r6:c03e71d8 r5:c03bde60 r4:00000000
>>> [<bf0f515c>] (ipsec_ocf_skbq_process+0x0/0x60 [ipsec]) from [<c004aa70>] (tasklet_action+0x8c/0xe0)
>>> [<c004a9e4>] (tasklet_action+0x0/0xe0) from [<c004a6b8>] (__do_softirq+0x84/0x110)
>>>    r5:00000100 r4:c7820000
>>> [<c004a634>] (__do_softirq+0x0/0x110) from [<c004a788>] (do_softirq+0x44/0x60)
>>> [<c004a744>] (do_softirq+0x0/0x60) from [<c004aea0>] (ksoftirqd+0x60/0xe4)
>>>    r4:c7820000
>>> [<c004ae40>] (ksoftirqd+0x0/0xe4) from [<c0059f50>] (kthread+0x80/0x88)
>>>    r7:00000000 r6:c004ae40 r5:00000000 r4:c781bf20
>>> [<c0059ed0>] (kthread+0x0/0x88) from [<c0047ed8>] (do_exit+0x0/0x238)
>>>    r6:00000000 r5:00000000 r4:00000000
>>> Code: eb3eeeb6 e5940008 e3500000 0a00000d (e590018c)
>>> ---[ end trace d04501fb2fd46b26 ]---
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>
>>>
>>> The loaded modules are (roughly):
>>> ~# lsmod
>>>
>>> ipsec 342690 2 - Live 0xbf0d7000
>>> serocco 53763 0 - Live 0xbf0c2000
>>> cryptosoft 11819 0 - Live 0xbf059000
>>> cryptodev 10729 1 - Live 0xbf051000 (P)
>>> ocf 16785 3 ipsec,cryptosoft,cryptodev, Live 0xbf045000 (P)
>>> e100 31485 0 - Live 0xbf036000
>>> ixp4xx_hss 10620 0 - Live 0xbf02e000
>>> ixp4xx_eth 9876 0 - Live 0xbf026000
>>> libphy 13104 1 ixp4xx_eth, Live 0xbf01b000
>>> max66xx 11090 0 - Live 0xbf012000
>>> at24 4047 0 - Live 0xbf00c000
>>> pca954x 2504 0 - Live 0xbf006000
>>> i2c_mux 1466 1 pca954x, Live 0xbf000000
>>> version 2
>>>
>>>
>>> /etc/ipsec.conf
>>>
>>> config setup
>>>           interfaces="ipsec0=eth2:1"
>>>           klipsdebug=none
>>>           plutodebug=none
>>>
>>> conn nessi01
>>>           auto=start
>>>           ike=aes256-sha1;modp1024
>>>           phase2alg=aes256-sha1
>>>           authby=secret
>>>           left=11.0.0.1
>>>           leftnexthop=0.0.0.0
>>>           leftsubnet=20.0.0.0/24
>>>           right=11.0.0.2
>>>           rightsubnet=10.0.0.1/32
>>>           compress=no
>>>           dpddelay=10
>>>           dpdtimeout=30
>>>           dpdaction=restart
>>>
>>>
>>> When I look at the function where the crash happens, I ask myself if
>>> there is a wrong sequence of commands:
>>>
>>> ipsec_xmit_state_delete (struct ipsec_xmit_state *ixs)
>>> {
>>>           if (unlikely (! ixs))
>>>                   return;
>>>
>>>           spin_lock_bh (&ixs_cache_lock);
>>>           ixs_cache_allocated_count--;
>>>           kmem_cache_free (ixs_cache_allocator, ixs);
>>> #if defined(HAS_NETIF_QUEUE) || defined (HAVE_NETIF_QUEUE)
>>>           if (ixs->dev&&  netif_queue_stopped(ixs->dev))
>>>                   netif_wake_queue(ixs->dev);
>>> #else /* defined(HAS_NETIF_QUEUE) || defined (HAVE_NETIF_QUEUE) */
>>>           if (ixs->dev)
>>>                   ixs->dev->tbusy = 0;
>>> #endif /* defined(HAS_NETIF_QUEUE) || defined (HAVE_NETIF_QUEUE) */
>>>           spin_unlock_bh (&ixs_cache_lock);
>>> }
>>>
>>> Isn't the kmem_cache_free()-call too early? After the free the function accesses
>>> the freed variable 'ixs'.
>>>
>>> When I exchange those (i.e. move the free-call after the access), I get another crash:
>>>
>>> klips_debug:ipsec_xmit_state_delete(obj 0xc5202b38)
>>>
>>> slab: double free detected in cache 'size-256', objp c4cfa060
>>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>> Backtrace:
>>> [<c0099954>] (slab_put_obj+0x0/0x9c) from [<c009a520>] (free_block+0xcc/0x174)
>>>    r9:00000002 r8:c4cfa060 r7:00000000 r6:c78006c0 r5:c7802320
>>> r4:c4cfa000
>>> [<c009a454>] (free_block+0x0/0x174) from [<c009b16c>] (drain_array+0xac/0xe0)
>>> [<c009b0c0>] (drain_array+0x0/0xe0) from [<c009b208>] (cache_reap+0x68/0x134)
>>>    r6:00000000 r5:c78006c0 r4:c7802320
>>> [<c009b1a0>] (cache_reap+0x0/0x134) from [<c0056408>] (run_workqueue+0x98/0x118)
>>> [<c0056370>] (run_workqueue+0x0/0x118) from [<c005653c>] (worker_thread+0xb4/0xc4)
>>>
>>> I enabled Debugging in slab to see this.
>>>
>>> Did anyone see this behavior before?
>>
>> Haven't seen that but the code is definately wrong IMO.
>>
>> Try the attached patch.
>
> And then make the patch better with this one ;-)

Thanks for your answer. I applied both patches but the result isn't that better.
After sending two ping packets across the tunnel I get:

~# slab: double free detected in cache 'size-256', objp c590f060
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
last sysfs file: /sys/devices/platform/i2c-gpio.0/i2c-0/0-0014/name
Modules linked in: ipsec serocco fwnet(P) fwcodec(P) trace(P) cryptosoft cryptodev(P) ocf(P) e100 
ixp4xx_hss ixp4xx_eth libphy max66xx at24 pca954x i2c_mux
CPU: 0    Tainted: P            (2.6.32.28 #9)
PC is at slab_put_obj+0x6c/0x9c
LR is at release_console_sem+0xa4/0xc4
pc : [<c00999c0>]    lr : [<c00467a4>]    psr: 20000093
sp : c7825ecc  ip : c7825e28  fp : c7825ef0
r10: 00000003  r9 : 00000000  r8 : 00000000
r7 : c78006c0  r6 : c590f060  r5 : 00000000  r4 : c590f000
r3 : 00000000  r2 : 60000093  r1 : 00003b55  r0 : 00000044
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 000039ff  Table: 043fc000  DAC: 00000017
Process events/0 (pid: 5, stack limit = 0xc7824260)
Stack: (0xc7825ecc to 0xc7826000)
5ec0:                            c590f000 c7802320 c78006c0 00000000 c590f060
5ee0: 00000007 c7825f20 c7825ef4 c009a520 c0099960 c7805450 c7805450 00000007
5f00: c7805440 c03ea910 c03c1534 00000000 00000000 c7825f3c c7825f24 c009b16c
5f20: c009a460 c7802320 c78006c0 00000000 c7825f6c c7825f40 c009b208 c009b0cc
5f40: 00000000 c0040bb8 c7801280 c009b1a0 c7825f9c 00000000 00000000 00000000
5f60: c7825f84 c7825f70 c0056408 c009b1ac c7801288 c7801280 c7825fc8 c7825f88
5f80: c005653c c005637c 00000000 c7815040 c005a438 c7825fa8 c7825fa8 00000000
5fa0: c7815040 c005a438 c7825fa8 c7825fa8 c781bf18 c7801280 c0056488 c7825ff4
5fc0: c7825fcc c0059f50 c0056494 00000000 00000000 c7825fd4 c7825fd4 00000000
5fe0: 00000000 00000000 00000000 c7825ff8 c0047ed8 c0059edc 71e7a000 6c753467
Backtrace:
[<c0099954>] (slab_put_obj+0x0/0x9c) from [<c009a520>] (free_block+0xcc/0x174)
  r9:00000007 r8:c590f060 r7:00000000 r6:c78006c0 r5:c7802320
r4:c590f000
[<c009a454>] (free_block+0x0/0x174) from [<c009b16c>] (drain_array+0xac/0xe0)
[<c009b0c0>] (drain_array+0x0/0xe0) from [<c009b208>] (cache_reap+0x68/0x134)
  r6:00000000 r5:c78006c0 r4:c7802320
[<c009b1a0>] (cache_reap+0x0/0x134) from [<c0056408>] (run_workqueue+0x98/0x118)
[<c0056370>] (run_workqueue+0x0/0x118) from [<c005653c>] (worker_thread+0xb4/0xc4)
  r5:c7801280 r4:c7801288
[<c0056488>] (worker_thread+0x0/0xc4) from [<c0059f50>] (kthread+0x80/0x88)
  r6:c0056488 r5:c7801280 r4:c781bf18
[<c0059ed0>] (kthread+0x0/0x88) from [<c0047ed8>] (do_exit+0x0/0x238)
  r6:00000000 r5:00000000 r4:00000000
Code: e1a02006 e59f0030 ebfeb192 e3a03000 (e5833000)
---[ end trace bdafab5257d98a85 ]---

This looks similar to the crash I got before when I exchanged the free() and access
functions to 'ixs'.

Where could this 'double free' come from?

Best regards,
Holger

>
> Cheers,
> Davidm
>