[Openswan dev] Small optimisation for lots of interfaces

Ken Bantoft ken at xelerance.com
Fri Nov 25 02:23:29 CET 2005


On Fri, 25 Nov 2005, David McCullough wrote:

>
> Jivin Paul Wouters lays it down ...
>> On Thu, 24 Nov 2005, David McCullough wrote:
>>
>>> Using simple tunnels (i.e., same two hosts, same secret, lots of networks)
>>> I have seen pluto silently exit sometime between 1000 and
>>> 2000 tunnels.  I cannot remember if I saw it crash or not in this
>>> scenario.  Each tunnel was exercised as it came up to ensure data would
>>> pass through ok.
>>
>> I would really like to see the core files of pluto dying in such case.
>> We did not observe this behaviour when we did testing with thousands of
>> tunnels over a year and a half ago when running against an (Ixia?) IPsec
>> testbox.
>
> Confirmed today that there are no core dumps, just the mystery exit,
> and in the other case tunnels were getting torn down.
>
> Our platforms can't easily dump core (or filesystems etc),  but we print
> a dump when they crash which usually gives us enough to find the
> problem.  You would need arm-linux tools to debug the core either way ;-)

That's not an issue - I think we've got at least one ARM box around, and I 
know I have the arm-linux toolchain setup in my crosscompiling farm.

>>> Some of the problems we have seen were related to dead peer code.
>>> Once you get a significant tunnel count you need to back off the DPD
>>> timers quite a bit or pluto starts pulling tunnels down.
>>
>> If you are congested, and are dropping DPD packets, then sure this will
>> happen. That's why some call it "make deads". If your bandwidth is
>> congested, and you are using DPD, then you will lose tunnels. The fix is
>> to buy more bandwidth, or disable DPD. Or if you can, somehow give DPD
>> packets a higher priority than other packets, but I'm not sure if you
>> can use QoS there.
>
> Yeah,  I realise that,  but customers being what they are,  they want
> DPD,  so we just need to adjust it till it's not the problem.

Yup... delay=10 / timeout=30 probably does the trick.
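
As a rough sketch, assuming "delay" and "timeout" here mean the dpddelay
and dpdtimeout options in ipsec.conf, that would look something like the
fragment below. The conn name is made up for illustration, and the values
are only a starting point to tune against your tunnel count:

    # hypothetical conn name; only the two DPD lines matter here
    conn many-networks
        # assumed mapping of "delay=10": seconds between DPD probes
        dpddelay=10
        # assumed mapping of "timeout=30": seconds of silence before
        # the peer is declared dead
        dpdtimeout=30

With those numbers a peer has to miss roughly three probes in a row before
pluto gives up on it, so a single dropped DPD packet on a congested link
should not take the tunnel down.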


Ken

