[Openswan dev] Small optimisation for lots of interfaces

Thu Nov 24 20:34:21 CET 2005

Jivin D. Hugh Redelmeier lays it down ...
> | From: David McCullough <davidm at snapgear.com>
> 
> | Firstly,  it's not a PC,  it's a 533MHz ARM XScale router,  so that may be why
> | it surprises you :-)
> 
> 533MHz is faster than many machines that I've used for *swan.  But I
> never tried as many tunnels.

The 533 is the fastest our group works on.  We have units ranging from
4Mb ram/2Mb flash @ 66MHz and up.

> Timing problems are often not constant factors (like CPU MHz) but
> algorithmic complexity issues.
> 
> | It seems the ifconfig is the slow part (all kernel time too).
> 
> You've certainly demonstrated that.
> 
> Any idea why?  Do you have strace?  Maybe you can see if it is making
> an unreasonable number of system calls.

From looking at the code,  I would say it opens /proc/net/dev for each
interface to get the stats for RX/TX and so on.
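
To put a number on that guess,  here is a minimal sketch in Python.  It is
not ifconfig's actual code;  the function names and the synthetic table
are made up for illustration,  and it assumes the usual /proc/net/dev
layout (two header lines,  then one "name: counters" row per interface).
It compares rescanning the table once per interface with parsing it once:

```python
def make_table(n):
    """Build synthetic /proc/net/dev-style text for n hypothetical interfaces."""
    header = "Inter-|   Receive ...\n face |bytes ...\n"
    rows = []
    for i in range(n):
        # 8 RX counters followed by 8 TX counters; only the byte counts are non-zero.
        rows.append("%6s: %d 0 0 0 0 0 0 0 %d 0 0 0 0 0 0 0\n" % ("if%d" % i, i, 2 * i))
    return header + "".join(rows)


def parse_dev_table(text):
    """Parse the whole table once into {name: (rx_bytes, tx_bytes)}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def stats_per_interface(text, names):
    """Rescan the table from the top for every interface (the suspected slow path)."""
    lines_scanned = 0
    result = {}
    for wanted in names:
        for line in text.splitlines()[2:]:
            lines_scanned += 1
            name, _, rest = line.partition(":")
            if name.strip() == wanted:
                fields = rest.split()
                result[wanted] = (int(fields[0]), int(fields[8]))
                break
    return result, lines_scanned


def stats_single_pass(text, names):
    """Parse the table once, then look each interface up in the dict."""
    table = parse_dev_table(text)
    return {name: table[name] for name in names}, len(table)
```

With 100 interfaces the rescanning version touches 5050 rows against 100
for the single pass.  At 4000 interfaces that works out to 8,002,000 rows
against 4000,  before counting the cost of reopening the file each time.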

> I admit that this doesn't matter too much -- you already have a
> solution.
> 
> | > I'd actually expect some of the code in Pluto for discovering
> | > interfaces to be a worse problem when there are a lot of interfaces.
> | 
> | Absolutely,  I am sure it does.
> 
> | I would be interested in others' experiences with large tunnel counts
> | using OpenSwan.  I have run over 1000 simple tunnels between two hosts
> | using freeswan (ie., single SA for all tunnels),  but pluto seems to get
> | unstable with much over 200 truly independent tunnels.  Has anyone else
> | had this experience?
> 
> [Caveat: I've not been working on Pluto's code for a couple of years.]
> 
> What do you mean by "unstable"?  Slow, I would expect.  Broken, not so
> much.

OK,  the tests were not run on the current pluto either,  and before I did
anything else I would rerun them on the current openswan release.

Basically,  the only failure I am 100% sure I have seen is pluto silently
exiting at some point as the number of tunnels grows large.

> Lots of places in Pluto use naive algorithms.  Ones that are linear in
> the number of connections when something much faster is possible (eg.
> sequential searches).  Sometimes quadratic or worse.  These have
> generally not been problems.  When you get to larger tunnel counts,
> this might well change.

Yes,  I figured there would be some of those in there.

> This was a conscious choice.  "Premature optimization is the root of all
> evil."  But when optimization is needed, it is time to do it.

I've just done a round of this on some of our interface management daemons
as well,  thus the 4000 interfaces :-)

> The most infamous performance problem is in the startup scripts: n**2
> in the number of connections.  Henry was going to fix the sh/awk/etc
> script to make it linear but it did not get done.  There are
> work-arounds.  One is a C-based startup program that does not suffer
> from this.

Though O(n^2) in 'C' can still be a killer :-)
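
A rough sketch of why the language does not save you.  The task here
(spotting duplicate connection names) is a made-up example and not what
the startup scripts or pluto actually do;  the helper names are
hypothetical too.  The point is only the count of comparisons:

```python
def find_duplicates_pairwise(names):
    """Compare every pair of names: n*(n-1)/2 comparisons."""
    comparisons = 0
    dups = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            comparisons += 1
            if names[i] == names[j]:
                dups.add(names[i])
    return dups, comparisons


def find_duplicates_hashed(names):
    """Remember names already seen in a set: one membership test per name."""
    seen = set()
    dups = set()
    for name in names:
        if name in seen:
            dups.add(name)
        seen.add(name)
    return dups, len(names)
```

For 4000 names the pairwise version makes 7,998,000 comparisons,  the
hashed version 4000 lookups.  A faster CPU or a compiled language only
shrinks the constant,  not that ratio.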

> BTW, I'd be interested in a thumbnail description of a deployment that
> uses so many tunnels.

We have two scenarios currently active with high tunnel counts.  One is
using ipsec,  the other is using GRE tunnels.

The ipsec case is a remote support arrangement.  All the remote sites
run unattended,  but are monitored by a single remote site (star
arrangement) for critical issues like temp.,  coolant levels,  things
like that.  We didn't design the solution,  but we are involved in
helping them to get it all running smoothly.  This site currently has
about 180 tunnels and is looking to increase to ~250 IIRC.  The traffic
volume is low,  and throughput performance is not a priority.

I know less about the GRE tunnels,  but the requirement is to support
1000 tunnels.

Cheers,
Davidm

-- 
David McCullough, davidm at cyberguard.com.au, Custom Embedded Solutions + Security
Ph:+61 734352815 Fx:+61 738913630 http://www.uCdot.org http://www.cyberguard.com