[Openswan Users] Growing storage consumption up to segfault

Siegfried Vogl svogl at vodata.de
Tue Mar 12 11:06:24 EDT 2019


Hello,

A customer of mine has updated their IPsec gateways from version 2.6.42 to version 2.6.51.1.
Since the update, segmentation faults occur after a few weeks of uptime at the following location:

kernel.c, route_and_eroute():
            else if (ero != NULL)
            {
                /* restore ero's former glory */
                if (esr->eroute_owner == SOS_NOBODY)    <<<--- address of segfault
                {
                    /* note: normal or eclipse case */
                    (void) shunt_eroute(ero, esr
                                        , esr->routing, ERO_REPLACE, "restore");
                }
At this point "esr" may be NULL.
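A minimal sketch of the defensive check this observation implies. It makes several assumptions: "struct esr_stub" and "should_restore_shunt()" are hypothetical names for illustration, not Openswan's actual structures or functions, and SOS_NOBODY is assumed to be 0 here. Skipping the restore when "esr" is NULL is also an assumption about the desired behaviour.

```c
#include <stddef.h>

/* Assumed for this sketch: 0 is the "no owner" serial number. */
#define SOS_NOBODY 0UL

/* Hypothetical, heavily simplified stand-in for the route record that
   "esr" points to; the real Openswan structure has many more fields. */
struct esr_stub {
    unsigned long eroute_owner;
};

/* Decide whether the "restore ero's former glory" branch may run.
   Returns 1 to restore the shunt eroute, 0 to skip it. */
static int should_restore_shunt(const struct esr_stub *esr)
{
    /* Check for the NULL pointer suspected at the crash site
       before reading esr->eroute_owner. */
    if (esr == NULL)
        return 0;
    return esr->eroute_owner == SOS_NOBODY;
}
```

Such a guard would only avoid the dereference. It does not explain why "esr" ends up NULL, nor does it address the growing memory consumption.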

Unfortunately, I do not have a backtrace, because the core file is truncated.

The segmentation fault, however, is only the consequence of memory consumption that grows slowly but steadily over the runtime.

The tunnels run over a DSL and OTA infrastructure that is sometimes very unstable.
Further investigation suggests that Pluto always runs into this problem when an existing tunnel is interrupted, e.g. by a DSL disruption.
Since no tests can be performed on the production systems, I recreated the situation in a test environment and got the same result.

Method:
- The client is an embedded system built with Buildroot, running Linux Openswan U2.6.42 / K4.4.6 (netkey).
- The client sets up the tunnel as a road warrior. The customer also has clients that are not configured as road warriors but have a fixed IP address.
- Once the tunnel is up, a DSL disruption is simulated.
- The client detects the fault and tears down the tunnel on its side. It then immediately tries to establish a new tunnel. On average, a new tunnel is established every 3 seconds during the test, and the DSL disruption is triggered as soon as "ipsec SA is established" is seen.

For the gateway side, I have attached the output of "ipsec barf", which contains all the relevant information.
There seems to be an inconsistency between Pluto's SAs and the kernel's SAs.
I have also attached the output of "top" for Pluto, in which the slowly growing memory usage can be tracked. If the test runs long enough, the segmentation fault mentioned above occurs.

I also tried running Openswan with "Leak Detective". It reports an issue right at the start, but I cannot make sense of it. I have attached that output as well; it comes from a different test run.

As a workaround, the affected gateways are currently restarted when high memory usage is detected. In the long run, however, that is not a solution, since every gateway restart drops hundreds of tunnels.
Under version 2.6.42, the problem never occurred in years of operation.
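The memory check behind this workaround can be sketched as follows, assuming a Linux /proc filesystem where /proc/<pid>/status contains a "VmRSS:" line in kB. The helper names and the threshold are hypothetical. The restart itself is left to whatever tooling the gateways already use.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the "VmRSS:" line of a /proc/<pid>/status text and return the
   resident set size in kB, or -1 if the line is missing or malformed. */
static long parse_vmrss_kb(const char *status_text)
{
    const char *p = strstr(status_text, "VmRSS:");
    if (p == NULL)
        return -1;
    p += strlen("VmRSS:");
    char *end;
    long kb = strtol(p, &end, 10); /* strtol skips the leading tabs/spaces */
    if (end == p)
        return -1;
    return kb;
}

/* Read /proc/<pid>/status (e.g. Pluto's pid) into buf; returns 0 on success. */
static int read_status(long pid, char *buf, size_t len)
{
    char path[64];
    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/* Hypothetical policy: flag the process once its RSS exceeds limit_kb. */
static int over_threshold(long rss_kb, long limit_kb)
{
    return rss_kb >= 0 && rss_kb > limit_kb;
}
```

Polling this periodically only reproduces the existing workaround. Logging the VmRSS values over time would, however, give the same growth curve as the attached "top" output in a form that is easier to correlate with tunnel reconnects.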

Siegfried Vogl


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: IpsecBarf.6.txt
URL: <http://lists.openswan.org/pipermail/users/attachments/20190312/a7f2974d/attachment-0003.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: top.pluto.txt
URL: <http://lists.openswan.org/pipermail/users/attachments/20190312/a7f2974d/attachment-0004.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LeakDetective.txt
URL: <http://lists.openswan.org/pipermail/users/attachments/20190312/a7f2974d/attachment-0005.txt>

