[Openswan dev] OpenSwan 2.6.10-1 on OpenWrt 7.09 consistently hangs on large HTTP file transfer

starlight at binnacle.cx
Wed Dec 5 19:27:11 EST 2007


I was looking at the log file again and came up with a theory. 
It's possibly a bit random, but it fits.

Connectivity does not go to pieces exactly at the 4GB boundary. 
However, I noticed that the session reset seems to be the actual 
trigger: it was at 26000+ seconds when it went. Possibly the 
Cisco forced the rekey, or the number in the log is not 
perfectly synchronized with the local 28800-second rekey interval.

Anyway, the theory is that rekeying a session that has seen more 
than 4GB of data transfer is the trigger. It's a bit out there, 
but it fits what seems to be happening. I'm assuming that 
rekeying a link under heavy load with less than 4GB transferred 
is extensively tested, since it would seem to be a common event. 
I'll bet that rekeying a session under heavy load with more than 
4GB is not happening very often in the population of installed 
systems. Perhaps the issue is specific to MIPS with emulated 
floating point.
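To make the 4GB figure concrete: 4GB is 2^32 bytes, exactly the range 
of an unsigned 32-bit counter. The sketch below is purely illustrative 
and assumes, without having checked the Openswan or KLIPS source, that 
some per-SA byte count is held in 32 bits. It shows how such a counter 
silently wraps back to a small value once a session passes 4GB, which 
is the kind of thing that could trip up code that runs at rekey time. 
The helper name add_bytes_u32 is made up for the example.

```python
# Hypothetical illustration only: NOT taken from Openswan/KLIPS source.
# Assumes a per-SA byte counter stored as an unsigned 32-bit integer.

COUNTER_MOD = 1 << 32  # 4294967296 bytes, the "4GB" boundary


def add_bytes_u32(counter, nbytes):
    """Add nbytes to counter, wrapping the way a C uint32_t would."""
    return (counter + nbytes) % COUNTER_MOD


true_total = 0  # what an unbounded counter would show
counter = 0     # what the assumed 32-bit counter shows

# Send 3 GiB, then 1 GiB more, then another 512 MiB.
for chunk in (3 * 2**30, 2**30, 512 * 2**20):
    true_total += chunk
    counter = add_bytes_u32(counter, chunk)
    print(f"true total = {true_total:>10} bytes, 32-bit counter = {counter:>10}")
```

After the second chunk the 32-bit value reads 0 even though exactly 
4GB has gone through, and after the third it reads 536870912 instead 
of 4831838208.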

I'll let it blow one more time. Then tomorrow I'll try setting 
the rekey interval to one hour and running it again. The one-hour 
interval should keep rekeying from happening with a high byte 
count.
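A rough back-of-the-envelope check of why the shorter interval should 
help, using only the figures above (4GB taken as 2^32 bytes, the 
current 28800-second interval, and the proposed 3600-second one), and 
assuming the byte count starts over with each new SA: the sketch 
computes the sustained throughput a session would need to cross 4GB 
within a single key lifetime. Below that rate, a rekey can never happen 
with more than 4GB on the SA. The function name is hypothetical.

```python
FOUR_GB = 2**32  # bytes


def min_rate_to_reach(limit_bytes, lifetime_s):
    """Sustained throughput (bytes/s) needed to move limit_bytes in one lifetime."""
    return limit_bytes / lifetime_s


# Current 8-hour (28800 s) interval vs. the proposed 1-hour (3600 s) one.
for lifetime in (28800, 3600):
    rate = min_rate_to_reach(FOUR_GB, lifetime)
    print(f"{lifetime:>5} s lifetime: {rate:,.0f} B/s = {rate * 8 / 1e6:.2f} Mbit/s")
```

At 28800 seconds, anything sustained above roughly 1.19 Mbit/s can pass 
4GB before the rekey; at one hour the threshold rises eightfold, to 
roughly 9.54 Mbit/s.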




>I think I've got reproducing this nailed down. Adding a second, 
>parallel session really does the trick.
>
>I can now predict roughly when it will blow, based on the time and 
>the byte counter value for the 10.81.82.5 session.
>
>I'm running it again tonight; it should fail in about five or six hours.
>
>Do you want me to enable any debug tracing when that counter 
>gets close to the 4GB boundary?
>
>
>
>
>>It blew again. This time it was halfway between completely
>>hosed and just not responding to connection requests
>>from the remote. I was able to stop/start OpenSwan and
>>recover full functionality without rebooting the router.
>>
>>Log file attached. It blew up at around 03:32-03:33.
>>Curiously, free memory increased at the time of the failure.
>>
>>Dec  5 03:33:44 router kernel: eth0.1: unable to resolve type 3800 addresses.
>>Dec  5 03:35:44 router last message repeated 2 times
>>Dec  5 03:37:44 router last message repeated 2 times
>>Dec  5 03:39:44 router last message repeated 2 times
>>Dec  5 03:41:44 router last message repeated 2 times
>>Dec  5 03:41:44 router kernel: eth0.1: unable to resolve type 3800 addresses.
>>Dec  5 03:42:34 router kernel: eth0.1: unable to resolve type 5400 addresses.
>>Dec  5 03:42:40 router kernel: eth0.1: unable to resolve type 5400 addresses.


