[Openswan Users] openswan total outage after issues with one peer

Sun Sep 21 18:02:03 EDT 2008

 Hi,

 I'd like to share a bad experience; maybe someone can suggest how to
prevent it from happening again.

 I have Openswan 2.4.6 running on Linux 2.6 with many peers. Some days
ago one remote peer started re-requesting the ISAKMP SA unreasonably
often, long before the previous one expired, and from a certain point
it did so every 2 seconds.

 Soon after this, the following was logged in relation to this issue:

 ....   

 anytime anywhere pluto[21105]: | peer and cookies match on #258454, provided msgid 00000000 vs 99006a99
 anytime anywhere pluto[21105]: | peer and cookies match on #258452, provided msgid 00000000 vs 435fa025
 anytime anywhere pluto[21105]: | peer and cookies match on #258451, provided msgid 00000000 vs 9f957011
 anytime anywhere pluto[21105]: | peer and cookies match on #258450, provided msgid 00000000 vs 6c2ca129
 anytime anywhere pluto[21105]: | peer and cookies match on #258449, provided msgid 00000000 vs 561a68f8
 anytime anywhere pluto[21105]: | peer and cookies match on #258448, provided msgid 00000000 vs 387073ec
 anytime anywhere pluto[21105]: | peer and cookies match on #258447, provided msgid 00000000 vs 735e9b15
 anytime anywhere pluto[21105]: | peer and cookies match on #258446, provided msgid 00000000 vs b90854d6

 netlink_get: XFRM_MSG_EXPIRE message   

 and the handling of EVENT_RETRANSMITs. Within seconds the following
happened with every connection, not just the faulty one, while it was
being processed:

 anytime anywhere pluto[21105]: "A_PEER" #258359: ERROR: netlink response for Add SA esp.dab9f5ae at 71.92.33.54 included errno 3: No such process
 anytime anywhere pluto[21105]: | complete state transition with STF_INTERNAL_ERROR
 anytime anywhere pluto[21105]: | state transition function for STATE_QUICK_I1 had internal error

 And after the SA timed out, simply as a consequence:

 anytime anywhere pluto[21105]: "A_PEER" #258359: ISAKMP SA expired (LATEST!)

 So a burst of requests from one faulty peer caused serious problems
with all connections. This worries me, because I have never seen
anything like it before; it looks like a scalability problem in
Openswan. DoS protection has been implemented in Openswan since
version 2.3.0, so I'm a little confused about how to make sure I don't
see this again. Could I perhaps limit the incoming IKE requests per
peer with iptables? I do not think it is a system issue; the load was
low while it happened.
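
 To make the iptables question concrete, here is a minimal sketch of
the kind of per-peer limit I have in mind. It only builds and prints
the iptables commands; it does not run them. Everything in it is an
assumption, not something I have tested against this problem: the
function name ike_rate_limit_rules is made up, the rate (10/minute)
and burst (5) are placeholder values that would need tuning, and the
hashlimit option names are those of recent iptables releases and may
differ on older ones. Only UDP port 500 is limited, because UDP 4500
would also carry NAT-T encapsulated ESP data.

```python
from typing import List, Sequence


def ike_rate_limit_rules(rate: str = "10/minute", burst: int = 5,
                         ports: Sequence[int] = (500,)) -> List[List[str]]:
    """Build iptables argument lists that rate-limit IKE per source IP.

    The rate, burst and port values are illustrative assumptions.
    """
    rules: List[List[str]] = []
    for port in ports:
        match = ["-p", "udp", "--dport", str(port)]
        # Accept packets while this source IP is still under its limit.
        rules.append(
            ["iptables", "-A", "INPUT", *match,
             "-m", "hashlimit",
             "--hashlimit-name", f"ike{port}",
             "--hashlimit-mode", "srcip",
             "--hashlimit-upto", rate,
             "--hashlimit-burst", str(burst),
             "-j", "ACCEPT"])
        # Everything over the limit from that source is dropped.
        rules.append(["iptables", "-A", "INPUT", *match, "-j", "DROP"])
    return rules


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for rule in ike_rate_limit_rules():
        print(" ".join(rule))
```

 The ACCEPT rule has to come before the DROP rule, and both would have
to sit ahead of any existing rule that already accepts UDP port 500.
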
 Thanks in advance for any comment  

 Best Regards   

 Peter    