[Openswan Users] Very very strange problem, cured by (arrrrrgh) a reboot.

Mon Jun 28 12:11:12 EDT 2010

Don't everyone jump in at once.  :)  So what would create a condition
where Openswan routes some connections across an IPSEC tunnel but not
others?  The general sequence of events:

Steady state operations.

Telecom outage on the left side; everything drops.

Telecom fixed; left side is back online (sort of).

Finally reboot the left side Openswan/firewall system, now everyone up
and running.

What can cause this behavior after a telecom outage?

- Greg Scott

From: users-bounces at openswan.org [mailto:users-bounces at openswan.org] On
Behalf Of Greg Scott
Sent: Friday, June 25, 2010 2:46 PM
To: users at openswan.org
Subject: [Openswan Users] Very very strange problem, cured by (arrrrrgh)
a reboot.

I don't even know how to describe this in a subject line.  This is a
single tunnel connecting two sites.  Both are running U2.6.25 on Fedora
12.  A telecom issue took out the left side yesterday. The telecom
issues were fixed a few hours ago and folks on the left side could ping
the right side.  The tunnel was up and running again.  Well, sort of.
Exactly one user on the left side was able to launch an RDP session to
the RDP server on the right side.  Nobody else could make this happen.
However, everyone on the left side could ping any host they wanted on
the right side. They could do RDP sessions to **other** hosts on the
right side, just not this particular host - the one everyone cared
about.  

The left side is 10.86.2.nnn/24, right side 10.86.0.nnn/24.  The
relevant terminal server on the right side is 10.86.0.20.  The right
side also has 2 other servers at .9 and .15.  

Watching tcpdump on both sides, I saw TCP port 3389 traffic (RDP packets)
coming out of the left side, but never reaching the right side.
Curiously, the left side sent out ARP queries looking for
10.86.0.20, which is weird because that's a completely different subnet.
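
For reference, a minimal Python sketch of why those ARP queries are
unexpected.  It assumes only the /24 networks given above; the helper
name is_on_link is made up for illustration and is not part of Openswan
or any other tool.

```python
import ipaddress

# Networks as described in this message (both /24).
LEFT_NET = ipaddress.ip_network("10.86.2.0/24")
RIGHT_NET = ipaddress.ip_network("10.86.0.0/24")


def is_on_link(local_net, dest):
    """Return True if dest lies inside local_net.

    A host normally sends ARP queries only for on-link destinations;
    anything else is handed to the next hop from its routing table.
    (is_on_link is a hypothetical helper for this example.)
    """
    return ipaddress.ip_address(dest) in local_net


# The three right-side servers mentioned above, seen from the left side.
for dest in ("10.86.0.9", "10.86.0.15", "10.86.0.20"):
    print(dest, "on-link from left side:", is_on_link(LEFT_NET, dest))
```

With these masks all three right-side servers are off-link from the left
side, so an ARP query for 10.86.0.20 suggests that something on the left
side was treating that one address as directly reachable.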

But here is the strange part.  The Windows XP host at 10.86.2.104 could
successfully do RDP sessions to 10.86.0.9 and 10.86.0.15, but not to
10.86.0.20. 

In desperation, I rebooted the left side firewall, restarting
everything, and now it all works.  Everyone is up and running.  It's not
possible to make up this kind of stuff.  How in the world do I
troubleshoot something bizarre like this??

Here's a record from /var/log/secure.  This was during the outage:

Jun 25 11:12:09 localhost pluto[2288]: "garelick-hq" #667:
STATE_MAIN_R3: sent MR3, ISAKMP SA established {auth=OAKLEY_RSA_SIG
cipher=aes_128 prf=oakley_sha group=modp2048}
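
When combing /var/log/secure for entries like this one, it can help to
pull out the connection name, SA number, and state.  The following is a
rough Python sketch, assuming the record above is a single log line that
the mail client wrapped; the regular expression is fitted to this one
sample only and the function name parse_pluto_line is hypothetical.

```python
import re

# Pattern fitted to the single sample record quoted above; other pluto
# messages may not match (this is an assumption, not a documented format).
PLUTO_RE = re.compile(
    r'pluto\[(?P<pid>\d+)\]: "(?P<conn>[^"]+)" #(?P<sa>\d+): '
    r'(?P<state>STATE_\w+): (?P<msg>.*?) \{(?P<params>[^}]*)\}'
)


def parse_pluto_line(line):
    """Return a dict of fields from one pluto log line, or None if no match."""
    m = PLUTO_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["sa"] = int(fields["sa"])
    # Turn "auth=... cipher=..." into a dictionary of key/value pairs.
    fields["params"] = dict(p.split("=", 1) for p in fields["params"].split())
    return fields


# The record from this message, rejoined into one line.
SAMPLE = (
    'Jun 25 11:12:09 localhost pluto[2288]: "garelick-hq" #667: '
    'STATE_MAIN_R3: sent MR3, ISAKMP SA established '
    '{auth=OAKLEY_RSA_SIG cipher=aes_128 prf=oakley_sha group=modp2048}'
)

parsed = parse_pluto_line(SAMPLE)
print(parsed["conn"], parsed["sa"], parsed["state"], parsed["params"])
```

Note that this record only shows that an ISAKMP SA was established; it
says nothing by itself about which traffic the kernel actually routed
into the tunnel.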

Any thoughts?

Thanks

- Greg Scott
