[Openswan Users] openswan MTU problems
Ernesto Alvarez
ernest-ips at datatransfer.com.ar
Wed Oct 19 12:41:27 CEST 2005
Hello.
We've been using openswan here for a project for the last few months and
everything was fine until we had an incident last week.
After a few days of investigation we have established a precise sequence
of events, but have been unable to find out what caused them. I'm
posting the sequence here hoping that someone has an idea or a line of
investigation that we have missed.
We have two servers, each running JBoss, pgpool (a PostgreSQL frontend
used to replicate a database across multiple servers) and a PostgreSQL
database. We encrypt all traffic between them using transport-mode ESP
with a pre-shared key. The servers are directly connected via Ethernet.
Everything was working until, at one point, the following things
happened almost simultaneously:
1. Openswan rekeyed an SA on one server.
2. We lost communication in one direction: no new messages arrived from
the server that had just rekeyed, although old flows were still getting
through.
3. After a while, the pgpool on the machine that was NOT rekeying
detected the other node as dead, while the one rekeying did not notice
anything (we think it was operating normally).
4. We began receiving messages like "Oct 12 19:19:26 bradbury kernel:
pmtu discovery on SA ESP/e5fb8ff6/c0a85015" for approximately 15 minutes.
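For anyone reading the log line above, here is a minimal Python sketch
that decodes its two hex fields. It assumes the kernel prints them as the
SPI followed by the destination IPv4 address, both in hex; that is a
reading of the message format, not something confirmed in this thread.
The helper name decode_pmtu_line is made up for illustration.

```python
import ipaddress
import re

# Assumed layout of the kernel message: "ESP/<SPI hex>/<dst IPv4 hex>".
SA_PATTERN = re.compile(
    r"pmtu discovery on SA ESP/([0-9a-fA-F]{8})/([0-9a-fA-F]{8})"
)

def decode_pmtu_line(line):
    """Return (spi, destination address) from a log line, or None."""
    match = SA_PATTERN.search(line)
    if match is None:
        return None
    spi = int(match.group(1), 16)
    # Interpret the second field as a 32-bit IPv4 address (assumption).
    dst = ipaddress.IPv4Address(int(match.group(2), 16))
    return spi, str(dst)

line = ("Oct 12 19:19:26 bradbury kernel: "
        "pmtu discovery on SA ESP/e5fb8ff6/c0a85015")
spi, dst = decode_pmtu_line(line)
print(f"SPI 0x{spi:08x}, destination {dst}")
```

Under that assumption the sample line decodes to SPI 0xe5fb8ff6 and
destination 192.168.80.21, which can be compared against the SAs listed
on each node.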
Seeing the log messages, we suspected MTU problems. We sent test packets
hoping to find an ESP packet longer than the MTU, but after covering
every possible length from 1000 to 1500 we found nothing unusual (the
sender started emitting two ESP packets when necessary). We tested with
both ping and a custom UDP sending program, so that we could check both
raw sockets and UDP.
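A length sweep of this kind can be sketched in Python as follows. This is
not the program used in the tests described above, only a minimal
illustration; the function name and the default range are assumptions.
Note that the sizes here are UDP payload lengths, so each IPv4 packet is
28 bytes longer (20 bytes of IP header without options plus 8 bytes of
UDP header), before any ESP overhead.

```python
import socket

def sweep_udp_sizes(host, port, start=1000, stop=1500):
    """Send one UDP datagram for every payload length in [start, stop].

    Returns a list of (size, errno) pairs for sends the kernel refused.
    """
    failed = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for size in range(start, stop + 1):
            payload = bytes(size)  # zero-filled payload of this length
            try:
                sock.sendto(payload, (host, port))
            except OSError as exc:
                # Record the failure and keep sweeping the other sizes.
                failed.append((size, exc.errno))
    return failed
```

Calling sweep_udp_sizes with the peer's address and an arbitrary port
while capturing traffic on the wire (for example with tcpdump) shows
whether each size leaves as one ESP packet or two. An empty return value
only means that no send call failed locally; it says nothing about
delivery.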
We're fairly sure it was a combination of improbable events, which might
even involve iptables and the kernel, but after many tests we're running
out of hypotheses. FYI, we suspected a misuse of iptables but found no
evidence of anyone executing anything related to iptables on those
nodes.
We're using Debian Sarge Linux, kernel 2.6.8 (with Linux's own IPsec
kernel modules) and openswan 2.2.0. If anybody has an idea regarding the
causes of these events, or even something to check, we'd like to hear
about it. If you need logs to check any idea you have, just ask and
we'll post them.
Thanks in advance.
Ernesto Alvarez.
Network administrator.