[Openswan Users] Nasty MTU problem - Please Help

Tue Jan 13 23:42:29 EST 2009

I have a simple 2-net Openswan setup, both running 2.6.19 on recent Mandriva 
distros and kernels (2009/2008 on 2.6.27/2.6.22).  I've been a long-time user 
of Openswan with few problems, but recently made ISP changes that forced some 
changes.

NetA --------> G1 --> internet <-- G2 <----- NetB

NetA Host=10.20.0.10 (one of several)
G1 (eth1 internal 10.20.0.1, ppp0 external (using eth0))
G2 (eth1 internal 10.20.1.1, eth0 external via dhcp)
NetB Host=10.20.1.10 (one of several)

Both G1 and G2 have DSL services, but G1 is using PPPoE (Roaring Penguin 
3.8.5) through a Motorola 3347 DSL modem in bridging mode.  G2 has a standard 
DSL modem with standard network gateway settings (no PPP).  Both networks 
work fine by themselves. 

The pppoe command options on G1 are:
pppoe -m 1412 -I eth0

Sending files from a NetA host to a NetB host works fine with scp.  Sending a 
file via scp from the same NetB host back to NetA always hangs (as does any 
long ssh output).  If I capture the transfer with wireshark on G2's internal 
eth1, I see the following packets:
10.20.1.10 -> 10.20.0.10 SSHv2 packet len=1448
10.20.1.1  -> 10.20.1.10 ICMP Destination unreachable (Fragmentation needed)
10.20.1.10 -> 10.20.1.10 SSHv2 packet len=1448
10.20.1.1  -> 10.20.1.10 ICMP Destination unreachable (Fragmentation needed)
10.20.1.10 -> 10.20.0.10 SSHv2 [TCP Out-of-order] len=1386
10.20.1.10 -> 10.20.0.10 SSHv2 [TCP Out-of-order] len=62

10.20.1.10 -> 10.20.0.10 SSHv2 [TCP Out-of-order] len=1386
10.20.1.10 -> 10.20.0.10 SSHv2 [TCP Out-of-order] len=62
10.20.0.10 -> 10.20.0.10 TCP   [TCP Dup ACK]

This is followed by similar repeating patterns of 1386-length packets that 
never make it.  (I have saved the whole capture and can include it if it's 
needed.)  The ssh packets are flagged with don't fragment.  It's clear ssh is 
seeing the attempted mtu discovery, but the new size doesn't seem to be low 
enough.
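
As a rough check on the lengths in the capture, the TCP payload sizes can be converted to on-the-wire IPv4 packet sizes. This is only a sketch. It assumes IPv4 without options and a 12-byte TCP timestamp option on every segment, which is a guess based on 1448 being the usual full-size payload on a 1500-byte link. The saved capture would show the real header lengths.

```python
# Header sizes in bytes. The timestamp option is an ASSUMPTION, not taken
# from the capture; check the TCP header length in wireshark to confirm it.
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def ip_packet_size(tcp_payload, tcp_options=TCP_TIMESTAMP_OPTION):
    """Return the total IPv4 packet size for a given TCP payload length."""
    return tcp_payload + TCP_HEADER + tcp_options + IPV4_HEADER


# Payload lengths reported by wireshark in the capture above.
for payload in (1448, 1386, 62):
    print(payload, "->", ip_packet_size(payload))
```

Under those assumptions the sender dropped from 1500-byte packets to 1438-byte packets after the ICMP errors, and the 1438-byte packets are the ones that keep being retransmitted.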

If I ping from 10.20.1.10 to 10.20.0.10, a "ping -s 1394" will get through, 
but a "ping -s 1395" won't.  Reversing the pings (from 10.20.0.10 to 
10.20.1.10), anything over 1394 will produce 
From murdock.foddy.home (10.20.0.1) icmp_seq=1 Frag needed and DF set (mtu = 1422)
before the pings start getting through.  Interestingly, if I ssh to G2 then to 
NetB host outside the VPN (normal internet SSH), the transfers never hang.  
So it seems to only be the vpn connections having the problem.
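
The ping threshold and the MTU in the ICMP error can be cross-checked with a little arithmetic. A minimal sketch, assuming IPv4 without options (20-byte header) and the standard 8-byte ICMP echo header; the function names are just for illustration:

```python
# Header sizes in bytes for a plain IPv4 ICMP echo request.
IPV4_HEADER = 20
ICMP_ECHO_HEADER = 8


def ping_packet_size(payload):
    """Total IPv4 packet size produced by 'ping -s <payload>'."""
    return payload + ICMP_ECHO_HEADER + IPV4_HEADER


def largest_ping_payload(mtu):
    """Largest 'ping -s' value that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_ECHO_HEADER


print(ping_packet_size(1394))      # largest ping that got through
print(ping_packet_size(1395))      # smallest ping that failed
print(largest_ping_payload(1422))  # MTU reported by 10.20.0.1
```

So "ping -s 1394" is exactly a 1422-byte packet, which agrees with the mtu = 1422 reported by G1.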

My ipsec.conf file is below (comments and non-critical sections removed):

========================================================================
version 2.0 

# basic configuration
config setup
        nat_traversal=no
        overridemtu=1440
        protostack=netkey

conn foddy
        left=216.160.0.218  
        leftsubnet=10.20.0.0/24
        leftid=@bfoddy.homeip.net
        leftrsasigkey=0sAQNd...
        leftnexthop=%defaultroute
        leftsourceip=10.20.0.1

        right=199.120.114.184
        rightsubnet=10.20.1.0/24
        rightid=@hfoddy.homeip.net
        rightrsasigkey=0sAQN...
        rightnexthop=%defaultroute
        rightsourceip=10.20.1.1
        auto=start     
==========================================================================

I thought I had found the solution with overridemtu, but a message on startup 
says it's ignored with netkey configs.

Other tidbits: G1 has 2 Intel ePro100 cards.  I found references saying the 
eepro100 module was buggy, so I forced it to use the e100 driver, but that 
didn't help.  G1 has 3 NICs: eth0 (used by ppp0), eth1 (internal), and eth2, 
which is connected to a VLAN partition on the 3347 router for 2 separate 
wireless nets.
Both G1 and G2 are running Shorewall:
G1 = 4.0.13
G2 = 3.4.4

On G1, I have set CLAMPMSS=Yes in /etc/shorewall/shorewall.conf, as suggested 
by the Shorewall documentation.
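
For reference, MSS clamping rewrites the MSS option in TCP SYN packets, so the useful clamp value depends on the smallest MTU along the path. A sketch of that relation, assuming IPv4 and TCP headers without options (20 bytes each); 1422 is the MTU from the ping test and 1412 is the pppoe -m value, while the function names are made up for illustration:

```python
# Base header sizes in bytes; the MSS excludes both of them.
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu):
    """Largest TCP MSS whose packets fit in the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def mtu_for_mss(mss):
    """Largest IPv4 packet a connection with this MSS can produce."""
    return mss + IPV4_HEADER + TCP_HEADER


print(mss_for_mtu(1422))  # MSS that would fit the reported 1422 MTU
print(mtu_for_mss(1412))  # packet size allowed by 'pppoe -m 1412'
```

If 1422 really is the limit for traffic inside the VPN, an MSS of 1412 still allows packets of up to 1452 bytes, which is larger than the tunnel can carry.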

How can I get this working, short of forcing all G2 traffic to a smaller MTU 
that would affect a lot more traffic than just the VPN?

Thanks,
Brian