[Openswan Users] UDP fragmentation in Linux

Fri Mar 4 19:20:13 CET 2005

On Fri, 4 Mar 2005, Marcus Leech wrote:

Marcus,

Perhaps you should forward this to the netfilter and linux-net (?) mailing lists?

Thank you for this extensive testing. I will try to reproduce this,
and check whether 2.4.17 is also affected.

Paul

> From: Marcus Leech <mleech at nortel.com>
> To: users at openswan.org, mcr at xelerance.org
> Subject: [Openswan Users] UDP fragmentation in Linux
> 
> After my fiasco of last night (trying to use 2048-bit certs and having
> them utterly fail to make it across the network), I've started looking
> into Linux UDP fragmentation grossness.
>
> It seems that even if you set the appropriate IP options
> (IP_MTU_DISCOVER to IP_PMTUDISC_DONT), UDP packets are getting badly
> munged if they exceed the local MTU.  It looks like they're simply
> getting *truncated*, which is so NOT according to spec that it makes me
> ill.  It's not as if the Linux stack can't deal with sending fragments,
> either, since pings with sizes > local MTU get fragmented, sent across
> the Internet, and apparently correctly reassembled at the other end.
>
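For reference, a minimal sketch of the socket setup described above, assuming Linux with glibc (where IP_MTU_DISCOVER and IP_PMTUDISC_DONT are exposed through <netinet/in.h>). This is not Marcus's test code, which wasn't posted; the helper names make_udp_socket_no_pmtud() and get_pmtud_mode() are hypothetical. It only configures the option and reads it back; it does not reproduce the truncation.

```c
#define _GNU_SOURCE          /* make sure glibc exposes the Linux IP_* options */
#include <assert.h>
#include <netinet/in.h>      /* IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DONT */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv4 UDP socket with path-MTU discovery
   disabled, so the kernel does not set DF and may fragment datagrams larger
   than the MTU. Returns the descriptor, or -1 on error. */
int make_udp_socket_no_pmtud(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    int mode = IP_PMTUDISC_DONT;
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof mode) < 0) {
        perror("setsockopt(IP_MTU_DISCOVER)");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read the option back.
   Returns the current IP_PMTUDISC_* mode, or -1 on error. */
int get_pmtud_mode(int fd)
{
    assert(fd >= 0);   /* caller must pass a valid descriptor */
    int mode;
    socklen_t len = sizeof mode;
    if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
        return -1;
    return mode;
}
```

A reproduction along these lines would then sendto() a datagram larger than the outgoing interface MTU and watch the sending side with tcpdump.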
> But with UDP packets (NOT JUST PLUTO; I wrote some test code), the
> stack simply emits a single packet with the "more fragments" flag bit
> set in the IP header, the UDP length field set to the full datagram
> length, and the IP total length set to the MTU.  But the trailing
> fragment(s) never get emitted, just the first one.  This would cause a
> fragment reassembly timeout at the receiver.  This is so broken, I
> don't even know where to begin (splutter, grumble).  The behaviour goes
> back to at least 2.4.18, and is consistent in 2.6.11.  I'm surely not
> the first person to observe this behaviour and start ranting.
>
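To make the expected behaviour concrete: under RFC 791 rules, every fragment except the last carries a multiple of 8 bytes of IP payload, the fragment offset field counts 8-byte units, and MF is set on every fragment except the last. The sketch below computes that layout for one UDP datagram, assuming a 20-byte IPv4 header with no options. The function name and the 2500-byte payload used as an example are hypothetical, not taken from Pluto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20u  /* assumes no IP options */
#define UDP_HDR_LEN   8u

struct ip_fragment {
    unsigned offset_units;  /* fragment offset field, in 8-byte units */
    unsigned data_len;      /* bytes of IP payload carried in this fragment */
    bool     more_frags;    /* MF bit */
};

/* Compute the IPv4 fragments needed to carry a UDP datagram with
   udp_payload bytes over a link with the given MTU. Writes up to max_out
   entries to out and returns the total number of fragments needed. */
size_t udp_fragment_layout(unsigned udp_payload, unsigned mtu,
                           struct ip_fragment *out, size_t max_out)
{
    assert(mtu >= IPV4_HDR_LEN + 8u);
    /* The UDP header travels only in the first fragment. */
    unsigned total = UDP_HDR_LEN + udp_payload;
    /* Non-final fragments must carry a multiple of 8 bytes. */
    unsigned per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;
    size_t n = 0;
    for (unsigned off = 0; off < total; off += per_frag, n++) {
        unsigned left = total - off;
        if (n < max_out) {
            out[n].offset_units = off / 8u;
            out[n].data_len = left > per_frag ? per_frag : left;
            out[n].more_frags = left > per_frag;
        }
    }
    return n;
}
```

For a 2500-byte UDP payload over a 1500-byte MTU this yields two fragments: offset 0 with 1480 bytes and MF set (IP total length 1500, matching the single packet described above), then offset 185 with 1028 bytes and MF clear. That second fragment is the one that never appears on the wire.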
> Another observation.  When I was testing this stuff purely locally (on
> the same IP subnet), I could use long certificates, and nothing bad
> happened.  I can only assume that the Linux stack detects the "local
> subnettedness" and uses jumbograms.  I don't have the patience/energy
> to go back and set it up again to run a tcpdump.
>
> I suspect that the IPTABLES code is screwing up in some way, since the
> kernel ip_output routines call NF_HOOK rather than passing the packet
> directly to the routing-chosen hardware device.  Somewhere in all that
> netfilter goop, I think the output packet fragmentation code has become
> broken, at least for UDP.  As I observed, ICMP ECHO packets get
> correctly fragmented when they exceed the local MTU.
>
> I can't believe people put up with this.  It's so horribly, outrageously
> broken.  Now, I know that there are those who argue that IP
> fragmentation itself is *conceptually* broken, but the fact is that it's
> standard, and it largely works.  The exceptions are firewalls, which
> don't like to deal with reassembly, so they drop fragments on the floor
> as punishment.  But I think the community has slowly become confused
> about IP fragments, letting the poor behaviour of firewalls and similar
> IP machinery dictate a new, and profoundly bad, de facto standard.
>
> I know that in IPv6 routers don't fragment at all (only the sending host
> can).  But the minimum MTU is also larger (1280 bytes).
>
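The header arithmetic behind that remark, as a sketch assuming no IPv4 options and no IPv6 extension headers: the largest UDP payload that fits in one unfragmented packet is the path MTU minus the IP and UDP headers. The helper name is hypothetical.

```c
#include <assert.h>

#define IPV4_HDR_LEN   20u  /* assumes no IPv4 options */
#define IPV6_HDR_LEN   40u  /* fixed header; assumes no extension headers */
#define UDP_HDR_LEN     8u
#define IPV6_MIN_MTU 1280u  /* minimum link MTU required by IPv6 (RFC 8200) */

/* Hypothetical helper: largest UDP payload that fits in a single
   unfragmented packet on a path with the given MTU and IP header length. */
unsigned max_unfragmented_udp_payload(unsigned mtu, unsigned ip_hdr_len)
{
    assert(mtu > ip_hdr_len + UDP_HDR_LEN);
    return mtu - ip_hdr_len - UDP_HDR_LEN;
}
```

That gives 1472 bytes for IPv4 over a 1500-byte Ethernet MTU and 1232 bytes at the IPv6 minimum MTU of 1280.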
> In the absence of app-layer fragmentation in IKE, how am I supposed to
> support larger (2048-bit) certificates?
>
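Purely as an illustration of what app-layer fragmentation means here: the sender splits a large message into numbered chunks that each fit in one packet, and the receiver reassembles them in whatever order they arrive. The struct and function names below are hypothetical and this is NOT an IKE wire format; a real scheme would presumably also need per-chunk integrity protection and retransmission.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application-layer chunk; NOT an IKE wire format. */
struct app_chunk {
    unsigned index;             /* 1-based chunk number */
    unsigned total;             /* number of chunks in the whole message */
    const unsigned char *data;  /* points into the original message */
    size_t len;                 /* bytes in this chunk */
};

/* Split msg into chunks of at most max_chunk bytes. Fills at most max_out
   entries of out and returns the number of chunks the message needs. */
size_t app_split(const unsigned char *msg, size_t msg_len, size_t max_chunk,
                 struct app_chunk *out, size_t max_out)
{
    assert(max_chunk > 0);
    size_t total = (msg_len + max_chunk - 1) / max_chunk;
    for (size_t i = 0; i < total && i < max_out; i++) {
        size_t off = i * max_chunk;
        size_t left = msg_len - off;
        out[i].index = (unsigned)(i + 1);
        out[i].total = (unsigned)total;
        out[i].data = msg + off;
        out[i].len = left < max_chunk ? left : max_chunk;
    }
    return total;
}

/* Reassemble n chunks, received in any order, into buf. Assumes every chunk
   except the last is exactly max_chunk bytes. Returns the message length,
   or 0 if the set is incomplete, has duplicates, or does not fit in buf. */
size_t app_reassemble(const struct app_chunk *chunks, size_t n,
                      size_t max_chunk, unsigned char *buf, size_t buf_len)
{
    size_t msg_len = 0;
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        /* n distinct indices in 1..n means every chunk is present. */
        if (chunks[i].total != n || chunks[i].index < 1 || chunks[i].index > n)
            return 0;
        for (size_t j = i + 1; j < n; j++)
            if (chunks[i].index == chunks[j].index)
                return 0;
    }
    for (size_t i = 0; i < n; i++) {
        size_t off = (size_t)(chunks[i].index - 1) * max_chunk;
        if (off + chunks[i].len > buf_len)
            return 0;
        memcpy(buf + off, chunks[i].data, chunks[i].len);
        if (off + chunks[i].len > msg_len)
            msg_len = off + chunks[i].len;
    }
    return msg_len;
}
```

For example, a 37-byte message split with max_chunk = 10 yields four chunks of 10, 10, 10 and 7 bytes, and reassembly fails cleanly (returns 0) if any one of them is missing.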
>
> _______________________________________________
> Users mailing list
> Users at openswan.org
> http://lists.openswan.org/mailman/listinfo/users
>

-- 

"At best it is a theory, at worst a fantasy" -- Michael Crichton