[Openswan Users] UDP fragmentation in Linux
Paul Wouters
paul at xelerance.com
Fri Mar 4 19:20:13 CET 2005
On Fri, 4 Mar 2005, Marcus Leech wrote:
Marcus,
Perhaps you should forward this to the netfilter and linux-net (?) mailing lists?
Thank you for this extensive testing. I will try to reproduce this,
and check whether 2.4.17 is affected.
Paul
> From: Marcus Leech <mleech at nortel.com>
> To: users at openswan.org, mcr at xelerance.org
> Subject: [Openswan Users] UDP fragmentation in Linux
>
> After my fiasco of last night (trying to use 2048-bit certs and having them
> utterly fail to make it across
> the network), I've started looking into Linux UDP fragmentation grossness.
>
> It seems that even if you set the appropriate IP options (IP_MTU_DISCOVER to
> IP_PMTUDISC_DONT), UDP
> packets are getting badly munged if they exceed the local MTU. It looks
> like they're simply getting *truncated*,
> which is so NOT according to spec that it makes me ill. It's not like the
> Linux stack can't deal with sending
> fragments, either, since pings with sizes > local MTU get fragmented, sent
> across the internet, and apparently
> correctly reassembled at the other end.
>
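For anyone trying to reproduce this, here is a minimal sketch of the kind of test sender described above. It is not Marcus's test code. It assumes Linux, where the kernel headers name the option value IP_PMTUDISC_DONT, and it falls back to the numeric values from <linux/in.h> when the Python socket module does not export the names. The helper names (open_udp_socket_without_pmtud, make_payload, send_probe) are made up for illustration.

```python
import socket
import sys

# Values from <linux/in.h>. Not every Python build exports these names,
# so fall back to the Linux numeric values (an assumption off Linux).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)


def open_udp_socket_without_pmtud():
    """Create a UDP socket with path MTU discovery turned off (Linux only).

    With PMTU discovery off, the kernel should leave DF clear and fragment
    datagrams that exceed the outgoing interface MTU.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if sys.platform.startswith("linux"):
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DONT)
    return sock


def make_payload(size):
    """Build a recognisable byte pattern so truncation is easy to spot."""
    return bytes(i % 251 for i in range(size))


def send_probe(sock, dest, size=3000):
    """Send one UDP datagram of `size` payload bytes to dest = (host, port)."""
    return sock.sendto(make_payload(size), dest)
```

To use it, call send_probe(open_udp_socket_without_pmtud(), (host, port)) with a host beyond the local subnet and watch the outgoing interface with tcpdump. A 3000-byte payload over a 1500-byte MTU should show up as three fragments if the stack behaves.
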
> But with UDP packets (NOT JUST PLUTO--I wrote some test code), the stack
> simply emits a single packet with
> the "more fragments" flag bit set in the IP header, the UDP length field set
> to the UDP length, and the IP length set to
> the MTU. But the trailing fragment(s) never get emitted--just the first
> one. This would cause a fragment reassembly
> timeout at the receiver. This is so broken, I don't even know where to
> begin (splutter, grumble). The behaviour goes back to at least
> 2.4.18, and is consistent in 2.6.11. I'm surely not the first person to
> observe this behaviour and start ranting.
>
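For comparison with a tcpdump capture, the fragments a conforming stack should emit can be worked out by hand. The sketch below does that arithmetic, assuming a 20-byte IP header with no options and the 8-byte UDP header. The function name fragment_layout and the dictionary keys are made up for illustration.

```python
UDP_HEADER_LEN = 8


def fragment_layout(udp_payload_len, mtu=1500, ip_header_len=20):
    """Return the IPv4 fragments expected for one UDP datagram.

    Each entry gives the fragment offset (in 8-byte units), the IP total
    length, and whether the "more fragments" flag should be set.
    """
    # The UDP header travels in the first fragment as part of the IP payload.
    total = UDP_HEADER_LEN + udp_payload_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < total:
        data_len = min(max_data, total - offset)
        fragments.append({
            "offset_units": offset // 8,
            "ip_total_len": ip_header_len + data_len,
            "more_fragments": offset + data_len < total,
        })
        offset += data_len
    return fragments
```

With a 3000-byte payload and a 1500-byte MTU this gives three fragments: two full-size ones with "more fragments" set, then a short final one with the flag clear. Seeing only the first of these on the wire matches the behaviour described above.
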
> Another observation. When I was testing this stuff purely-locally (on the
> same IP subnet), I could use long
> certificates, and nothing bad happened. I can only assume that the Linux
> stack detects the "local subnettedness"
> and uses jumbograms--I don't have the patience/energy to go back and set it
> up again to run a tcpdump.
>
> I suspect that the iptables code is screwing up in some way, since the
> kernel ip_output routines call
> NF_HOOK, rather than passing directly to the routing-chosen hardware device.
> Somewhere in all
> that netfilter goop, I think that the output packet fragmentation code has
> become broken--at least for UDP.
> Like I observed, ICMP ECHO packets get correctly fragmented when they exceed
> the local MTU.
>
> I can't believe people put up with this. It's so horribly, outrageously
> broken. Now, I know that there are
> those that argue that IP fragmentation itself is *conceptually* broken, but
> the fact is that it's standard,
> and it largely works. The exceptions are firewalls, which don't like to
> deal with reassembly, so they
> drop fragments on the floor as punishment. But I think that the community
> has slowly become confused
> about IP fragments--letting the poor behaviour of firewalls and similar IP
> machinery dictate a new, and
> profoundly-bad de-facto standard.
>
> I know that in IPv6, routers never fragment packets in transit (only the
> sending host may). But the minimum MTU is also larger (1280 bytes).
>
> In the absence of app-layer fragmentation in IKE, how am I supposed to
> support larger (2048-bit)
> certificates?
>
--
"At best it is a theory, at worst a fantasy" -- Michael Crichton