[Openswan dev] [PATCH] Fix race condition between pluto start and whack
D. Hugh Redelmeier
hugh at mimosa.com
Fri Apr 8 10:38:08 EDT 2011
| From: Mattias Walstrom <lazzer at vmlinux.org>
| Yes, it is marked as 'Z' in the process list. But it gets stuck like
| that, this does not happen all the time, but sometimes.
| > Are you running pluto with --nofork?
What is the parent process of Pluto? Why isn't it doing a "wait" so
that Pluto is reaped?
Guess: if Pluto terminates, the socket will be closed, and whack will
cease to wait.
But #1: if Pluto initialization takes a long time (in my day it
didn't, but perhaps certificate loading or something else new takes
time), whack will have to wait.
Pluto is designed as an event-driven system where events should be
processed quickly; initialization should be processed quickly too.
What has made it slow?
But #2: if the terminated Pluto isn't reaped, perhaps the socket isn't
closed. Make sure that whatever runs Pluto also reaps it promptly.
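
The reaping step can be sketched as follows. This is only a minimal Python illustration, not Openswan's startup code: `run_and_reap` is a hypothetical helper, and the child command stands in for however Pluto is actually launched. The point is that the parent calls `waitpid`, so a child that has exited does not linger as a zombie ('Z').

```python
import os
import sys


def run_and_reap(argv):
    """Start argv as a child process, wait for it, and return its exit code.

    Hypothetical helper for illustration; assumes a POSIX system with fork().
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; leave without running parent cleanup.
            os._exit(127)
    # Parent: waitpid() collects the exit status, which is what removes the
    # terminated child from the process table.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal
```

For example, `run_and_reap([sys.executable, "-c", "raise SystemExit(3)"])` returns the child's exit code once the child has been reaped. A long-running supervisor would more likely reap from a SIGCHLD handler or a periodic non-blocking `waitpid`, but the requirement is the same.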
| > | When sending any whack message during the time pluto starts, whack will
| > | just hang for a long time. And if pluto should die during this hang, it
| > | will become a Zombie.
| > I don't understand this. Why would a pending message cause Pluto to
| > become a Zombie? This should not affect the ability of a parent
| > process (init or otherwise) to reap it.
| > | This patch makes sure that we do not open the ctl
| > | socket until we actually can receive messages (all pluto initialization
| > | is done).
| > The code has a comment explaining why the socket is created where it
| > is. I no longer am certain, but I think the idea was that once a
| > startup script had executed "pluto", it was safe to assume that the
| > socket was created. After your change, this is no longer true.
| Yes, I noticed that comment, but I could not see the real reason for
| this, the socket will be useless until pluto setup is completed. It is
| not possible to communicate with pluto using whack until this is done.
It is possible to *start* communicating with Pluto. The communication
won't complete until Pluto setup is complete. But, logically, this is
quite reasonable. This property was required for the correctness of the
scripts we used in my day.
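
A small sketch of that property, assuming a stream-type UNIX-domain socket that is bound and listening before initialization begins. Everything here is hypothetical and for illustration only: `early_connect_demo`, the socket name `ctl.sock`, the `status` request, and the sleep that stands in for slow setup. None of it is Pluto's or whack's real code.

```python
import os
import socket
import tempfile
import threading
import time


def early_connect_demo(init_seconds=0.2):
    """Show that a client can connect before the daemon is ready to answer."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "ctl.sock")  # hypothetical control socket
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(5)  # the socket exists and queues connections from here on

    def daemon():
        time.sleep(init_seconds)  # stand-in for slow initialization
        conn, _ = server.accept()  # only now is the queued client served
        with conn:
            request = conn.recv(1024)
            conn.sendall(b"reply to " + request)

    start = time.monotonic()
    worker = threading.Thread(target=daemon)
    worker.start()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)  # succeeds before the daemon calls accept()
        connected_after = time.monotonic() - start
        client.sendall(b"status")
        reply = client.recv(1024)  # blocks until initialization is over
    replied_after = time.monotonic() - start
    worker.join()
    server.close()
    os.unlink(path)
    os.rmdir(tmpdir)
    return connected_after, replied_after, reply
```

The connect returns almost at once, while the reply arrives only after the simulated setup finishes. A script can therefore rely on the socket existing as soon as the daemon has been started, at the cost of waiting for the answer.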
| I use a (slow) embedded system (ARM, 400 MHz), on which there is also
| another benefit of this patch; I use 'ipsec whack --status' to see if
| the tunnels have come up, but if you do this too soon after boot, 'ipsec
| whack' will not return until pluto has started and processed the
| message (this takes 4-5 seconds). With this patch, 'ipsec whack' will
| return immediately if the socket does not exist.
I suggest that, if this functionality is useful, it be implemented
another way. For example, perhaps you could look in /proc.
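
One way the /proc idea might look, as a rough sketch rather than a recommendation. It assumes a Linux system whose /proc exposes `/proc/<pid>/comm`; `process_running` and its `proc_root` parameter are hypothetical names introduced here. It only reports whether a process with the given command name exists, not whether that process has finished initializing.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if some process under proc_root has command name `name`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only the numeric directories describe processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # the process exited while we were scanning
    return False
```

A script could call `process_running("pluto")` and skip the status query when it returns False, so it never blocks on a daemon that is not there.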