[Openswan dev] Problem and proposed bug fix to Openswan when the system's time jumps forward multiple decades
arichards at cybertec.com.au
Thu Aug 4 03:44:45 EDT 2011
On 08/04/2011 04:37 PM, D. Hugh Redelmeier wrote:
> | From: Antony Richards<arichards at cybertec.com.au>
> | I came across a bug in Openswan. The box starts up. Openswan connects. The
> | date is set back to 1970, NTPD is started and sets the date to be 2011.
> That sounds like a bug in ntpd. How does the date get set to 1970?
I explicitly turned off ntpd and set the date back to 1970. The reason
was that it quickly reproduced an issue I observed when an Linux
embedded device I'm developing is first powered on with a pre-loaded
> My recollection (unverified) that ntpd is supposed to use adjtime(3)
> and thus the time is monotonically increasing.
> | After this, the command "ipsec auto --status" blocks indefinitely.
> Did pluto log the message "time moved backwards %ld seconds" (from
> Pluto's now() tries to keep time monotonic. The method is to use the
> variable "delta" to accumulate the amount of backward time observed
> and to add "delta" to each subsequent result. It was expected that
> delta would be small.
> If the time goes backwards a whole bunch, and then formward a similar
> amount, I guess that adding delta could cause overflow.
> (32-bit time_t should be able to hold 136 years worth of time. It is
> signed, and 0 represents Jan 1, 1970. So the representation wraps
> arround (overflows) in 2038. If time flies backwards 40 years, and
> then forwards 40 years, the correction mechanism will add 40 years to
> the current time (2011) and thus overflow.)
> now() is called frequently. I imagine that there is a bound on how
> long it could have been since the last call. Proposed change: If the
> observed time change is way too large in the forward direction, use
> that to trim the value of delta.
> So: (currently) if time flies 40 years into the past, then delta += 40 years.
> (proposed) If time flies 40 years into the future, delta -= 40 years.
That would solve the problem also, but using uptime gives a simpler
What would the minimum jump in time be before adjusting delta?
> It would be good for now() to check if n + delta results in an
> overflow. What to do then??? passert failure?
If all references in timer.c use changes in time then (and the changes
are small, eg less than 1 year) then the overflow can be safely handled
by doing unsigned differences. Casting the result to a signed number
will be correct for before, now or after.
unsigned int now;
unsigned int scheduled;
if (((int)(scheduled - now)) < 0)
act on the event.
I believe this works for signed numbers also (I just did a paper
exercise to prove it to myself).
So next_event() and handle_timer_event() need to handle time this way.
This should be done irrespective of a change to now().
I can create a new diff with this change in it - be about a day or so
(I'll ensure it works with timer overflow).
> | Analysis (based on 2.4 and verified in 2.6) is that in timer.c:next_event(),
> | the expression *evlist->ev_time - now()* is negative (ev_time is 40 years
> | ago). Hence next_event() returns 0, causing the event to be triggered.
> | But when handle_timer_event() is called, the expression *now()< ev->ev_time*
> | evaluates to TRUE (due to long wrapping???), meaning that the event is not
> | removed from the event list.
> | Looking at /server.c/, the code becomes stuck in the loop ... in
> | /call_server()/ next_event() returns 0. This then sets ndes to 0 (and
> | osw_select() is NOT called). But when handle_timer_event() is called, the
> | function believes that there are no events to handle, so the event queue does
> | not change. So when call_server() loops it calls next_event() which returns 0
> | again.
> | This blocks the thread from calling the select() function to handle all the
> | file descriptors. Hence *ipsec auto --status* never gets a reply to its
> | request, and will block forever. Likewise, all communications into Openswan
> | never gets a reply.
> | A proposed solution (that I have tested successfully in version 2.4) is to
> | change now() to use the systems uptime instead. This solution also has the
> | benefit that timing jitter is not introduced by people setting the time on a
> | device or programs such as ntpd.
> What is the cost of a sysinfo call? now() is called a lot.
Behinds the scene the timer is just a counter that would be copied
over. The man page shows that about 14 counters/variables would need to
be copied per call.
The kernel function do_sysinfo() does seem to have some overhead.
> sysinfo(2) is Linux-specific according to the manpage.
> The semantics of clock_gettime(3) using CLOCK_MONOTONIC might be
> better. The tv_sec field of the resulting timespec should do (it's a
Agree. Any timer that that cannot be set works. Looking into the
kernel code, it looks like less overhead than do_sysinfo()
I'll send a new diff file soon covering these comments (ie changing the
maths in timer.c and using sys_clock_gettime()).
> Dev mailing list
> Dev at openswan.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Dev