[Openswan dev] Problem and proposed bug fix to Openswan when the system's time jumps forward multiple decades
D. Hugh Redelmeier
hugh at mimosa.com
Thu Aug 4 02:37:17 EDT 2011
| From: Antony Richards <arichards at cybertec.com.au>
| I came across a bug in Openswan. The box starts up. Openswan connects. The
| date is set back to 1970, NTPD is started and sets the date to be 2011.
That sounds like a bug in ntpd. How does the date get set to 1970?
My recollection (unverified) is that ntpd is supposed to use adjtime(3),
so that the time is monotonically increasing.
| After this, the command "ipsec auto --status" blocks indefinitely.
Did pluto log the message "time moved backwards %ld seconds" (from
now())?
Pluto's now() tries to keep time monotonic. The method is to use the
variable "delta" to accumulate the amount of backward time observed
and to add "delta" to each subsequent result. It was expected that
delta would be small.
If the time goes backwards a whole bunch, and then forward a similar
amount, I guess that adding delta could cause overflow.
(A 32-bit time_t spans about 136 years' worth of seconds. It is
signed, and 0 represents Jan 1, 1970, so only about 68 of those years
lie after the epoch and the representation wraps around (overflows)
in 2038. If time flies backwards 40 years, and then forwards 40 years,
the correction mechanism will add 40 years to the current time (2011)
and thus overflow.)
now() is called frequently. I imagine that there is a bound on how
long it could have been since the last call. Proposed change: If the
observed time change is way too large in the forward direction, use
that to trim the value of delta.
So: (currently) if time flies 40 years into the past, then delta += 40 years.
(proposed) If time flies 40 years into the future, delta -= 40 years.
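To make the proposal concrete, here is a rough sketch, not pluto's actual now(). The names mono_clock, mono_now and MAX_FORWARD_GAP are made up for illustration, and the one-hour bound is an arbitrary assumption. The raw clock reading is passed in as a parameter so the logic can be exercised without touching the system clock; real code would read it with time(NULL). Times are held in int64_t so the sketch itself cannot overflow.

```c
#include <stdint.h>

/* Hypothetical bound (seconds) on the gap between two calls. */
#define MAX_FORWARD_GAP (60 * 60)

struct mono_clock {
    int64_t last_raw;  /* raw clock reading seen at the previous call */
    int64_t delta;     /* accumulated backward movement, never negative */
};

/* Return a non-decreasing time given the latest raw clock reading. */
int64_t mono_now(struct mono_clock *c, int64_t raw)
{
    int64_t step = raw - c->last_raw;

    if (step < 0) {
        /* Behaviour described above: remember how far time went back. */
        c->delta += -step;
    } else if (step > MAX_FORWARD_GAP && c->delta > 0) {
        /* Proposed: an implausibly large forward step is taken to be the
         * clock being put right again, so trim delta by that step,
         * clamped so that delta never goes below zero. */
        int64_t trim = step < c->delta ? step : c->delta;
        c->delta -= trim;
    }
    c->last_raw = raw;
    return raw + c->delta;
}
```

With this clamping the returned value never decreases: when the forward step is no larger than delta, the result simply stands still for that one call.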
It would be good for now() to check if n + delta results in an
overflow. What to do then??? passert failure?
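One way the overflow check could look, as a sketch only. It models time_t as int32_t, which is an assumption (the real width depends on the platform), and add_time32 is a made-up name. It tests before adding, since signed overflow is undefined behaviour in C, and it leaves the policy question (passert or something gentler) to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add delta to n, modelling time_t as a signed 32-bit value (assumption).
 * Returns false and leaves *out untouched if the sum would not fit. */
bool add_time32(int32_t n, int32_t delta, int32_t *out)
{
    if ((delta > 0 && n > INT32_MAX - delta) ||
        (delta < 0 && n < INT32_MIN - delta))
        return false;

    *out = n + delta;
    return true;
}
```

For the case in question: a 2011 timestamp is roughly 1.31e9 seconds and 40 years is roughly 1.26e9 seconds, so the sum is about 2.57e9, which is beyond INT32_MAX (2147483647) and the function reports failure.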
| Analysis (based on 2.4 and verified in 2.6) is that in timer.c:next_event(),
| the expression *evlist->ev_time - now()* is negative (ev_time is 40 years
| ago). Hence next_event() returns 0, causing the event to be triggered.
|
| But when handle_timer_event() is called, the expression *now() < ev->ev_time*
| evaluates to TRUE (due to long wrapping???), meaning that the event is not
| removed from the event list.
|
| Looking at /server.c/, the code becomes stuck in the loop ... in
| /call_server()/ next_event() returns 0. This then sets ndes to 0 (and
| osw_select() is NOT called). But when handle_timer_event() is called, the
| function believes that there are no events to handle, so the event queue does
| not change. So when call_server() loops it calls next_event() which returns 0
| again.
|
| This blocks the thread from calling the select() function to handle all the
| file descriptors. Hence *ipsec auto --status* never gets a reply to its
| request, and will block forever. Likewise, all communications into Openswan
| never get a reply.
|
| A proposed solution (that I have tested successfully in version 2.4) is to
| change now() to use the system's uptime instead. This solution also has the
| benefit that timing jitter is not introduced by people setting the time on a
| device or programs such as ntpd.
What is the cost of a sysinfo call? now() is called a lot.
sysinfo(2) is Linux-specific according to the manpage.
The semantics of clock_gettime(3) using CLOCK_MONOTONIC might be
better. The tv_sec field of the resulting timespec should do (it's a
time_t).
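A minimal sketch of the CLOCK_MONOTONIC idea, untested against pluto itself. The function name mono_seconds is made up. Note that the value counts from an unspecified starting point, not from 1970, so it is only good for measuring intervals, and that glibc versions before 2.17 need -lrt for clock_gettime().

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>

/* Seconds from the monotonic clock, or (time_t)-1 if the call fails.
 * The result is unaffected by settimeofday() or by ntpd stepping the
 * wall clock. */
time_t mono_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return (time_t)-1;
    return ts.tv_sec;
}
```

Anything that needs the wall-clock date (log timestamps, for instance) would still have to call time() separately.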