<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi,<br>
<br>
See inline.<br>
<br>
<br>
On 08/04/2011 04:37 PM, D. Hugh Redelmeier wrote:
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">| From: Antony Richards <a class="moz-txt-link-rfc2396E" href="mailto:arichards@cybertec.com.au">&lt;arichards@cybertec.com.au&gt;</a>
| I came across a bug in Openswan. The box starts up. Openswan connects. The
| date is set back to 1970, NTPD is started and sets the date to be 2011.
That sounds like a bug in ntpd. How does the date get set to 1970?
</pre>
</blockquote>
I explicitly turned off ntpd and set the date back to 1970. I did this
because it quickly reproduces an issue I observed when an embedded
Linux device I'm developing is first powered on with a pre-loaded
configuration.<br>
<br>
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">My recollection (unverified) is that ntpd is supposed to use adjtime(3)
and thus the time is monotonically increasing.
</pre>
</blockquote>
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">
| After this, the command "ipsec auto --status" blocks indefinitely.
Did pluto log the message "time moved backwards %ld seconds" (from
now())?
</pre>
</blockquote>
Yes.<br>
<br>
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">
Pluto's now() tries to keep time monotonic. The method is to use the
variable "delta" to accumulate the amount of backward time observed
and to add "delta" to each subsequent result. It was expected that
delta would be small.
If the time goes backwards a whole bunch, and then forward a similar
amount, I guess that adding delta could cause overflow.
(32-bit time_t should be able to hold 136 years worth of time. It is
signed, and 0 represents Jan 1, 1970. So the representation wraps
around (overflows) in 2038. If time flies backwards 40 years, and
then forwards 40 years, the correction mechanism will add 40 years to
the current time (2011) and thus overflow.)
now() is called frequently. I imagine that there is a bound on how
long it could have been since the last call. Proposed change: If the
observed time change is way too large in the forward direction, use
that to trim the value of delta.
So: (currently) if time flies 40 years into the past, then delta += 40 years.
(proposed) If time flies 40 years into the future, delta -= 40 years.
</pre>
</blockquote>
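A minimal sketch of the delta mechanism described above together with
the proposed trimming; it is not pluto's actual code. The names
mono_clock, mono_now and MAX_CALL_GAP are made up for illustration, the
one-hour bound on the gap between calls is an assumption, and the raw
clock reading is passed in as a parameter so the behaviour can be
exercised without touching the system clock. 64-bit arithmetic is used
so the sketch itself cannot overflow.<br>
<pre wrap="">
```c
#include <stdint.h>

/* Hypothetical state; a real now() would likely keep these as statics. */
struct mono_clock {
    int64_t last_raw;  /* previous raw wall-clock reading, in seconds */
    int64_t delta;     /* accumulated backward time, added to every result */
    int started;       /* 0 until the first reading has been recorded */
};

/* Assumed upper bound (seconds) on the time between two calls. */
#define MAX_CALL_GAP 3600

/* Return a non-decreasing time derived from the raw reading 'raw'. */
int64_t mono_now(struct mono_clock *c, int64_t raw)
{
    if (!c->started) {
        c->last_raw = raw;
        c->started = 1;
    }

    int64_t step = raw - c->last_raw;

    if (step < 0) {
        /* Current behaviour: time went backwards, remember by how much. */
        c->delta += -step;
    } else if (step > MAX_CALL_GAP && c->delta > 0) {
        /* Proposed: an implausibly large forward jump is treated as the
         * clock being set forward again, so trim delta by that amount
         * (never below zero). */
        int64_t trim = step < c->delta ? step : c->delta;
        c->delta -= trim;
    }

    c->last_raw = raw;
    return raw + c->delta;
}
```
</pre>
With raw readings from 2011, then 1970, then 2011 again, the returned
time never decreases and delta is back to 0 after the forward jump.<br>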
That would solve the problem also, but using uptime gives a simpler
solution.<br>
What would the minimum jump in time be before adjusting delta?<br>
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">
It would be good for now() to check if n + delta results in an
overflow. What to do then??? passert failure?
</pre>
</blockquote>
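One way the overflow check could look, as a sketch only. It assumes a
signed 32-bit time value; add_time32 is a hypothetical helper, and what
the caller does when it reports an overflow is left open.<br>
<pre wrap="">
```c
#include <stdbool.h>
#include <stdint.h>

/* Store n + delta in *out and return true when the sum fits in a signed
 * 32-bit value; return false (leaving *out untouched) when it would not. */
bool add_time32(int32_t n, int32_t delta, int32_t *out)
{
    if ((delta > 0 && n > INT32_MAX - delta) ||
        (delta < 0 && n < INT32_MIN - delta))
        return false;

    *out = n + delta;
    return true;
}
```
</pre>
For example, a time in 2011 plus a delta of roughly 40 years does not
fit and is reported as an overflow.<br>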
If all references in timer.c use differences in time (and the
differences are small, e.g. less than 1 year), then the overflow can be
safely handled by doing unsigned subtraction. Casting the result to a
signed number gives the correct sign whether the event is before, at, or
after the current time.<br>
<br>
e.g.<br>
unsigned int now;<br>
unsigned int scheduled;<br>
<br>
if (((int)(scheduled - now)) &lt; 0)<br>
{<br>
&nbsp;&nbsp;&nbsp; /* act on the event */<br>
}<br>
<br>
I believe this works for signed numbers also (I just did a paper
exercise to prove it to myself).<br>
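<br>
A self-contained version of the comparison above, as a sketch. The
function name event_is_due is hypothetical, and fixed-width types are
used so the wrap point is explicit. It sticks to unsigned subtraction
because signed overflow is undefined in C; the final conversion to
int32_t relies on the usual two's complement behaviour.<br>
<pre wrap="">
```c
#include <stdbool.h>
#include <stdint.h>

/* True when 'scheduled' lies strictly before 'now', even if the counter
 * has wrapped between the two values. Only valid while the two times
 * are less than 2^31 seconds apart. */
bool event_is_due(uint32_t scheduled, uint32_t now)
{
    /* Unsigned subtraction wraps modulo 2^32; reinterpreting the result
     * as signed yields the signed distance from now to scheduled. */
    return (int32_t)(scheduled - now) < 0;
}
```
</pre>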
<br>
So next_event() and handle_timer_event() need to handle time this way.
This should be done irrespective of a change to now().<br>
<br>
I can create a new diff with this change in it; it will take about a
day or so (I'll ensure it works with timer overflow).<br>
<br>
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">| Analysis (based on 2.4 and verified in 2.6) is that in timer.c:next_event(),
| the expression *evlist->ev_time - now()* is negative (ev_time is 40 years
| ago). Hence next_event() returns 0, causing the event to be triggered.
|
| But when handle_timer_event() is called, the expression *now() < ev->ev_time*
| evaluates to TRUE (due to long wrapping???), meaning that the event is not
| removed from the event list.
|
| Looking at /server.c/, the code becomes stuck in the loop ... in
| /call_server()/ next_event() returns 0. This then sets ndes to 0 (and
| osw_select() is NOT called). But when handle_timer_event() is called, the
| function believes that there are no events to handle, so the event queue does
| not change. So when call_server() loops it calls next_event() which returns 0
| again.
|
| This blocks the thread from calling the select() function to handle all the
| file descriptors. Hence *ipsec auto --status* never gets a reply to its
| request, and will block forever. Likewise, all communications into Openswan
| never get a reply.
|
| A proposed solution (that I have tested successfully in version 2.4) is to
| change now() to use the system's uptime instead. This solution also has the
| benefit that timing jitter is not introduced by people setting the time on a
| device or programs such as ntpd.
What is the cost of a sysinfo call? now() is called a lot.
</pre>
</blockquote>
Behind the scenes, the timer is just a counter that would be copied
over. The man page shows that about 14 counters/variables would need
to be copied per call.<br>
<br>
The kernel function do_sysinfo() does seem to have some overhead.<br>
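<br>
For reference, reading the uptime through sysinfo(2) could look roughly
like this. It is a sketch, not the code from my diff; uptime_seconds is
a hypothetical name, and the call is Linux-only.<br>
<pre wrap="">
```c
#include <sys/sysinfo.h>

/* Seconds since boot, or -1 if sysinfo() fails. Linux-specific. */
long uptime_seconds(void)
{
    struct sysinfo si;

    /* The kernel fills in the whole structure; only uptime is used. */
    if (sysinfo(&si) != 0)
        return -1;

    return si.uptime;
}
```
</pre>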
<blockquote
cite="mid:alpine.LRH.2.02.1108040155010.14445@redclaw.mimosa.com"
type="cite">
<pre wrap="">
sysinfo(2) is Linux-specific according to the manpage.
The semantics of clock_gettime(3) using CLOCK_MONOTONIC might be
better. The tv_sec field of the resulting timespec should do (it's a
time_t).</pre>
</blockquote>
<br>
Agreed. Any timer that cannot be set works. Looking at the kernel
code, it appears to have less overhead than do_sysinfo().<br>
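<br>
A minimal sketch of what a CLOCK_MONOTONIC based time source might look
like. The function name monotonic_seconds is hypothetical, and this is
not the final patch. On glibc versions before 2.17 it also needs to be
linked with -lrt.<br>
<pre wrap="">
```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Whole seconds from a clock that cannot be set, or (time_t)-1 on error. */
time_t monotonic_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return (time_t)-1;

    /* tv_sec is already a time_t; the nanosecond part is dropped. */
    return ts.tv_sec;
}
```
</pre>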
<br>
I'll send a new diff file soon covering these comments (i.e. changing
the maths in timer.c and using sys_clock_gettime()).<br>
<br>
Thanks,<br>
Antony.<br>
<br>
<br>
<br>
<br>
</body>
</html>