httpd-dev mailing list archives

From Dean Gaudet
Subject Re: NSPR (was Re: rewritelog inefficiency)
Date Wed, 29 Apr 1998 05:21:11 GMT
On Wed, 29 Apr 1998, Wan-Teh Chang wrote:

> This is a valid question to ask.  The reason we put every socket and pipefile
> descriptor in nonblocking mode is to implement timeout and interrupt
> (i.e., abort) I/O operations that may block.

Oh that's a good point!

> tricky because the SIGALRM signals generated by setitimer cannot
> be directed to a particular thread.

If all threads except one block SIGALRM then it will be delivered to the
single thread that has it unblocked.

> Another drawback is that this will
> consume one signal (SIGUSR1 or SIGUSR2) and may conflict with
> programs that also use these user signals.

Yeah, this is where posix realtime signals are really nice.  (side note:
posix realtime signals will be in linux 2.2, and this is when linuxthreads
will stop using SIGUSR1/SIGUSR2... and then NSPR w/USE_PTHREADS should

> Do you have any suggestions?

Yup, here's something similar to what is currently in apache-1.3, and
which I was planning on doing in the pthreads port (until I gave it up and
started using NSPR).  The general idea is that timeouts don't happen
frequently, so you want to make sure that you don't spend many cycles
worrying about them.  But... I'm not happy with the ability of this method
to handle really fine grained timeouts.  So it's probably not enough.

typedef struct {
    spinlock_t lock;
    timer_interval_t epoch;
    timer_interval_t timeout_len;
    pthread_t tid;
    unsigned last_vtime;
    unsigned vtime;
} thread_timer_data_t;

Every thread has one of those.  For convenience they're in an array, index
assigned at thread creation time.

When a thread wishes to set or modify a timeout, it acquires the lock,
sets the timeout_len to the length of the desired timeout, increments
vtime, and releases the lock.
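
The worker side of that can be sketched as follows.  This is only an
illustration: spinlock_t and timer_interval_t are not defined in this
message, so the sketch assumes a pthread mutex and a plain unsigned in their
place, and timer_data_init/set_timeout are hypothetical names.

```c
#include <pthread.h>

typedef unsigned timer_interval_t;  /* assumption: not defined in the post */

typedef struct {
    pthread_mutex_t lock;           /* assumption: stand-in for spinlock_t */
    timer_interval_t epoch;
    timer_interval_t timeout_len;
    pthread_t tid;
    unsigned last_vtime;
    unsigned vtime;
} thread_timer_data_t;

/* Hypothetical helper: set up the calling thread's slot, no timeout armed. */
void timer_data_init(thread_timer_data_t *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->epoch = 0;
    t->timeout_len = 0;             /* 0 means "no timeout" to the manager */
    t->tid = pthread_self();
    t->last_vtime = 0;
    t->vtime = 0;
}

/* Set or modify the timeout: take the lock, store the length, bump vtime
 * so the manager restarts the clock on its next scan, release the lock. */
void set_timeout(thread_timer_data_t *t, timer_interval_t len)
{
    pthread_mutex_lock(&t->lock);
    t->timeout_len = len;
    ++t->vtime;
    pthread_mutex_unlock(&t->lock);
}
```

Calling set_timeout(t, 0) disarms the timeout, since the manager skips
entries whose timeout_len is zero.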

A timeout manager thread awakes at specific intervals (in apache-1.3 it's once
per second by default), and scans this array.  It does something like this:

    for (i = 0; i < max_thread_index; ++i) {
	thread_timer_data_t *t = &times[i];
	if (t->timeout_len) {
	    if (t->last_vtime != t->vtime) {
		/* progress since the last scan: restart the clock */
		t->epoch = now;
		t->last_vtime = t->vtime;
	    }
	    else if ((timer_interval_t)(now - t->epoch) > t->timeout_len) {
		/* no progress for longer than the timeout: alarm the thread */
		pthread_kill(t->tid, SIGALRM); /* or whatever signal */
	    }
	}
    }

The vtime is essentially a progress indicator; the thread can increment it
using an atomic_inc() without grabbing the lock and things still work.  As
long as the vtime keeps changing the manager considers the thread to be
busy and not ready to be alarmed.  When the vtime doesn't make progress,
the thread will eventually be signalled.
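
A sketch of the lock-free progress update and the manager's decision for one
slot.  It uses C11 <stdatomic.h> as a stand-in for atomic_inc(), which is an
assumption, and timer_slot_t, slot_init, note_progress and scan_slot are
hypothetical names.  scan_slot() returns 1 at the point where the manager
would call pthread_kill().

```c
#include <stdatomic.h>

typedef unsigned timer_interval_t;  /* assumption: not defined in the post */

/* Trimmed-down slot: only the fields the scan needs. */
typedef struct {
    timer_interval_t epoch;
    timer_interval_t timeout_len;
    unsigned last_vtime;
    atomic_uint vtime;
} timer_slot_t;

/* vtime starts one ahead of last_vtime, as if a timeout was just set. */
void slot_init(timer_slot_t *t, timer_interval_t len)
{
    t->epoch = 0;
    t->timeout_len = len;
    t->last_vtime = 0;
    atomic_init(&t->vtime, 1);
}

/* Worker side: report progress without taking any lock. */
void note_progress(timer_slot_t *t)
{
    atomic_fetch_add(&t->vtime, 1);
}

/* Manager side: returns 1 if the thread should be signalled now. */
int scan_slot(timer_slot_t *t, timer_interval_t now)
{
    unsigned v;

    if (t->timeout_len == 0)
        return 0;                   /* no timeout armed */
    v = atomic_load(&t->vtime);
    if (t->last_vtime != v) {
        t->epoch = now;             /* progress seen: restart the clock */
        t->last_vtime = v;
        return 0;
    }
    return (timer_interval_t)(now - t->epoch) > t->timeout_len;
}
```

With a timeout of 3, a first scan at time 10 records the epoch, scans up to
time 13 do nothing, and the scan at time 14 reports the thread as expired
unless note_progress() was called in between.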

Timeouts can last up to 2 intervals longer than specified.  Scanning the
table once per second isn't terribly expensive, but it *is* noticeable:
I've seen a few percent more hits/s by increasing the interval to 5s.  But it's
worth it (for apache-1.3) even at 1s.  There's a bunch of obvious
optimizations to improve L1 locality, and such... these depend on the
exact nature of the system it's running on (i.e. single cpu, multiple
cpus, how sticky threads/processes are, ...)
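
To see where the "up to 2 intervals longer" bound comes from, here is a
small model of the scan timing.  It assumes the manager scans at exact
multiples of the interval, the thread makes no further progress, and
timeout_len is nonzero; fire_time and ms_t are hypothetical names, and all
times are in milliseconds.

```c
typedef unsigned long ms_t;

/* Returns the time of the scan at which the signal would be sent for a
 * timeout of timeout_len set at time set_at. */
ms_t fire_time(ms_t set_at, ms_t timeout_len, ms_t interval)
{
    /* The manager only notices the new vtime at the first scan at or
     * after set_at, and starts the clock there (up to 1 interval late). */
    ms_t epoch = ((set_at + interval - 1) / interval) * interval;
    ms_t scan = epoch;

    /* It then fires at the first scan where the elapsed time exceeds
     * timeout_len (again up to 1 interval late). */
    while (scan - epoch <= timeout_len)
        scan += interval;
    return scan;
}
```

For example, a 3000 ms timeout set at t = 1 ms with a 1000 ms interval fires
at t = 5000 ms, which is 1999 ms past the requested deadline of 3001 ms.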

It just doesn't work for very fine grained timers.  But a hybrid solution
may be neat to play with some day.

> Another possibility, at the cost of adding new API functions, is to
> specify at the time of creating a file descriptor that we are not interested
> in timeout or interrupt on the fd.

Yeah this would be nice -- we could use it on the filesystem descriptors

> Right, the source code you have still uses select().  We recently changed it to call
> poll() when poll() is available to avoid the FD_SETSIZE constraint.
> You will get this new code in the next source code drop on
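
For reference, a sketch of a timed read built on poll() rather than
select().  This is not NSPR's code; read_with_timeout and READ_TIMED_OUT are
hypothetical names.  poll() takes an array of descriptors instead of a
bitmask, so descriptor values at or above FD_SETSIZE are not a problem.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)         /* hypothetical return code */

/* Waits up to timeout_ms for fd to become readable, then reads.
 * Returns the byte count (0 on EOF), -1 on error (including EINTR),
 * or READ_TIMED_OUT if nothing arrived in time. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd p;
    int n;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return READ_TIMED_OUT;
    if (!(p.revents & (POLLIN | POLLHUP)))
        return -1;                  /* POLLERR or POLLNVAL */
    return read(fd, buf, len);
}
```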


> > In practice, I'm confident that all unixes make the atomicity guarantee
> > for writes <= PIPE_BUF.  PIPE_BUF is 512 at a minimum, but is more like
> > 4096 on real systems.
> Does this only apply to writes to pipes, or does it apply to writes to sockets and
> normal files too?

No, I can't find anywhere in the single unix spec that guarantees it.  But I'm
not aware of an implementation that doesn't have some form of atomicity for
small enough writes.  I'll ask around and see if anyone knows more.

> How do you guarantee write atomicity when the log entry is larger than PIPE_BUF?

I don't.  I only guarantee it for log entries smaller than
PIPE_BUF... above that folks will lose.  There is a way of doing it such
that a post-processor can reassemble the log correctly.  But it just
doesn't seem to be worth it... if log entries go over about 4k bytes
folks will lose...

> Feel free to comment and ask questions!  By the way, another appropriate forum for
> such questions/comments is the
> netscape.public.mozilla.general newsgroup at

Yeah I subscribed yesterday and I'm lurking now.  I was afraid there
would be too much other traffic.

