httpd-dev mailing list archives

From (Wan-Teh Chang)
Subject Re: NSPR (was Re: rewritelog inefficiency)
Date Wed, 29 Apr 1998 04:36:42 GMT

Dean Gaudet wrote:

> On Tue, 28 Apr 1998, Wan-Teh Chang wrote:
> > In NSPR, every fd that represents a socket or pipe on Unix is put in nonblocking
> > mode (O_NONBLOCK flag is on).
> Oh wow.  Even when you're using pthreads only?  This seems inefficient.
> When you're using pthreads supplied by the kernel/libc then this sort of
> multiplexing is all done for you behind the scenes.  Granted, that too
> can be slow because of too many contexts -- and that's why the MxN model
> is so interesting.  But I'm confused why you'd be doing this in the 1-1
> model where each NSPR thread is a kernel thread.  (I'm not claiming to
> be an expert here :)

This is a valid question to ask.  The reason we put every socket and pipe file
descriptor in nonblocking mode is to implement timeout and interrupt
(i.e., abort) for I/O operations that may block.  Take PR_Recv() on a
socket for example:
- Timeout: If the first recv() call fails with EAGAIN, we call poll() with
  the specified timeout value.  This is how we implement timeout on I/O.
- Interrupt (by PR_Interrupt): The poll() call actually uses a small timeout
  and wakes up periodically to check the target thread's interrupt flag.
  (Note: this is not the same as being interrupted by a Unix signal.)

An alternative is to leave all the fd's in blocking mode.  Then it is more
difficult, if not impossible, to implement timeout and interrupt.  One possibility
is to use pthread_kill to send a signal to the target thread so that the I/O
system call fails with EINTR.  Implementing timeout this way can be
tricky because the SIGALRM signals generated by setitimer cannot
be directed to a particular thread.  Another drawback is that this will
consume one signal (SIGUSR1 or SIGUSR2) and may conflict with
programs that also use these user signals.

Do you have any suggestions?

Another possibility, at the cost of adding new API functions, is to
specify at the time of creating a file descriptor that we are not interested
in timeout or interrupt on the fd.  Then the underlying Unix fd can be
left in blocking mode and the NSPR I/O functions would just call
the blocking system calls on the fd.  For HTTP, I think you will need
timeout for reads.  However, I think accepts on the listening socket
(port 80) do not need timeout.  Then this would be a possible

> > If a PRFileDesc is in blocking mode, which is the default (note: this blocking
> > mode is at the NSPR level, not at the Unix level.  The Unix fd is always
> > nonblocking.), PR_Write() does not return until all the data is written.  So it
> > may have to make several write() system calls, as follows:
> >     write()    /* get EAGAIN, or a byte count less than you requested */
> >     poll()
> >     write()    /* get EAGAIN, or a byte count less than you requested */
> >     poll()
> >     ....
> >     write()   /* finally the entire user buffer has been transmitted */
> >     return
> >
> > You can see this write-poll loop in FileWrite() in prio.c and SocketWrite() in
> > prsocket.c and pt_Write() and pt_write_cont in ptio.c.
> Ok I see the loop in FileWrite() (prfile.c) -- but I don't see the
> poll()... If I dig down into _MD_write in src/md/unix/unix.c I see that
> when it's a native thread it will use select().

Right, the source code you have still uses select().  We recently changed it to call
poll() when poll() is available to avoid the FD_SETSIZE constraint.
You will get this new code in the next source code drop on
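
For illustration, the write-poll loop quoted above could look roughly like
this with poll().  This is a sketch under assumptions, not the NSPR source;
write_all() is a hypothetical name, and it waits without a timeout:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write the whole buffer to a nonblocking fd, waiting in poll()
 * whenever the kernel cannot take more data right now.
 * Returns len on success, -1 on error (errno set by write()/poll()).
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL || len == 0);
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n >= 0) {
            /* Possibly a short write: advance and try again. */
            p += n;
            left -= (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* EAGAIN: block until the fd is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
            return -1;
    }
    return (ssize_t)len;
}
```

Because poll() takes an array of struct pollfd rather than a fixed-size bit
set, it works for any fd value, which is the FD_SETSIZE point above.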

> In practice, I'm confident that all unixes make the atomicity guarantee
> for writes <= PIPE_BUF.  PIPE_BUF is 512 at a minimum, but is more like
> 4096 on real systems.

Does this only apply to writes to pipes, or does it apply to writes to sockets and
normal files too?

>  For example, the current apache assumes this is
> the case for write()s to the file system, and it's been that way for
> years, and nobody complains of messed up logs.  I can't give a specific
> example for sockets though which would help prove my claim... but for
> logs I'm not interested in atomicity of sockets.

I agree that writing to log files is a prime example of multiple threads writing to
the same file descriptor.

> In the case of log writes Apache doesn't actually buffer anything
> by default -- it builds each log entry in a buffer on the stack, and
> issues a single write() call for it.  I did implement a form of atomic
> buffered logs, it's a compile time option.  In this case I use a buffer
> of size PIPE_BUF, and delay write()s until we come across a log entry
> which won't fit into the buffer.  Then the buffer is flushed (without
> the new log entry), and the new log entry is put into the empty buffer
> (or written directly if it's larger than PIPE_BUF).  This gives atomic
> logs with buffering.

How do you guarantee write atomicity when the log entry is larger than PIPE_BUF?
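
To make the question concrete, here is a rough sketch of the buffering
scheme as described in the quoted text.  It is not Apache's code; struct
log_buf, log_flush() and log_append() are hypothetical names.  Note that
the oversized case is simply passed to a single write(), so the sketch
does not itself answer the atomicity question:

```c
#include <assert.h>
#include <limits.h>     /* PIPE_BUF */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512    /* assumption: fall back to the POSIX minimum */
#endif

struct log_buf {
    int    fd;              /* log file descriptor */
    size_t used;            /* bytes currently buffered */
    char   data[PIPE_BUF];
};

/* Flush all buffered entries with one write() call of at most PIPE_BUF bytes. */
int log_flush(struct log_buf *lb)
{
    ssize_t n;

    if (lb->used == 0)
        return 0;
    n = write(lb->fd, lb->data, lb->used);
    lb->used = 0;
    return n < 0 ? -1 : 0;
}

/* Buffer one complete log entry; entries are never split across write()s. */
int log_append(struct log_buf *lb, const char *entry, size_t len)
{
    assert(lb != NULL && entry != NULL);

    /* Entry does not fit: flush what we have, without the new entry. */
    if (len > PIPE_BUF - lb->used) {
        if (log_flush(lb) == -1)
            return -1;
    }
    /* Larger than the whole buffer: write it directly in one call. */
    if (len > PIPE_BUF)
        return write(lb->fd, entry, len) < 0 ? -1 : 0;

    memcpy(lb->data + lb->used, entry, len);
    lb->used += len;
    return 0;
}
```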

> I'm sure I'll have a bunch more questions/comments along the way :)

Feel free to comment and ask questions!  By the way, another appropriate forum for
such questions/comments is the
netscape.public.mozilla.general newsgroup at

