Date: Tue, 28 Apr 1998 21:08:10 -0700 (PDT)
From: Dean Gaudet
To: new-httpd@apache.org
Subject: Re: NSPR (was Re: rewritelog inefficiency)
In-Reply-To: <3546578F.2CB32B71@netscape.com>
X-Comment: Visit http://www.arctic.org/~dgaudet/legal for information regarding copyright and disclaimer.

On Tue, 28 Apr 1998, Wan-Teh Chang wrote:

> In NSPR, every fd that represents a socket or pipe on Unix is put in
> nonblocking mode (O_NONBLOCK flag is on).

Oh wow. Even when you're using pthreads only? This seems inefficient.
When you're using pthreads supplied by the kernel/libc then this sort of
multiplexing is all done for you behind the scenes. Granted, that too can
be slow because of too many contexts -- and that's why the MxN model is so
interesting. But I'm confused why you'd be doing this in the 1-1 model
where each NSPR thread is a kernel thread. (I'm not claiming to be an
expert here :)

> If a PRFileDesc is in blocking mode, which is the default (note: this
> blocking mode is at the NSPR level, not at the Unix level. The Unix fd is
> always nonblocking.), PR_Write() does not return until all the data is
> written. So it may have to make several write() system calls, as follows:
>     write()   /* get EAGAIN, or a byte count less than you requested */
>     poll()
>     write()   /* get EAGAIN, or a byte count less than you requested */
>     poll()
>     ....
>     write()   /* finally the entire user buffer has been transmitted */
>     return
>
> You can see this write-poll loop in FileWrite() in prio.c and SocketWrite() in
> prsocket.c and pt_Write() and pt_write_cont in ptio.c.

Ok, I see the loop in FileWrite() (prfile.c) -- but I don't see the
poll()... If I dig down into _MD_write in src/md/unix/unix.c I see that
when it's a native thread it will use select(). I'm guessing that it's
done this way so that you can share as much code as possible between
implementations. So far I haven't seen anything which requires it to be
this way... which is cool, 'cause it means there's room for improvement :)
Something I happen to enjoy!

> So, even if the write() system call is atomic (are you sure this
> is true, by the way?), PR_Write() may not be atomic if a write() call
> only writes part of the buffer.

The Single Unix Specification guarantees that writes on pipes/FIFOs
(blocking or non-blocking) of size <= PIPE_BUF are atomic. Larger writes
can be broken on arbitrary boundaries. It makes a similar guarantee on
STREAMS, but the actual size depends on the STREAM. I can't find the
reference right now, but there's an ambiguity somewhere which made it seem
to me that all writes on files are atomic (which is absurd). It makes no
explicit guarantees for sockets... but it actually depends on how you
interpret the semantics of send().

In practice, I'm confident that all unixes make the atomicity guarantee
for writes <= PIPE_BUF. PIPE_BUF is 512 at a minimum, but is more like
4096 on real systems. For example, the current apache assumes this is the
case for write()s to the file system, and it's been that way for years,
and nobody complains of messed up logs. I can't give a specific example
for sockets though which would help prove my claim... but for logs I'm not
interested in atomicity of sockets.

> > It'll be really unfortunate if we have to add locking for logs at the
> > application layer.
> > In a pure pthreads setting, for example, we could
> > just write() directly and it would all be taken care of by the kernel --
> > the kernel has to lock anyhow, so it's not needed at the user level.
>
> You are assuming that every write() will write the entire buffer. Suppose
> you want to write 64K to a nonblocking socket, write() may return 32K.
> Then this write() is not atomic.

Nope, I don't make that assumption, I just wasn't clear enough in my
description of the problem. In the case of log writes Apache doesn't
actually buffer anything by default -- it builds each log entry in a
buffer on the stack, and issues a single write() call for it.

I did implement a form of atomic buffered logs; it's a compile-time
option. In this case I use a buffer of size PIPE_BUF, and delay write()s
until we come across a log entry which won't fit into the buffer. Then
the buffer is flushed (without the new log entry), and the new log entry
is put into the empty buffer (or written directly if it's larger than
PIPE_BUF). This gives atomic logs with buffering.

> In the case of logging, you can probably assume that the data
> buffer is less than 16K or 32K so that each write() will be able
> to write the entire buffer when the fd is nonblocking. If this
> assumption is valid, then PR_Write() will only make one write()
> system call, and therefore PR_Write() is also atomic. But I don't
> know if you are able to guarantee that all your write() calls are
> atomic.

Right, it sounds like it's all there on unix. I just want to nail the
corner cases of the semantics of things. I'm sure I'll have a bunch more
questions/comments along the way :)

Dean
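
A minimal sketch of the atomic buffered logging scheme described above,
for illustration only. The names (struct buffered_log, blog_append,
blog_flush) are hypothetical and this is not the actual Apache code. It
assumes a blocking fd, one buffer per writer, and that a single write()
of at most PIPE_BUF bytes is not interleaved with other writers; anything
short of a full write is simply reported as an error.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512            /* assumed fallback: the POSIX minimum */
#endif

/* Hypothetical structure, not taken from the Apache source. */
struct buffered_log {
    int    fd;                  /* log file descriptor */
    size_t used;                /* bytes currently held in buf */
    char   buf[PIPE_BUF];       /* never holds a partial log entry */
};

/* Issue exactly one write(); treat an error or short write as failure. */
static int write_once(int fd, const char *data, size_t len)
{
    ssize_t n = write(fd, data, len);
    return (n >= 0 && (size_t) n == len) ? 0 : -1;
}

/* Flush whatever whole entries are buffered, using a single write(). */
int blog_flush(struct buffered_log *bl)
{
    int rc = 0;
    if (bl->used > 0) {
        rc = write_once(bl->fd, bl->buf, bl->used);
        bl->used = 0;
    }
    return rc;
}

/* Add one complete log entry; entries are never split across write()s. */
int blog_append(struct buffered_log *bl, const char *entry, size_t len)
{
    assert(bl->used <= sizeof bl->buf);

    /* Entry won't fit: flush the buffer first, without the new entry. */
    if (len > sizeof bl->buf - bl->used) {
        if (blog_flush(bl) != 0)
            return -1;
    }
    /* Larger than PIPE_BUF: write it directly (no atomicity guarantee). */
    if (len > sizeof bl->buf)
        return write_once(bl->fd, entry, len);

    memcpy(bl->buf + bl->used, entry, len);
    bl->used += len;
    return 0;
}
```

A caller would zero-initialise the structure, set fd, call blog_append()
once per finished log entry, and call blog_flush() before closing the fd.
Because the buffer only ever contains whole entries and is at most
PIPE_BUF bytes, each flush is a single write() of the size the thread
argues is atomic in practice.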