httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: cvs commit: apache-2.0/mpm/src/main iol_unix.c Makefile.tmpl buff.c http_connection.c http_protocol.c http_request.c
Date Sat, 19 Jun 1999 18:56:00 GMT


On Sat, 19 Jun 1999, Ben Laurie wrote:

> Is buffer list so bad? Presumably one can write access functions that
> take most of the pain out of reading from them and creating them, so
> what's the big deal?

I've been using a zero-copy implementation at criticalpath for a few
months now... and it doesn't seem to be a huge win -- because of the
complexity of the code.

So we start by saying we have a "write" function to which we can pass a
buffer, and then that function owns the buffer.  The first thing to
observe is that some buffers are "static", such as static strings, and
mmapped regions; others are dynamic -- say allocated from a 4k page pool.
Now your buffer abstraction needs a deallocation function, and probably a
"void *".  Something like this:

    struct buffer_head {
	char *base;
	size_t len;
	void (*free)(void *user_data, struct buffer_head *bh);
	void *user_data;
    };

The mmap buffers may actually be in a cache, and at some point we may
decide we need to free part of the cache... we need to know how many
buffer_heads for that part are outstanding.  The easiest way to deal
with this is reference counting.  If we have reference counting in
the buffer_head then we've got a way to duplicate buffer_heads as well.
So stick that in:

    struct buffer_head {
	const char *base;
	size_t len;
	void (*free)(void *user_data, struct buffer_head *bh);
	void *user_data;
	unsigned ref_count;
    };
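
A minimal sketch of what the refcounting buys us (the ref/unref helper
names are mine, not anything in the tree):

```c
#include <assert.h>
#include <stddef.h>

/* The refcounted buffer_head from above. */
struct buffer_head {
    const char *base;
    size_t len;
    void (*free)(void *user_data, struct buffer_head *bh);
    void *user_data;
    unsigned ref_count;
};

/* Duplicating a buffer_head is just bumping the count... */
static struct buffer_head *buffer_head_ref(struct buffer_head *bh)
{
    ++bh->ref_count;
    return bh;
}

/* ...and the owner's free hook only fires when the last reference
   goes away. */
static void buffer_head_unref(struct buffer_head *bh)
{
    if (--bh->ref_count == 0 && bh->free)
        bh->free(bh->user_data, bh);
}

static int frees;
static void count_free(void *user_data, struct buffer_head *bh)
{
    (void)user_data; (void)bh;
    ++frees;
}

/* Two owners; the hook must fire exactly once, on the second unref. */
int demo_refcount(void)
{
    struct buffer_head bh = { "hello", 5, count_free, NULL, 1 };

    buffer_head_ref(&bh);
    buffer_head_unref(&bh);   /* count 2 -> 1: no free */
    buffer_head_unref(&bh);   /* count 1 -> 0: free hook runs */
    return frees;
}
```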

OK now we want this all to work in a multithreaded setting... which means
we need to protect at least the buffer_heads from concurrent access.  But
not all buffer_heads need that protection -- essentially only those
coming from shared caches do.  So maybe this?

    struct buffer_head {
	const char *base;
	size_t len;
	void (*free)(void *user_data, struct buffer_head *bh);
	void *user_data;
	unsigned ref_count;
	ap_mutex_t *mutex;	/* if non-NULL, locking required ?  */
    };
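
Here's roughly how an unref would look with the conditional locking,
using pthread_mutex_t as a stand-in for ap_mutex_t (again, my names):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A NULL mutex marks a buffer_head that only one thread ever touches. */
struct buffer_head {
    const char *base;
    size_t len;
    void (*free)(void *user_data, struct buffer_head *bh);
    void *user_data;
    unsigned ref_count;
    pthread_mutex_t *mutex;   /* if non-NULL, locking required */
};

/* Drop a reference, taking the lock only for shared heads. */
static void buffer_head_unref(struct buffer_head *bh)
{
    unsigned left;

    if (bh->mutex) {
        pthread_mutex_lock(bh->mutex);
        left = --bh->ref_count;
        pthread_mutex_unlock(bh->mutex);
    }
    else {
        left = --bh->ref_count;
    }
    if (left == 0 && bh->free)
        bh->free(bh->user_data, bh);
}

static int shared_frees;
static void count_free(void *user_data, struct buffer_head *bh)
{
    (void)user_data; (void)bh;
    ++shared_frees;
}

/* A shared (cached) head with two outstanding references. */
int demo_shared_unref(void)
{
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct buffer_head bh = { "cached", 6, count_free, NULL, 2, &m };

    buffer_head_unref(&bh);   /* 2 -> 1 under the lock: no free */
    buffer_head_unref(&bh);   /* 1 -> 0: free hook fires */
    return shared_frees;
}
```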

OK now we want to be able to place these into lists... so we start
with:

    struct buffer_list {
	struct buffer_list *next;
	struct buffer_head *bh;
    };

But when we get partial write()s we want to eat up a partial number of
bytes at the beginning... so we'll keep track of the starting offset
into the buffer in buffer_list.  And at this point I'm not so certain,
but I think there's a need to keep track of the end point...  maybe not.
Let's add it for now:

    struct buffer_list {
	struct buffer_list *next;
	struct buffer_head *bh;
	unsigned start;		/* first valid byte */
	unsigned end;		/* last valid byte + 1 */
    };
    /* could probably get rid of len in buffer_head if end is here */
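
Eating bytes after a partial write() then looks something like this
(a sketch; the consume function is illustrative, and real code would
unref/free the nodes it walks past):

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the structs above. */
struct buffer_head {
    const char *base;
    size_t len;
};

struct buffer_list {
    struct buffer_list *next;
    struct buffer_head *bh;
    unsigned start;           /* first valid byte */
    unsigned end;             /* last valid byte + 1 */
};

/* After write() accepted n bytes: skip fully-written nodes, then
   advance start in the first partially-written one. */
static struct buffer_list *buffer_consume(struct buffer_list *l, size_t n)
{
    while (l && n >= (size_t)(l->end - l->start)) {
        n -= l->end - l->start;
        l = l->next;
    }
    if (l)
        l->start += (unsigned)n;
    return l;
}

/* Two 5-byte pieces, short write of 7: 3 bytes should remain. */
int demo_consume(void)
{
    struct buffer_head h1 = { "hello", 5 };
    struct buffer_head h2 = { "world", 5 };
    struct buffer_list b = { NULL, &h2, 0, 5 };
    struct buffer_list a = { &b,   &h1, 0, 5 };
    struct buffer_list *rest = buffer_consume(&a, 7);

    return rest ? (int)(rest->end - rest->start) : 0;
}
```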

And finally we need a header for the buffer_list:

    struct buffer {
	struct buffer_list *first;
	struct buffer_list **tail;
    };
    /* initialized with tail = &first... eliminates one special case in
       the append routine */
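
For the record, the special case that the tail = &first trick kills is
the "is the list empty?" branch in append (node payload here is just an
int so the demo is checkable):

```c
#include <assert.h>
#include <stddef.h>

struct buffer_list {
    struct buffer_list *next;
    int payload;
};

struct buffer {
    struct buffer_list *first;
    struct buffer_list **tail;
};

/* tail starts out pointing at first, so appending to an empty buffer
   is exactly the same two stores as appending to a non-empty one. */
static void buffer_init(struct buffer *b)
{
    b->first = NULL;
    b->tail = &b->first;
}

static void buffer_append(struct buffer *b, struct buffer_list *node)
{
    node->next = NULL;
    *b->tail = node;          /* no empty-list branch needed */
    b->tail = &node->next;
}

int demo_append(void)
{
    struct buffer b;
    struct buffer_list n1 = { NULL, 1 }, n2 = { NULL, 2 }, n3 = { NULL, 3 };
    struct buffer_list *p;
    int sum = 0;

    buffer_init(&b);
    buffer_append(&b, &n1);
    buffer_append(&b, &n2);
    buffer_append(&b, &n3);
    for (p = b.first; p; p = p->next)
        sum += p->payload;
    return sum;
}
```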

Note that we'll be allocating and deallocating buffer_lists frequently,
which doesn't sit so well with pools.  Not so bad to solve.

What seems to happen in practice though is that this involves a bunch
more function calling overhead.  And that eats up all the savings from
the zero-copying.  My large_write heuristic in apache-1.x does a really
good job of nailing a bunch of the copies.  In fact, the only case
that I can't currently easily solve for chunking is this:

- write response1 header, unchunked
- write response1, chunked ... still haven't filled 4k buffer
- this is a pipelined connection, we don't want to flush here
- write response2 header, unchunked
- write response2, chunked ... still haven't filled 4k buffer...
- repeat until pipeline is empty or buffer is full

Right now in 1.3 we nail that with a single write (maybe a writev(),
depending on the size of the bwrite which makes it go over 4k finally).
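
The single-syscall coalescing looks roughly like this with writev()
(the literals and sizes here are made up for the demo, not what the
buff.c code emits):

```c
#include <assert.h>
#include <sys/uio.h>
#include <unistd.h>

/* An unchunked header plus two chunked body pieces, one system call. */
static ssize_t demo_writev(int fd)
{
    struct iovec iov[3];

    iov[0].iov_base = "HTTP/1.1 200 OK\r\n\r\n"; iov[0].iov_len = 19;
    iov[1].iov_base = "5\r\nhello\r\n";           iov[1].iov_len = 10;
    iov[2].iov_base = "0\r\n\r\n";                iov[2].iov_len = 5;
    return writev(fd, iov, 3);
}

/* Send through a pipe and confirm all 34 bytes arrive in one write. */
long demo_pipe_total(void)
{
    int fds[2];
    char buf[64];
    ssize_t wrote, got;

    if (pipe(fds) != 0)
        return -1;
    wrote = demo_writev(fds[1]);
    got = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    close(fds[1]);
    return (wrote == got) ? (long)got : -1;
}
```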

For dynamic content modules which use bwrite only, and we expect large
writes, it's sufficient for me to add a "bwritev" and wrap every write
in a chunk and pass it down.  But php notably uses the bputs/putc/printf
interface and does small writes... so I want to coalesce those before
I pass them down.  Easy to do when there's only 1 response before a
flush.

But I'll admit I don't think there are any clients that actually
pipeline.  And it's only a problem when responses are smaller than
1.25k or so.

Any thoughts?

Dean

