httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject Re: filter API spec
Date Tue, 02 Sep 1997 07:17:08 GMT
I haven't read this completely yet, but it looks like you're missing
writev() -- it's needed for performance (avoiding user-to-user copies).
Without it you can't implement chunking as a filter as efficiently as I
presently have.
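For illustration, here is a minimal sketch of the writev() point, assuming a POSIX system. The chunk-size line, the caller's data and the trailing CRLF go out in one system call, and the payload is never copied into an intermediate buffer. The function name write_chunk_v is hypothetical, not Apache's chunking code, and short writes are returned as-is rather than retried.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one HTTP/1.1 chunk (hex size line, data, CRLF)
 * with a single writev() call. The payload is referenced where it lies,
 * never copied. A short write is returned as-is, not retried. */
ssize_t write_chunk_v(int fd, const void *data, size_t len)
{
    char head[32];
    struct iovec iov[3];
    int hlen = snprintf(head, sizeof head, "%zx\r\n", len);

    if (hlen < 0)
        return -1;
    iov[0].iov_base = head;
    iov[0].iov_len = (size_t)hlen;
    iov[1].iov_base = (void *)data;      /* iov_base is not const-qualified */
    iov[1].iov_len = len;
    iov[2].iov_base = (void *)"\r\n";
    iov[2].iov_len = 2;
    return writev(fd, iov, 3);
}
```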

write() functions must take const void *; they should not modify the data --
consider mmap()ed files.
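Here is a sketch of the const-correct alternative, using assumed names (down_write_fn, upper_write and the test sink are all hypothetical). The filter never writes into the caller's data, which might be a read-only mmap()ed file. Instead, it converts the data through a small scratch buffer of its own.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical const-correct write hook for the next layer down. */
typedef int (*down_write_fn)(void *down_ctx, const void *buf, int len);

/* Hypothetical upper-case filter that never modifies the caller's buffer.
 * Assumes the lower layer either accepts all n bytes or returns < 0. */
int upper_write(down_write_fn down, void *down_ctx, const void *data, int len)
{
    const unsigned char *in = data;
    char scratch[256];
    int done = 0;

    while (done < len) {
        int n = len - done;
        int i, rv;

        if (n > (int)sizeof scratch)
            n = (int)sizeof scratch;
        for (i = 0; i < n; i++)
            scratch[i] = (char)toupper(in[done + i]);
        rv = down(down_ctx, scratch, n);
        if (rv < 0)
            return rv;                  /* propagate the lower layer's error */
        done += n;
    }
    return done;
}

/* Test sink standing in for the lower layer: appends into a string. */
struct sink { char buf[1024]; int used; };

int sink_write(void *ctx, const void *buf, int len)
{
    struct sink *s = ctx;

    if (s->used + len >= (int)sizeof s->buf)
        return -1;
    memcpy(s->buf + s->used, buf, (size_t)len);
    s->used += len;
    s->buf[s->used] = '\0';
    return len;
}
```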

Multiple layers of buffers are dangerous, but this is really an
implementation detail in each filter.  Filters should be able to
completely disable buffering. 

I'll read more later.

Dean

On Wed, 27 Aug 1997, Alexei Kosut wrote:

> Ben asked to see this, others may be interested as well:
> 
> This isn't formalized in any way, but here's a document describing the API
> for and BUFF-based implementation of a filter API for Apache that Ed and I
> have been talking about. It's a draft, and I've probably spelled things
> wrong, but if anyone's curious, here it is:
> 
> --
> 
> Additions to the BUFF API:
> 
> From a user's point of view, stacking/unstacking onto a BUFF:
> 
> typedef struct {
>     void *cookie;
>     int (*read)(BUFF *, void *, char *, int);	/* Read data */
>     int (*write)(BUFF *, void *, char *, int);	/* Write data */
>     void (*flush)(BUFF *, void *);	/* Flush and pass on */
>     void (*close)(BUFF *, void *);	/* Finish up, pass on bclose() */
>     void (*unattach)(BUFF *, void *);	/* Finish up, don't pass on */
> } bfilter;
> 
> 
> Public functions in buff.c:
> 
> int bpush(BUFF **, bfilter);
> 
> (Note: maybe bpushfilter() would be better? Whatever; names are
> not as important) This function adds the filter functions from bfilter,
> stacking them on top of the BUFF. It puts the new BUFF into the pointered
> pointer (that's not a word, is it?)
> 
> int bpop(BUFF **)
> 
> Unattaches the top-most filter from the stack. Puts the next BUFF down
> into the pointered pointer.
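Here is a minimal sketch of the pointer-to-pointer push/pop idiom. It uses a hypothetical layer struct in place of BUFF, and only the up/down links are modeled. The caller's top-of-stack pointer is updated in place, and the bottom (socket) layer is never popped.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a stacked BUFF: only the links are modeled. */
typedef struct layer {
    struct layer *down;   /* toward the socket; NULL at the bottom */
    struct layer *up;     /* toward the application; NULL at the top */
    int id;               /* label, for the test only */
} layer;

/* Like bpush(): stack a new layer on top and store it through *top. */
int layer_push(layer **top, int id)
{
    layer *l = malloc(sizeof *l);

    if (l == NULL)
        return -1;
    l->id = id;
    l->up = NULL;
    l->down = *top;
    if (*top != NULL)
        (*top)->up = l;
    *top = l;
    return 0;
}

/* Like bpop(): unlink the top layer and store the next one down in *top. */
int layer_pop(layer **top)
{
    layer *old = *top;

    if (old == NULL || old->down == NULL)
        return -1;        /* never pop the bottom (socket) layer */
    *top = old->down;
    (*top)->up = NULL;
    free(old);
    return 0;
}
```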
> 
> 
> A Simple Write Filter:
> 
> int my_write(BUFF *b, void *cookie, char *c, int l) {
>     int i;
> 
>     /* This filter makes all characters upper-case, in place */
>     for (i = 0; i < l; i++)
>         c[i] = toupper((unsigned char)c[i]);
> 
>     /* This is simple; it could also call bputs() or bprintf()
>      * or *whatever*, as long as it adds up the return values
>      * and returns the bytes sent (which can be radically
>      * different from l, since additional filters may add more
>      * characters, or the stream may have been cut off)
>      */
>     return bwrite(b, c, l);
> }
> 
> Flush and close would look similar. Unattach would, too, except it
> wouldn't call anything on the BUFF. A read function would call bread(b)
> first, at the top, and then do something with what it received.
> 
> Internally to BUFF:
> 
> The BUFF structure would be modified to look sorta like this (plus all
> the extra BUFF junk):
> 
> BUFF {
>     BUFF *down;	/* next BUFF down in the stack, NULL if we're at the bottom */
>     BUFF *up;	/* the next one up, NULL if we're at the top */
> 
>     bfilter bf;     /* The filters for this BUFF */
>     int fd;         /* The file descriptor, -1 for all but the bottom BUFF */
>     int fd_in;      /* ditto */
> };
> 
> The internal write()-like function (buff_write()) would look something
> like this:
> 
> int buff_write(BUFF *b, char *c, int l) {
>     /* Are we a stacked filter? */
>     if (b->down) {
> 	if (b->bf.write)
>             return b->bf.write(b->down, b->bf.cookie, c, l);
> 	else
> 	    return buff_write(b->down, c, l);
>     }
> 
>     /* We're the bottom of the stack:
>      * do what the current buff_write() does right now
>      */
> }
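Here is a self-contained toy model of that top-down dispatch. It assumes hypothetical buff/filter types and a bottom layer that appends to an in-memory array instead of writing to a socket. A layer without a write hook is a pure pass-through, and bracket_write is an example filter.

```c
#include <string.h>

typedef struct buff buff;

/* Hypothetical filter hooks: just the write member, with const data. */
typedef struct {
    void *cookie;
    int (*write)(buff *down, void *cookie, const char *c, int l);
} filter;

struct buff {
    buff *down;        /* NULL at the bottom of the stack */
    filter bf;
    char out[256];     /* bottom layer only: stands in for the socket */
    int out_len;
};

int buff_write(buff *b, const char *c, int l)
{
    if (b->down) {                            /* stacked layer */
        if (b->bf.write)
            return b->bf.write(b->down, b->bf.cookie, c, l);
        return buff_write(b->down, c, l);     /* no hook: pass through */
    }
    /* Bottom layer: "send" by appending to out[], truncating if full. */
    if (l > (int)sizeof b->out - b->out_len)
        l = (int)sizeof b->out - b->out_len;
    memcpy(b->out + b->out_len, c, (size_t)l);
    b->out_len += l;
    return l;
}

/* Hypothetical example filter: wraps each write in brackets. */
int bracket_write(buff *down, void *cookie, const char *c, int l)
{
    int n = 0;

    (void)cookie;
    n += buff_write(down, "[", 1);
    n += buff_write(down, c, l);
    n += buff_write(down, "]", 1);
    return n;
}
```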
> 
> The internal bclose, bflush, etc... would be similar. The internal bread()
> would have to be backwards (finding the bottom BUFF, and looking at
> previous elements - this is why an up element in BUFF is necessary),
> since reading happens bottom-up, whereas everything else is top-down.
> 
> What this means, of course, is that each filter has its own input and
> output buffers. This is a good thing, because it means that a given
> filter can't do one-character reads/writes and slow everything down.
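To make the per-layer buffering point concrete, here is a sketch with a deliberately tiny 8-byte buffer. The names obuf, obuf_write and count_write are hypothetical. Eighteen one-byte writes reach the lower layer as two full buffers, plus one more call when the remainder is flushed.

```c
#include <string.h>

#define FBUF_SIZE 8   /* tiny on purpose, so the test can see a spill */

/* Hypothetical per-layer output buffer sitting above some lower writer. */
typedef struct {
    char buf[FBUF_SIZE];
    int used;
    int (*lower)(void *ctx, const char *c, int l);
    void *lower_ctx;
} obuf;

static int obuf_flush(obuf *o)
{
    int rv = 0;

    if (o->used > 0) {
        rv = o->lower(o->lower_ctx, o->buf, o->used);
        o->used = 0;
    }
    return rv;
}

/* Accepts any number of bytes; the lower layer only ever sees full
 * buffers, or whatever is left when obuf_flush() is called. */
int obuf_write(obuf *o, const char *c, int l)
{
    int done = 0;

    while (done < l) {
        int room = FBUF_SIZE - o->used;
        int n = l - done < room ? l - done : room;

        memcpy(o->buf + o->used, c + done, (size_t)n);
        o->used += n;
        done += n;
        if (o->used == FBUF_SIZE && obuf_flush(o) < 0)
            return -1;
    }
    return done;
}

/* Test lower layer: counts calls and collects the bytes. */
struct counter { int calls; char data[64]; int len; };

int count_write(void *ctx, const char *c, int l)
{
    struct counter *k = ctx;

    k->calls++;
    memcpy(k->data + k->len, c, (size_t)l);
    k->len += l;
    return l;
}
```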
> 
> So you need to deal with these buffers in other functions:
> 
> bflush() looks sorta like this right now (pseudo-code):
> 
> bflush {
>     if (there are more than 0 bytes in the output buffer) {
>         buff_write(all those bytes);
>         set the number of bytes in the buffer to 0;
>     }
> 
>     return;
> }
> 
> You don't want to remove that if there are filters, because the filters
> don't know what the internals to BUFF are - they'd have to read from the
> output buffer themselves. Instead, what you want to do is just append:
> 
> bflush {
>     if (there are more than 0 bytes in the output buffer) {
>         buff_write(all those bytes);
>         set the number of bytes in the buffer to 0;
>     }
> 
>     if (b->down) {
>         if (b->bf.flush)
>             b->bf.flush(b->down, b->bf.cookie);
>         else
>             bflush(b->down);
>     }
> }
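Here is a toy model of this flush propagation, using hypothetical names and fixed-size buffers in place of BUFF's. Each layer drains its own pending output into the layer below. It then calls the filter's flush hook, or, if there is none, flushes the next layer down.

```c
#include <string.h>

/* Hypothetical toy BUFF stack; no bounds checks, so keep test data
 * smaller than the fixed buffers. */
typedef struct layer layer;
struct layer {
    layer *down;                               /* NULL at the bottom */
    void (*flush)(layer *down, void *cookie);  /* NULL: default pass-through */
    void *cookie;
    char pending[64];                          /* unsent output */
    int npending;
    char wire[128];                            /* bottom only: bytes "sent" */
    int nwire;
};

/* Stand-in for buff_write(): a stacked layer hands bytes to the next
 * layer's buffer; the bottom layer puts them on the wire. */
static void send_down(layer *b, const char *c, int n)
{
    if (b->down != NULL) {
        memcpy(b->down->pending + b->down->npending, c, (size_t)n);
        b->down->npending += n;
    } else {
        memcpy(b->wire + b->nwire, c, (size_t)n);
        b->nwire += n;
    }
}

/* bflush(): drain this layer, then pass the flush down the stack. */
void lflush(layer *b)
{
    if (b->npending > 0) {
        send_down(b, b->pending, b->npending);
        b->npending = 0;
    }
    if (b->down != NULL) {
        if (b->flush != NULL)
            b->flush(b->down, b->cookie);   /* filter decides what to do */
        else
            lflush(b->down);                /* default: plain bflush() */
    }
}
```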
> 
> (Note that the buff_write() will call your filter's write primitive if it
> exists, so it all works well)
> 
> You don't have to worry about this in bclose(), because it calls bflush()
> directly (although you have to do something similar wrt calling each
> close function, then on the last one, actually closing the
> fd/socket). However, you do have to worry about this in bpop(). So it
> does the same thing as bflush() - if there's data pending, call write. (it
> can't call bflush() directly because that would send a bflush() down the
> stack, and you don't want that if you're just popping one filter).
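Here is a sketch of the popping side under the same kind of toy model (layer and lpop are hypothetical). The popped layer's pending output is handed to the layer below, but the layer below is not flushed, so its data stays buffered. A real filter's write hook would transform the bytes on the way; this model just copies them.

```c
#include <string.h>

typedef struct layer layer;
struct layer {
    layer *down;        /* NULL at the bottom */
    char pending[64];   /* output not yet handed down (no bounds checks) */
    int npending;
};

/* Hypothetical bpop(): move the top layer's pending output into the layer
 * below, as buff_write() would, but do NOT flush that layer. */
int lpop(layer **top)
{
    layer *b = *top;

    if (b == NULL || b->down == NULL)
        return -1;
    if (b->npending > 0) {
        memcpy(b->down->pending + b->down->npending, b->pending,
               (size_t)b->npending);
        b->down->npending += b->npending;
        b->npending = 0;
    }
    *top = b->down;
    return 0;
}
```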
> 
> Basically, unless the filter does something special, its flush, close and
> unattach functions look like this:
> 
> void my_flush(BUFF *b, void *cookie) {
>     bflush(b);
> }
> 
> void my_close(BUFF *b, void *cookie) {
>     bclose(b);
> }
> 
> void my_unattach(BUFF *b, void *cookie) {
>     /* noop */
> }
> 
> We can allow these to be NULL in bfilter, in which case they're given
> those definitions.
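One way to give NULL hooks their defaults is to fill them in once, at push time, so the hot path never tests for NULL. Here is a minimal sketch with hypothetical names. In a real implementation, default_flush and default_close would call bflush() and bclose() on the BUFF they are given; here, counters replace those calls so the behavior can be checked.

```c
#include <stddef.h>

typedef struct buff buff;   /* opaque in this sketch */

/* Hypothetical subset of bfilter: the three "finish" hooks. */
typedef struct {
    void *cookie;
    void (*flush)(buff *, void *);
    void (*close)(buff *, void *);
    void (*unattach)(buff *, void *);
} hooks;

/* Hypothetical defaults; real ones would call bflush(b) / bclose(b). */
static int flushed, closed;
static void default_flush(buff *b, void *cookie)    { (void)b; (void)cookie; flushed++; }
static void default_close(buff *b, void *cookie)    { (void)b; (void)cookie; closed++; }
static void default_unattach(buff *b, void *cookie) { (void)b; (void)cookie; /* noop */ }

/* Replace every NULL hook with its pass-through default. */
void fill_defaults(hooks *h)
{
    if (h->flush == NULL)
        h->flush = default_flush;
    if (h->close == NULL)
        h->close = default_close;
    if (h->unattach == NULL)
        h->unattach = default_unattach;
}
```

The buff_write() pseudo-code in the proposal tests for NULL at call time instead. Both approaches work; filling the hooks in once just moves the check out of the per-write path.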
> 
> The problem is the read buffer. If a filter is popped, and there is
> still data in that filter's input buffer, what do you do with it? You
> can't "send" it to the application, because you can only do that when
> bread() is called. You can't put it into the next filter down's buffer,
> because then it would be processed again at the next read. You could set
> up a separate buffer in that filter for "already filtered" data, but that
> is a lot of work, and has some major problems.
> 
> The simple solution is probably to not actually remove the filter, but
> replace its functions with functions that just call the b* functions (so
> they just pass the read up) - actually setting them to NULL, given the
> implementation allows that - so the buffer remains intact. It's a little
> extra processing on the stream, but not much, and it shouldn't
> happen most of the time (90% of the time, if you push a filter that does
> reading, it's going to stick around until bclose. Most filters that will
> be popped are write-only).
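Here is a sketch of "popping" a read filter by neutralizing it rather than unlinking it (rlayer, rread and upcase are hypothetical). Bytes that were already read ahead and filtered are still served from the layer's buffer. Only after that do reads pass straight through to the layer below.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical toy read-side layer, not Apache's API. */
typedef struct rlayer rlayer;
struct rlayer {
    rlayer *down;                   /* NULL at the bottom */
    int (*filter)(char *c, int n);  /* set to NULL to "pop" the layer */
    char inbuf[4];                  /* small read-ahead buffer */
    int inpos, inlen;               /* unread, already-filtered bytes */
    const char *src;                /* bottom only: raw input remaining */
};

/* Example input filter: upper-cases the layer's own buffer in place. */
int upcase(char *c, int n)
{
    int i;

    for (i = 0; i < n; i++)
        c[i] = (char)toupper((unsigned char)c[i]);
    return n;
}

int rread(rlayer *b, char *out, int max)
{
    int n;

    if (b->inpos < b->inlen) {          /* already-filtered data first */
        n = b->inlen - b->inpos;
        if (n > max)
            n = max;
        memcpy(out, b->inbuf + b->inpos, (size_t)n);
        b->inpos += n;
        return n;
    }
    if (b->down == NULL) {              /* bottom: the "socket" */
        n = (int)strlen(b->src);
        if (n > max)
            n = max;
        memcpy(out, b->src, (size_t)n);
        b->src += n;
        return n;
    }
    if (b->filter == NULL)              /* neutralized: plain pass-through */
        return rread(b->down, out, max);

    /* Active filter: read ahead into our buffer, filter, then serve. */
    n = rread(b->down, b->inbuf, (int)sizeof b->inbuf);
    if (n <= 0)
        return n;
    b->filter(b->inbuf, n);
    b->inpos = 0;
    b->inlen = n;
    return rread(b, out, max);
}
```

In this model, the filter reads four bytes ahead and the application consumes two. The other two still come out upper-cased after the pop, while later input does not.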
> 
> 
> Chunking:
> 
> Because we want this to happen inside the BUFF code using writev() if
> there are no transport-layer filters (which would want to filter chunks
> as well), send_http_header() should use logic like this:
> 
> send_http_header() {
>     /* Send HTTP headers */
>     ...
>     
>     /* If we want to chunk, check for any installed filters on the BUFF
>      * bcheckfilters() returns non-zero if there are any filters
>      * currently stacked
>      */
>     if (r->chunked) {
>         if (!bcheckfilters(r->connection->client))
>             bsetopt(r->connection->client, B_CHUNK);
>         else
>             rattach(r, chunking_filter);
>     }
> 
>     /* New API phase to add any entity filters - see pre_body below */
>     run_pre_body_phase();
> }
> 
> chunking_filter would contain a write function like this:
> 
> int chunk_write(BUFF *b, void *cookie, char *c, int l) {
>     int n = 0;
> 
>     n += bprintf(b, "%x\r\n", l);
>     n += bwrite(b, c, l);
>     n += bwrite(b, "\r\n", 2);
>     return n;
> }
> 
> chunking_filter's unattach/close function could send the end footer, but
> it's probably better to not send anything, and let finalize_request_protocol
> send what it sends now, since it could also send footers (although it
> doesn't right now). It'd be pretty simple to write a module that used a
> filter to compute a Content-MD5 footer for every chunked response, for
> example.
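For reference, an HTTP/1.1 chunked body ends with a zero-size last-chunk, then any footer lines, then an empty line. Here is a minimal sketch of building that tail. The helper chunked_trailer is hypothetical, not existing Apache code, and a Content-MD5 style footer is the optional extra.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the end of a chunked body into out, i.e. the
 * zero-size last-chunk, each footer line, then the final empty line.
 * footers is a NULL-terminated array of "Name: value" strings (or NULL).
 * Returns the length written, or -1 if out is too small. */
int chunked_trailer(char *out, size_t cap, const char *const *footers)
{
    size_t used;
    int n = snprintf(out, cap, "0\r\n");

    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;
    for (; footers != NULL && *footers != NULL; footers++) {
        n = snprintf(out + used, cap - used, "%s\r\n", *footers);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "\r\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```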
> 
> Anyhow, by still setting B_CHUNK, you make use of the current efficient
> chunk code using writev and the like. You only use the inefficient
> code if there's a filter stacked under where chunking would be (e.g.,
> SSL) - and most entity-filters (e.g., server-side includes) would be on
> top of chunking, so writev would still be used, which is good.
> 
> 
> New request API phases:
> 
> (Note that there are a number of other useful API phases; these are just
> the two you need to add to implement the filtering API):
> 
> connection_open:
> 
> To be called right after the connection opens, before any requests come
> in. This could be used to add transport-level filters, e.g. SSL. A
> connection_close isn't needed, because bclose() will call the filters'
> close functions.
> 
> pre_body:
> 
> Called from send_http_header(), after all the headers, before any
> body. This can be used to call rattach(), which can attach "entity
> filters"; these are automatically popped off in
> finalize_request_protocol().
> 
> Actually, it's more complex than that. Here's how the Apache API could
> deal with this:
> 
> You can add filters in pre_body, if you want, using rattach(), which
> looks sorta like this:
> 
> int rattach(request_rec *r, bfilter bf) {
>     r->filters++;
>     return bpushfilter(&r->connection->client, bf);
> }
> 
> Then finalize_request_protocol has something like this added:
> 
>     while (r->filters > 0) {
>         bpopfilter(&r->connection->client);
>         r->filters--;
>     }
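Here is a reduced model of the rattach()/finalize_request_protocol() bookkeeping, with hypothetical request and layer types. Only entity filters pushed through rattach() are counted. Popping exactly that many therefore leaves the transport layer (e.g. one pushed at connection_open) in place for the next request on the connection.

```c
#include <stdlib.h>

/* Hypothetical, much-reduced stand-ins for BUFF and request_rec. */
typedef struct layer { struct layer *down; } layer;

typedef struct {
    layer *client;   /* top of the connection's stack */
    int filters;     /* entity filters pushed for this request */
} request;

static int push(layer **top)
{
    layer *l = malloc(sizeof *l);

    if (l == NULL)
        return -1;
    l->down = *top;
    *top = l;
    return 0;
}

static void pop(layer **top)
{
    layer *l = *top;

    *top = l->down;
    free(l);
}

/* Count only successful pushes, so finalize pops exactly what was added. */
int rattach(request *r)
{
    if (push(&r->client) != 0)
        return -1;
    r->filters++;
    return 0;
}

void finalize(request *r)
{
    while (r->filters > 0) {
        pop(&r->client);
        r->filters--;
    }
}
```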
> 
> However, there's also a need for filters to be added by modules other
> than the one they're defined in. For example, let's say server-side
> includes are a filter, and so is PHP/FI. I'd want to be able to do this:
> 
> AddFilter ssi-filter html
> AddFilter php-filter html
> 
> And have HTML files parsed by both SSI and PHP, say. Additionally, CGI
> scripts might be able to output a Set-Filter: header that tells Apache to
> pass their output through a filter, or more than one.
> 
> So you need named filters. The best way to do this is probably to emulate
> how we do handlers: You have a name/function pair, possibly added to the
> current handler list, possibly with a new entry in the module structure
> (the former is simpler, but more likely to be screwed up by a user
> putting a real handler in AddFilter or a filter in AddHandler). It looks
> sorta like this:
> 
> int my_filter(request_rec *r);
> 
> filter_rec my_filters[] = {
>     { "my-filter", my_filter },
>     { NULL }
> };
> 
> int my_filter (request_rec *r) {
>     bfilter bf = /* Whatever */;
>     return rattach(r, bf);
> }
> 
> Then you could have an raddfilterbyname(request_rec *, char *) function
> that mod_mime could call from pre_body based on its AddFilter directive,
> and that mod_cgi could call if it parsed a Set-Filter: header, etc. This
> seems the best way of doing all this.
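Here is a minimal sketch of name-based lookup over one module's table. It assumes a hypothetical filter_rec layout and an opaque request type. The real thing would walk every module's table, the way handlers are looked up, and names like ssi-filter are just examples.

```c
#include <string.h>

typedef struct request request;        /* opaque in this sketch */
typedef int (*filter_fn)(request *r);

/* Hypothetical name/function pair, modeled on handler tables. */
typedef struct {
    const char *name;
    filter_fn fn;
} filter_rec;

static int attached;                    /* test hook: counts attachments */
static int ssi_filter(request *r) { (void)r; attached++; return 0; }
static int php_filter(request *r) { (void)r; attached++; return 0; }

static const filter_rec my_filters[] = {
    { "ssi-filter", ssi_filter },
    { "php-filter", php_filter },
    { NULL, NULL }
};

/* Hypothetical raddfilterbyname(): find the name and run its attach
 * function. Returns -1 if no table entry has that name. */
int raddfilterbyname(request *r, const char *name)
{
    const filter_rec *f;

    for (f = my_filters; f->name != NULL; f++)
        if (strcmp(f->name, name) == 0)
            return f->fn(r);
    return -1;
}
```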
> 
> That's all, folks.
> 
> -- Alexei Kosut <akosut@organic.com>
> 
> 

