httpd-dev mailing list archives

From Alexei Kosut <ako...@organic.com>
Subject filter API spec
Date Wed, 27 Aug 1997 21:39:30 GMT
Ben asked to see this, others may be interested as well:

This isn't formalized in any way, but here's a document describing the API
for, and a BUFF-based implementation of, a filter API for Apache that Ed and I
have been talking about. It's a draft, and I've probably spelled things
wrong, but if anyone's curious, here it is:

--

Additions to the BUFF API:

From a user's point of view, stacking/unstacking onto a BUFF:

typedef struct {
    void *cookie;
    int (*read)(BUFF *, void *, char *, int);	/* Read data */
    int (*write)(BUFF *, void *, char *, int);	/* Write data */
    void (*flush)(BUFF *, void *);	/* Flush and pass on */
    void (*close)(BUFF *, void *);	/* Finish up, pass on bclose() */
    void (*unattach)(BUFF *, void *);	/* Finish up, don't pass on */
} bfilter;


Public functions in buff.c:

int bpush(BUFF **, bfilter);

(Note: maybe bpushfilter() would be better? Whatever; the names are
not that important.) This function adds the filter functions from bfilter,
stacking them on top of the BUFF. It stores the new top BUFF through the
pointer-to-pointer argument (the "pointered pointer", if that's a word).

int bpop(BUFF **);

Unattaches the top-most filter from the stack. Puts the next BUFF down
into the pointered pointer.
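
For illustration, here is a minimal sketch of how the two calls might update
the caller's pointer. Everything in it is an assumption rather than part of
the proposal: the stripped-down mini_buff and mini_bfilter types, the
mini_bpush()/mini_bpop() names, the malloc-based allocation (Apache would
more likely use pools) and the 0/-1 return convention. It also ignores the
pending-data problems that come with popping a filter.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: the real BUFF and bfilter carry far more state. */
typedef struct mini_buff mini_buff;

typedef struct {
    void *cookie;
    int (*write)(mini_buff *, void *, char *, int);
} mini_bfilter;

struct mini_buff {
    mini_buff *down;   /* next BUFF down the stack, NULL at the bottom */
    mini_buff *up;     /* next BUFF up the stack, NULL at the top */
    mini_bfilter bf;   /* filter callbacks for this layer */
};

/* Stack a new layer on top of *bp and leave *bp pointing at it.
 * Returns 0 on success, -1 on allocation failure (assumed convention). */
int mini_bpush(mini_buff **bp, mini_bfilter bf) {
    mini_buff *top = calloc(1, sizeof(*top));

    if (top == NULL)
        return -1;
    top->bf = bf;
    top->down = *bp;
    if (*bp != NULL)
        (*bp)->up = top;
    *bp = top;
    return 0;
}

/* Remove the top layer and leave *bp pointing at the next BUFF down.
 * Refuses to pop the bottom BUFF, which owns the descriptor. */
int mini_bpop(mini_buff **bp) {
    mini_buff *top = *bp;

    if (top == NULL || top->down == NULL)
        return -1;
    *bp = top->down;
    (*bp)->up = NULL;
    free(top);
    return 0;
}
```

A caller holding the connection's BUFF pointer would pass its address, so
that after either call the same variable refers to the current top of the
stack.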


A Simple Write Filter:

int my_write(BUFF *b, void *cookie, char *c, int l) {
    int i;

    /* This filter makes all characters upper-case
     * (toupper() comes from <ctype.h>)
     */
    for (i = 0; i < l; i++)
        c[i] = toupper((unsigned char) c[i]);

    /* This is simple; it could also call bputs() or bprintf()
     * or *whatever*, as long as it adds up the return values
     * and returns the bytes sent (which can be radically
     * different from l, since additional filters may add more
     * characters, or the stream may have been cut off)
     */
    return bwrite(b, c, l);
}

Flush and close would look similar. Unattach would, too, except it
wouldn't call anything on the BUFF. A read filter would call bread(b), but
it would do it at the top, and then do something with what it
received.


Internally to BUFF:

The BUFF structure would be modified to look sorta like this (plus all
the extra BUFF junk):

BUFF {
    BUFF *down;	/* next BUFF down in the stack, NULL if we're at the bottom */
    BUFF *up;	/* the next one up, NULL if we're at the top */

    bfilter bf;	/* The filter functions for this BUFF */
    int fd;	/* The file descriptor, -1 for all but the bottom BUFF */
    int fd_in;	/* ditto */
}

The internal write()-like function (buff_write()) would look something
like this:

int buff_write(BUFF *b, char *c, int l) {
    /* Are we a stacked filter? */
    if (b->down) {
        if (b->bf.write)
            return b->bf.write(b->down, b->bf.cookie, c, l);
        else
            return buff_write(b->down, c, l);
    }

    /* We're the bottom of the stack:
     * do what the current buff_write() does right now
     */
}
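
To make the dispatch concrete, here is a small self-contained sketch. The
demo_buff type, its sink array (standing in for the socket) and the
demo_* function names are all made up for the example, and it skips
buffering entirely: the filter hands data straight to the layer below
instead of going through bwrite().

```c
#include <ctype.h>
#include <string.h>

typedef struct demo_buff demo_buff;

typedef struct {
    void *cookie;
    int (*write)(demo_buff *, void *, char *, int);
} demo_bfilter;

struct demo_buff {
    demo_buff *down;     /* NULL at the bottom of the stack */
    demo_bfilter bf;     /* filter callbacks for this layer */
    char sink[64];       /* bottom only: stands in for the socket */
    int sink_len;
};

/* Dispatch a write: filtered layers hand the data to their callback,
 * unfiltered layers pass it straight down, the bottom layer "sends" it. */
int demo_buff_write(demo_buff *b, char *c, int l) {
    if (b->down) {
        if (b->bf.write)
            return b->bf.write(b->down, b->bf.cookie, c, l);
        return demo_buff_write(b->down, c, l);
    }
    if (l > (int) sizeof(b->sink) - b->sink_len)
        l = (int) sizeof(b->sink) - b->sink_len;   /* short write when full */
    memcpy(b->sink + b->sink_len, c, (size_t) l);
    b->sink_len += l;
    return l;
}

/* Example filter: upper-case the data, then write it to the BUFF below. */
int demo_upper_write(demo_buff *below, void *cookie, char *c, int l) {
    int i;

    (void) cookie;   /* unused in this filter */
    for (i = 0; i < l; i++)
        c[i] = (char) toupper((unsigned char) c[i]);
    return demo_buff_write(below, c, l);
}
```

With a two-layer stack whose top layer has demo_upper_write installed,
writing "hello" at the top leaves "HELLO" in the bottom layer's sink.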

The internal bclose, bflush, etc... would be similar. The internal bread()
would have to be backwards (finding the bottom BUFF, and looking at
previous elements - this is why an up element in BUFF is necessary),
since reading happens bottom-up, whereas everything else is top-down.
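
One way to picture the bottom-up pass is the sketch below. It is only an
illustration under heavy assumptions: the rd_buff type, the in-place xform
callback (which is not the read signature from bfilter) and the src string
standing in for the socket are all invented here, and nothing is consumed
from src between calls.

```c
#include <string.h>

typedef struct rd_buff rd_buff;

struct rd_buff {
    rd_buff *down;    /* NULL at the bottom */
    rd_buff *up;      /* NULL at the top */
    /* Hypothetical in-place transform; returns the new length. */
    int (*xform)(char *buf, int len);
    const char *src;  /* bottom only: stands in for the socket */
};

/* Read through the stack: fetch raw bytes at the bottom, then let each
 * layer transform them on the way back up to the caller's layer. */
int rd_bread(rd_buff *top, char *buf, int cap) {
    rd_buff *b = top;
    int len;

    while (b->down)               /* find the bottom BUFF */
        b = b->down;

    len = (int) strlen(b->src);
    if (len > cap)
        len = cap;
    memcpy(buf, b->src, (size_t) len);

    while (b != top) {            /* bottom-up pass via the up links */
        b = b->up;
        if (b->xform)
            len = b->xform(buf, len);
    }
    return len;
}

/* Example transform: drop every '\r' so CRLF becomes LF. */
int rd_strip_cr(char *buf, int len) {
    int i, n = 0;

    for (i = 0; i < len; i++)
        if (buf[i] != '\r')
            buf[n++] = buf[i];
    return n;
}
```

The walk back up is the part that needs the up pointer: without it, the
bottom BUFF has no way to find the layers stacked above it.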

What this means, of course, is that each filter has its own input and
output buffers. This is a good thing, because it means that a given
filter can't do one-character reads/writes and slow everything down.

So you need to deal with these buffers in other functions:

bflush() looks sorta like this right now (pseudo-code):

bflush {
    if (there are more than 0 bytes in the output buffer) {
        buff_write(all those bytes);
        set the number of bytes in the buffer to 0;
    }

    return;
}

You don't want to remove that if there are filters, because the filters
don't know what the internals to BUFF are - they'd have to read from the
output buffer themselves. Instead, what you want to do is just append:

bflush {
    if (there are more than 0 bytes in the output buffer) {
        buff_write(all those bytes);
        set the number of bytes in the buffer to 0;
    }

    if (b->down) {
        if (b->bf.flush)
            b->bf.flush(b->down, b->bf.cookie);
        else
            bflush(b->down);
    }
}

(Note that the buff_write() will call your filter's write primitive if it
exists, so it all works well)
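
Here is a runnable sketch of that flush cascade with per-layer output
buffers. The fl_buff type, the fixed 64-byte buffers, the sink array
standing in for the socket and the fl_* names are assumptions for the
example, and the callbacks drop the cookie argument to keep it short.

```c
#include <ctype.h>
#include <string.h>

#define FL_CAP 64

typedef struct fl_buff fl_buff;

struct fl_buff {
    fl_buff *down;                         /* NULL at the bottom */
    int (*write)(fl_buff *, char *, int);  /* filter write, may be NULL */
    void (*flush)(fl_buff *);              /* filter flush, may be NULL */
    char out[FL_CAP];                      /* this layer's output buffer */
    int outcnt;
    char sink[FL_CAP];                     /* bottom only: the "socket" */
    int sinkcnt;
};

/* Append up to l bytes to a FL_CAP-sized buffer; returns bytes stored. */
static int fl_append(char *dst, int *cnt, const char *src, int l) {
    if (l > FL_CAP - *cnt)
        l = FL_CAP - *cnt;
    memcpy(dst + *cnt, src, (size_t) l);
    *cnt += l;
    return l;
}

/* Buffered write: just append to this layer's output buffer. */
int fl_bwrite(fl_buff *b, const char *c, int l) {
    return fl_append(b->out, &b->outcnt, c, l);
}

/* Unbuffered write: run the filter if stacked, else hit the "socket". */
static int fl_buff_write(fl_buff *b, char *c, int l) {
    if (b->down) {
        if (b->write)
            return b->write(b->down, c, l);
        return fl_buff_write(b->down, c, l);
    }
    return fl_append(b->sink, &b->sinkcnt, c, l);
}

/* Drain this layer's buffer, then pass the flush on down the stack. */
void fl_bflush(fl_buff *b) {
    if (b->outcnt > 0) {
        fl_buff_write(b, b->out, b->outcnt);
        b->outcnt = 0;
    }
    if (b->down) {
        if (b->flush)
            b->flush(b->down);
        else
            fl_bflush(b->down);
    }
}

/* Example filter: upper-case, then buffer into the layer below. */
int fl_upper_write(fl_buff *below, char *c, int l) {
    int i;

    for (i = 0; i < l; i++)
        c[i] = (char) toupper((unsigned char) c[i]);
    return fl_bwrite(below, c, l);
}
```

Data written at the top stays in the top layer's buffer until fl_bflush()
runs; the flush pushes it through the filter into the next buffer down,
and the cascade then drains that buffer into the sink.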

You don't have to worry about this in bclose(), because it calls bflush()
directly (although you have to do something similar wrt calling each
close function, then on the last one, actually closing the
fd/socket). However, you do have to worry about this in bpop(). So it
does the same thing as bflush() - if there's data pending, call write. (it
can't call bflush() directly because that would send a bflush() down the
stack, and you don't want that if you're just popping one filter).

Basically, unless the filter does something special, its flush, close and
unattach functions look like this:

void my_flush (BUFF *b, void *cookie) {
    bflush(b);
}

void my_close (BUFF *b, void *cookie) {
    bclose(b);
}

void my_unattach (BUFF *b, void *cookie) {
    /* noop */
}

We can allow these to be defined NULL in bfilter, and they're given those
definitions if so.

The problem is the read buffer. If a filter is popped, and there is
still data in that filter's input buffer, what do you do with it? You
can't "send" it to the application, because you can only do that when
bread() is called. You can't put it into the next filter down's buffer,
because then it would be processed again at the next read. You could set
up a separate buffer in that filter for "already filtered" data, but that
is a lot of work, and has some major problems.

The simple solution is probably to not actually remove the filter, but to
replace its functions with functions that just call the b* functions (so
they just pass the read up), or to simply set them to NULL, given the
implementation allows that. Either way the buffer remains intact. It's a
little extra processing on the stream, but not much, and it shouldn't
happen most of the time (90% of the time, if you push a filter that does
reading, it's going to stick around until bclose; most filters that will
be popped are write-only).
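
A rough sketch of that "soft pop" follows. The sp_buff type, its in buffer
of already-filtered data, the src string standing in for the socket and
the sp_* names are all hypothetical, and the read callback omits the
cookie for brevity.

```c
#include <ctype.h>
#include <string.h>

typedef struct sp_buff sp_buff;

struct sp_buff {
    sp_buff *down;                        /* NULL at the bottom */
    int (*read)(sp_buff *, char *, int);  /* NULL means pass-through */
    char in[32];                          /* already-filtered input */
    int incnt;
    const char *src;                      /* bottom only: the "socket" */
};

/* Read from a layer: drain its own buffer first, then go down the stack. */
int sp_bread(sp_buff *b, char *buf, int n) {
    int avail;

    if (b->incnt > 0) {
        if (n > b->incnt)
            n = b->incnt;
        memcpy(buf, b->in, (size_t) n);
        memmove(b->in, b->in + n, (size_t) (b->incnt - n));
        b->incnt -= n;
        return n;
    }
    if (b->down)
        return b->read ? b->read(b->down, buf, n) : sp_bread(b->down, buf, n);

    avail = (int) strlen(b->src);         /* bottom: consume from src */
    if (n > avail)
        n = avail;
    memcpy(buf, b->src, (size_t) n);
    b->src += n;
    return n;
}

/* Example read filter: read from below, then upper-case the result. */
int sp_upper_read(sp_buff *below, char *buf, int n) {
    int i;

    n = sp_bread(below, buf, n);
    for (i = 0; i < n; i++)
        buf[i] = (char) toupper((unsigned char) buf[i]);
    return n;
}

/* "Pop" without unlinking: disable the filter but keep the layer, so
 * whatever is still sitting in its input buffer is neither lost nor
 * filtered a second time. */
void sp_soft_pop(sp_buff *b) {
    b->read = NULL;
}
```

After sp_soft_pop(), the first read returns whatever the layer had already
filtered, and later reads fall through to the layer below untouched.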


Chunking:

Because we want chunking to happen inside the BUFF code using writev() if
there are no transport-layer filters (which would want to filter chunks
as well), send_http_header() should use logic like this:

send_http_header() {
    /* Send HTTP headers */
    ...

    /* If we want to chunk, check for any installed filters on the BUFF.
     * bcheckfilters() returns non-zero if there are any filters
     * currently stacked.
     */
    if (r->chunked) {
        if (!bcheckfilters(r->connection->client))
            bsetopt(r->connection->client, B_CHUNK);
        else
            rattach(r, chunking_filter);
    }

    /* New API phase to add any entity filters - see below */
    run_pre_body_phase();
}

chunking_filter would contain a write function like this:

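int chunk_write (BUFF *b, void *cookie, char *c, int l) {
    int n = 0;

    if (l <= 0)
        return 0;	/* a zero-length chunk would end the body */
    n += bprintf(b, "%x\r\n", l);
    n += bwrite(b, c, l);
    n += bwrite(b, "\r\n", 2);
    return n;
}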
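
To show the framing on its own, here is a standalone sketch that encodes
one chunk into a caller-supplied buffer instead of writing to a BUFF. The
encode_chunk() name and its return convention are assumptions made for the
example; the framing itself (hex length, CRLF, data, CRLF) is the standard
chunked format.

```c
#include <stdio.h>
#include <string.h>

/* Encode one chunk ("<hex length>\r\n<data>\r\n") into out.
 * Returns the bytes written, 0 for an empty chunk (which would otherwise
 * read as the terminating chunk), or -1 if out is too small. */
int encode_chunk(char *out, size_t cap, const char *data, size_t len) {
    int hdr;

    if (len == 0)
        return 0;
    hdr = snprintf(out, cap, "%lx\r\n", (unsigned long) len);
    if (hdr < 0 || (size_t) hdr + len + 2 > cap)
        return -1;
    memcpy(out + hdr, data, len);
    memcpy(out + hdr + len, "\r\n", 2);
    return hdr + (int) len + 2;
}
```

Encoding the five bytes "hello" this way yields the ten bytes
"5\r\nhello\r\n".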

chunking_filter's unattach/close function could send the end footer, but
it's probably better to not send anything, and let finalize_request_protocol
send what it sends now, since it could also send footers (although it
doesn't right now). It'd be pretty simple to write a module that used a
filter to compute a Content-MD5 footer for every chunked response, for
example.

Anyhow, by still setting B_CHUNK, you make use of the current efficient
chunk code using writev and the like. You only use the inefficient
code if there's a filter stacked under where chunking would be (e.g.,
SSL) - and most entity-filters (e.g., server-side includes) would be on
top of chunking, so writev would still be used, which is good.


New request API phases:

(Note that there are a number of useful API phases; these are just the
two you need to add to implement the filtering API.)

connection_open:

To be called right after the connection opens, before any requests come
in. This could be used to add transport-level filters, e.g. SSL. A
connection_close isn't needed, because bclose() will call the filters'
close functions.

pre_body:

Called from send_http_header(), after all the headers, before any
body. This can be used to call rattach(), which can attach "entity
filters", which are automatically popped off (in
finalize_request_protocol()).

Actually, it's more complex than that. Here's how the Apache API could
deal with this:

You can add filters in pre_body, if you want, using rattach(), which
looks sorta like this:

int rattach(request_rec *r, bfilter bf) {
    int rv = bpush(&r->connection->client, bf);

    r->filters++;
    return rv;
}

Then finalize_request_protocol has something like this added:

    while (r->filters > 0) {
        bpop(&r->connection->client);
        r->filters--;
    }
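
The point of the per-request counter is that only the filters a request
attached get popped, while transport-level filters stay on the connection.
A toy sketch of that bookkeeping, with a plain depth counter standing in
for the BUFF stack and hypothetical rq_* names:

```c
/* Toy model: depth stands in for the number of filters on the BUFF. */
typedef struct {
    int depth;
} rq_conn;

typedef struct {
    rq_conn *connection;
    int filters;          /* filters attached by this request */
} rq_request;

/* Attach one entity filter and remember that this request owns it. */
void rq_rattach(rq_request *r) {
    r->connection->depth++;
    r->filters++;
}

/* Pop exactly the filters this request attached, nothing more. */
void rq_finalize(rq_request *r) {
    while (r->filters > 0) {
        r->connection->depth--;
        r->filters--;
    }
}
```

With one transport filter already on the connection, a request that
attaches two entity filters leaves the stack three deep, and finalizing it
brings the depth back to one.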

However, there's also a need for filters to be added by other modules
than the one they're defined in. For example, let's say server-side
includes are a filter, so is PHP/FI. I'd want to be able to do this:

AddFilter ssi-filter html
AddFilter php-filter html

And have HTML files parsed by both SSI and PHP, say. Additionally, CGI
scripts might be able to output a Set-Filter: header that tells Apache to
pass their output through a filter, or more than one.

So you need named filters. The best way to do this is probably to emulate
how we do handlers: You have a name/function pair, possibly added to the
current handler list, possibly with a new entry in the module structure
(the former is simpler, but more likely to be screwed up by a user
putting a real handler in AddFilter or a filter in AddHandler). It looks
sorta like this:

int my_filter (request_rec *r);

filter_rec my_filters[] = {
{ "my-filter", my_filter },
{ NULL }
};

int my_filter (request_rec *r) {
    bfilter bf = /* Whatever */;
    return rattach(r, bf);
}

Then you could have an raddfilterbyname(request_rec *, char *) function
that mod_mime could call from pre_body based on its AddFilter function,
and mod_cgi could call if it parsed a Set-Filter: header, etc... This
seems the best way of doing all this.
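
A lookup along those lines could be as simple as the sketch below. The
demo_request and demo_filter_rec types, the two stub attach functions and
the -1 "not found" return are assumptions for illustration; a real version
would walk the filter tables of every loaded module instead of one static
array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for request_rec: only counts attached filters. */
typedef struct {
    int filters;
} demo_request;

typedef int (*demo_filter_fn)(demo_request *);

typedef struct {
    const char *name;
    demo_filter_fn fn;
} demo_filter_rec;

/* Stub attach functions; real ones would build a bfilter and rattach() it. */
static int attach_ssi(demo_request *r) { r->filters++; return 0; }
static int attach_php(demo_request *r) { r->filters++; return 0; }

static const demo_filter_rec demo_filters[] = {
    { "ssi-filter", attach_ssi },
    { "php-filter", attach_php },
    { NULL, NULL }
};

/* Look the name up and run the matching attach function.
 * Returns its result, or -1 if no filter has that name. */
int demo_raddfilterbyname(demo_request *r, const char *name) {
    const demo_filter_rec *f;

    for (f = demo_filters; f->name != NULL; f++)
        if (strcmp(f->name, name) == 0)
            return f->fn(r);
    return -1;
}
```

Both an AddFilter directive and a Set-Filter: header would then reduce to
one call per filter name.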

That's all, folks.

-- Alexei Kosut <akosut@organic.com>

