httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From r..@covalent.net
Subject Re: [PATCH] dechunking, body filtering
Date Mon, 16 Oct 2000 19:06:27 GMT

Well, I promised to do a more thorough review a few days ago, but I
forgot, here it is.

>   r
>  
>     Add "void *filtered_input" field; ap_get_client_block() holds onto 
>     leftover body in this field.  Ugly that we clutter the r...  Ugly that
>     I made it "void *" instead of "ap_bucket_brigade *" (avoids an additional
>     #include in "httpd.h" for (probably) no good reason).

Mentioned in another note, but repeating it here for completeness.   This
belongs in core_dir_conf, not the request_rec.  This is information only
for a core function, so it shouldn't be available to everything.  Also,
this really should be a bucket_brigade *, but that's REALLY minor.

>   remember that the read_policy parameter is copied to r->read_body;
> 
>   if read_policy is REQUEST_CHUNKED_PASS and the transport encoding is
>   chunked...
>  
>     Disable filtering of the request body, because chunk headers would
>     be invalid
>     if the filter changed the length.

I'm not sure I like this, but I dislike the other options more.  The
problem here, is that we are changing how a request is received based on
information in the filter.  Then again, we only use REQUEST_CHUNKED_PASS
when we are discarding the whole body anyway.  I have no good answer, but
the solution I hate more than disabling filtering, is for the chunk filter
to fix the chunk headers.  The solution I like a bit more, is to remove
this option altogether, although that too is a bit bogus.  Just some food
for thought on this.

>       r->input_filters = r->connection->input_filters;
>  
>     dechunk_filter() will look at the read policy to know whether or not to leave the
>     chunk headers.
> 
> core's register_hooks()
> 
>     register the dechunk filter; play games with filter type so that it is
>     added "properly"

BTW, I hate this, but that is a problem with our filter registration that
we need to fix, and it has nothing to do with this patch.

> X CORE_IN shouldn't deliver the whole request body if it is big... one I/O
>   buffer worth of request body is plenty... otherwise we eat up virtual storage

Core in just passes the brigade up to http_filter, or whatever filter is
there.  http_filter should use a length of MAX_STRING_LENGTH when doing a
read.

> Not-so-temporary hacks (i.e., I don't know how to fix it):
> 
>   There must be a nice way to avoid r->filtered_input (other than imposing
>   length games on all input filters).

Not that I can see either.  I hate both solutions almost as much as you
do.  We disagree on the cleaner method, but I'll live with the
r->filtered_input, as long as the brigade lives in core_dir_conf, not
request_rec.

> Known bugs:
> 
> X post 10000 bytes -- CGI sees 10002 bytes; the extra two are CR+LF at the end of
>   the body; depends on whether or not I'm debugging it :(

That's kind of cute.

> +static apr_status_t brigade_bgets(ap_bucket_brigade *bb,
> +                                  int remove_buckets,
> +                                  char *line,
> +                                  apr_ssize_t max_line_size)

<brigade_bgets snipped for brevity>

Why isn't this just getline?  This does a bit less than getline, but
getline is already there, and it is one less place for things to go wrong.

> +struct dechunk_ctx {
> +    apr_ssize_t chunk_size;
> +    apr_ssize_t bytes_delivered;
> +    enum {WANT_HDR /* must have value zero */, WANT_BODY, WANT_TRL} state;
> +    int remove_chunk_proto;
> +};
> +
> +static long get_chunk_size(char *);
> +
> +apr_status_t dechunk_filter(ap_filter_t *f, ap_bucket_brigade *bb,
> +                            apr_ssize_t length)
> +{
> +    apr_status_t rv;
> +    struct dechunk_ctx *ctx = f->ctx;
> +    ap_bucket *b;
> +    const char *buf;
> +    apr_ssize_t len;
> +
> +    if (!ctx) {
> +        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(struct dechunk_ctx));
> +        ctx->remove_chunk_proto = f->r->read_body != REQUEST_CHUNKED_PASS;
> +    }
> +
> +    do {
> +        if (ctx->chunk_size == ctx->bytes_delivered) {
> +            /* Time to read another chunk header or trailer...  http_filter() is 
> +             * the next filter in line and it knows how to return a brigade with 
> +             * one line.
> +             */
> +            char line[30];
> +            
> +            ap_debug_assert(!strcmp(f->next->frec->name, "HTTP_IN"));

This is bogus.  You can not assume that the next filter in the chain is
the http_filter.  All you know, is that the next filter in the chain looks
like http_filter.  Remove the assert, and I'll be perfectly happy though.

> +            if ((rv = ap_get_brigade(f->next, bb, AP_GET_LINE)) != APR_SUCCESS) {
> +                return rv;
> +            }
> +            if ((rv = brigade_bgets(bb, ctx->remove_chunk_proto,
> +                                    line, sizeof(line))) != APR_SUCCESS) {
> +                return rv;
> +            }

This should just be a single call to getline IMO.

>  apr_status_t http_filter(ap_filter_t *f, ap_bucket_brigade *b, apr_ssize_t length)
>  {
>  #define ASCII_LF '\012'
> @@ -889,32 +1026,35 @@
>          }
>      }
>  
> -    if (length > 0) {
> -        int remain = length;
> +    if (ctx->remaining > 0) {
>          const char *ignore;
>  
>          e = AP_BRIGADE_FIRST(b);
>          while (e != AP_BRIGADE_SENTINEL(b)) {
>              ap_bucket_read(e, &ignore, &len, 0);
> -            if (remain <= len) {
> +            if (ctx->remaining < len) {
>                  break;
>              }
> -            remain -= len;
> +            ctx->remaining -= len;
>              e = AP_BUCKET_NEXT(e);
>          }
>          if (e != AP_BRIGADE_SENTINEL(b)) {
> -            if (remain <= len) {
> -                ap_bucket_split(e, remain);
> -                remain = 0;
> +            if (ctx->remaining < len) {
> +                ap_bucket_split(e, ctx->remaining);
> +                ctx->remaining = 0;
>              }
>              bb = ap_brigade_split(b, AP_BUCKET_NEXT(e));
>              ctx->b = bb;
> -            return APR_SUCCESS;
>          }
>          else {
>              ctx->b = NULL;
> -            return APR_SUCCESS;
>          }

All of the changes above should be removed, because they are already
basically working.

> +        if (ctx->remaining == 0) {
> +            ap_bucket *eos = ap_bucket_create_eos();
> +
> +            AP_BRIGADE_INSERT_TAIL(b, eos);
> +        }
> +        return APR_SUCCESS;

Assuming ap_get_client_block works properly, this is 100% correct.  This
is the way I had originally hoped to do the EOS bucket on input, but I
didn't want to do too much to ap_get_client_block while we were still
discussing it.  I am ++1 for this change.

> +/*XXX hack to use mod_cgi[d] for testing chunked request bodies XXX */
> +        if (r->read_body == REQUEST_CHUNKED_ERROR) {
> +            r->read_body = REQUEST_CHUNKED_DECHUNK;
> +        }
> +/*XXX end of this particular hack :) */

I assume this will be gone in the real commit.  :-)

>          r->read_chunked = 1;
> +        if (r->read_body == REQUEST_CHUNKED_PASS) {
> +            /* can't filter the body *and* preserve the chunk headers */
> +            r->input_filters = r->connection->input_filters;
> +        }
> +        ap_add_input_filter("DECHUNK", NULL, r, r->connection);

This doesn't actually disable filtering.  I'm not really sure of the
correct way to disable filtering, but I have the feeling this will
actually need to happen in ap_get_client_block, not
ap_setup_client_block.  The problem is that it is perfectly possible for a
module to insert a filter between the two function calls.  Hmmmm.....

> +    if (!r->filtered_input) {
> +        r->filtered_input = ap_brigade_create(r->pool);
>      }

This should be done when creating the core_dir_config, not when reading
the request IMO.

> +        while (b->type != ap_eos_type() &&
> +               b->length == 0 && b != AP_BRIGADE_SENTINEL(bb)) {
> +            ap_bucket *e = b;
> +            b = AP_BUCKET_NEXT(e);
> +            AP_BUCKET_REMOVE(e);
> +            ap_bucket_destroy(e);
>          }
> +    } while (AP_BRIGADE_EMPTY(bb));

Remove this loop.  It just clutters up the code, and it doesn't actually
do anything.  If we have a brigade full of empty buckets the mainline code
has to deal with that case anyway.  The idea is that we could have an
empty bucket anyplace in the brigade.  If we encounter one while
processing the data, then we have to just deal with it.  Looping through a
brigade for the case where we have a brigade full of empty buckets is not
worth it IMHO.  It makes the code harder to understand and doesn't really
buy us anything.  I removed these loops from a lot of the code this
weekend.

>  /* In HTTP/1.1, any method can have a body.  However, most GET handlers
> 
> 
> Off to catch some zzzs...


Fix the stuff from my comments above, and please commit away.  :-)  Does
this finish the filtering stuff once and for all, other than bug fixes?  I
think we are finally there!!!!!!

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------





Mime
View raw message