httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Trawick <trawi...@bellsouth.net>
Subject Re: [PATCH] dechunking, body filtering
Date Mon, 16 Oct 2000 20:56:32 GMT
rbb@covalent.net writes:

> >   r
> >  
> >     Add "void *filtered_input" field; ap_get_client_block() holds onto 
> >     leftover body in this field.  Ugly that we clutter the r...  Ugly that
> >     I made it "void *" instead of "ap_bucket_brigade *" (avoids an additional
> >     #include in "httpd.h" for (probably) no good reason).
> 
> Mentioned in another note, but repeating it here for completeness.   This
> belongs in core_dir_conf, not the request_rec.  This is information only
> for a core function, so it shouldn't be available to everything.  Also,
> this really should be a bucket_brigade *, but that's REALLY minor.

done, committed...

> 
> >   remember that the read_policy parameter is copied to r->read_body;
> > 
> >   if read_policy is REQUEST_CHUNKED_PASS and the transport encoding is
> >   chunked...
> >  
> >     Disable filtering of the request body, because chunk headers would
> >     be invalid
> >     if the filter changed the length.
> 
> I'm not sure I like this, but I dislike the other options more.  The
> problem here, is that we are changing how a request is received based on
> information in the filter.  Then again, we only use REQUEST_CHUNKED_PASS
> when we are discarding the whole body anyway.  I have no good answer, but
> the solution I hate more than disabling filtering, is for the chunk filter
> to fix the chunk headers.  The solution I like a bit more, is to remove
> this option altogether, although that too is a bit bogus.  Just some food
> for thought on this.

I was wondering out loud some weeks ago on what to do with this
issue...  the only response I got was a "shrug" from Roy :) 

I've tried googling for REQUEST_CHUNKED_PASS but I didn't come up with
anything outside of our code.  I doubt that there is anything.

> >       r->input_filters = r->connection->input_filters;
> >  
> >     dechunk_filter() will look at the read policy to know whether or not to leave
the
> >     chunk headers.
> > 
> > core's register_hooks()
> > 
> >     register the dechunk filter; play games with filter type so that it is
> >     added "properly"
> 
> BTW, I hate this, but that is a problem with our filter registration that
> we need to fix, and it has nothing to do with this patch.

yep...

> > X CORE_IN shouldn't deliver the whole request body if it is big... one I/O
> >   buffer worth of request body is plenty... otherwise we eat up virtual storage
> 
> Core in just passes the brigade up to http_filter, or whatever filter is
> there.  http_filter should use a length of MAX_STRING_LENGTH when doing a
> read.

Yeah, I got my filter names wrong.  Anyway...  When http_filter() does
a bucket-read to get data to deliver, it doesn't get to choose how
much to read.  In the normal case where it reads a socket bucket, it
will get up to IOBUFSIZE bytes.  A simple change for now would be to
return once it does a read which delivers >0 bytes.  When the block
parameter to bucket read actually does something for a socket this
code needs to be revisited anyway.

> > Not-so-temporary hacks (i.e., I don't know how to fix it):
> > 
> >   There must be a nice way to avoid r->filtered_input (other than imposing
> >   length games on all input filters).
> 
> Not that I can see either.  I hate both solutions almost as much as you
> do.  We disagree on the cleaner method, but I'll live with the
> r->filtered_input, as long as the brigade lives in core_dir_conf, not
> request_rec.
> 
> > Known bugs:
> > 
> > X post 10000 bytes -- CGI sees 10002 bytes; the extra two are CR+LF at the end of
> >   the body; depends on whether or not I'm debugging it :(
> 
> That's kind of cute.

I just remembered what the last sentence means: whether or not I'm in
gdb...  gotta try to reproduce this one tonight after the kiddies go to bed

> > +static apr_status_t brigade_bgets(ap_bucket_brigade *bb,
> > +                                  int remove_buckets,
> > +                                  char *line,
> > +                                  apr_ssize_t max_line_size)
> 
> <brigade_bgets snipped for brevity>
> 
> Why isn't this just getline?  This does a bit less than getline, but
> getline is already there, and it is one less place for things to go
> wrong.

Tonight or tomorrow I'll look at using getline()...

> > +struct dechunk_ctx {
> > +    apr_ssize_t chunk_size;
> > +    apr_ssize_t bytes_delivered;
> > +    enum {WANT_HDR /* must have value zero */, WANT_BODY, WANT_TRL} state;
> > +    int remove_chunk_proto;
> > +};
> > +
> > +static long get_chunk_size(char *);
> > +
> > +apr_status_t dechunk_filter(ap_filter_t *f, ap_bucket_brigade *bb,
> > +                            apr_ssize_t length)
> > +{
> > +    apr_status_t rv;
> > +    struct dechunk_ctx *ctx = f->ctx;
> > +    ap_bucket *b;
> > +    const char *buf;
> > +    apr_ssize_t len;
> > +
> > +    if (!ctx) {
> > +        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(struct dechunk_ctx));
> > +        ctx->remove_chunk_proto = f->r->read_body != REQUEST_CHUNKED_PASS;
> > +    }
> > +
> > +    do {
> > +        if (ctx->chunk_size == ctx->bytes_delivered) {
> > +            /* Time to read another chunk header or trailer...  http_filter() is

> > +             * the next filter in line and it knows how to return a brigade with

> > +             * one line.
> > +             */
> > +            char line[30];
> > +            
> > +            ap_debug_assert(!strcmp(f->next->frec->name, "HTTP_IN"));
> 
> This is bogus.  You can not assume that the next filter in the chain is
> the http_filter.  All you know, is that the next filter in the chain looks
> like http_filter.  Remove the assert, and I'll be perfectly happy
> though.

it will be gone very soon...

> > +            if ((rv = ap_get_brigade(f->next, bb, AP_GET_LINE)) != APR_SUCCESS)
{
> > +                return rv;
> > +            }
> > +            if ((rv = brigade_bgets(bb, ctx->remove_chunk_proto,
> > +                                    line, sizeof(line))) != APR_SUCCESS) {
> > +                return rv;
> > +            }
> 
> This should just be a single call to getline IMO.

(same as above comment regarding the use of getline())

> >  apr_status_t http_filter(ap_filter_t *f, ap_bucket_brigade *b, apr_ssize_t length)
> >  {
> >  #define ASCII_LF '\012'
> > @@ -889,32 +1026,35 @@
> >          }
> >      }
> >  
> > -    if (length > 0) {
> > -        int remain = length;
> > +    if (ctx->remaining > 0) {
> >          const char *ignore;
> >  
> >          e = AP_BRIGADE_FIRST(b);
> >          while (e != AP_BRIGADE_SENTINEL(b)) {
> >              ap_bucket_read(e, &ignore, &len, 0);
> > -            if (remain <= len) {
> > +            if (ctx->remaining < len) {
> >                  break;
> >              }
> > -            remain -= len;
> > +            ctx->remaining -= len;
> >              e = AP_BUCKET_NEXT(e);
> >          }
> >          if (e != AP_BRIGADE_SENTINEL(b)) {
> > -            if (remain <= len) {
> > -                ap_bucket_split(e, remain);
> > -                remain = 0;
> > +            if (ctx->remaining < len) {
> > +                ap_bucket_split(e, ctx->remaining);
> > +                ctx->remaining = 0;
> >              }
> >              bb = ap_brigade_split(b, AP_BUCKET_NEXT(e));
> >              ctx->b = bb;
> > -            return APR_SUCCESS;
> >          }
> >          else {
> >              ctx->b = NULL;
> > -            return APR_SUCCESS;
> >          }
> 
> All of the changes above should be removed, because they are already
> basically working.

I went to the version of this code in cvs which I replaced today but
can't find much that I should pick up.  Part of it is that I report
eos a little sooner...  another difference is that the length parm is
unused in my version...  another issue is that some of the replaced
code seemed to have the intention of telling the bucket read routine
how much data to read.  I've got some minor tweaks which make it look
more like what I replaced, but there isn't anything major.

> > +        if (ctx->remaining == 0) {
> > +            ap_bucket *eos = ap_bucket_create_eos();
> > +
> > +            AP_BRIGADE_INSERT_TAIL(b, eos);
> > +        }
> > +        return APR_SUCCESS;
> 
> Assuming ap_get_client_block works properly, this is 100% correct.  This
> is the way I had originally hoped to do the EOS bucket on input, but I
> didn't want to do too much to ap_get_client_block while we were still
> discussing it.  I am ++1 for this change.
> 
> > +/*XXX hack to use mod_cgi[d] for testing chunked request bodies XXX */
> > +        if (r->read_body == REQUEST_CHUNKED_ERROR) {
> > +            r->read_body = REQUEST_CHUNKED_DECHUNK;
> > +        }
> > +/*XXX end of this particular hack :) */
> 
> I assume this will be gone in the real commit.  :-)

yes, but only commented out at the moment...  I'll zap it completely
momentarily. 

> >          r->read_chunked = 1;
> > +        if (r->read_body == REQUEST_CHUNKED_PASS) {
> > +            /* can't filter the body *and* preserve the chunk headers */
> > +            r->input_filters = r->connection->input_filters;
> > +        }
> > +        ap_add_input_filter("DECHUNK", NULL, r, r->connection);
> 
> This doesn't actually disable filtering.  I'm not really sure of the
> correct way to disable filtering, but I have the feeling this will
> actually need to happen in ap_get_client_block, not
> ap_setup_client_block.  The problem is that it is perfectly possible for a
> module to insert a filter between the two function calls.
> Hmmmm.....

no obvious solution...

> > +    if (!r->filtered_input) {
> > +        r->filtered_input = ap_brigade_create(r->pool);
> >      }
> 
> This should be done when creating the core_dir_config, not when reading
> the request IMO.

yes, that sounds much better...  I'll get to that tonight or tomorrow.

> > +        while (b->type != ap_eos_type() &&
> > +               b->length == 0 && b != AP_BRIGADE_SENTINEL(bb)) {
> > +            ap_bucket *e = b;
> > +            b = AP_BUCKET_NEXT(e);
> > +            AP_BUCKET_REMOVE(e);
> > +            ap_bucket_destroy(e);
> >          }
> > +    } while (AP_BRIGADE_EMPTY(bb));
> 
> Remove this loop.  It just clutters up the code, and it doesn't actually
> do anything.  If we have a brigade full of empty buckets the mainline code
> has to deal with that case anyway.  The idea is that we could have an
> empty bucket anyplace in the brigade.  If we encounter one while
> processing the data, then we have to just deal with it.  Looping through a
> brigade for the case where we have a brigade full of empty buckets is not
> worth it IMHO.  It makes the code harder to understand and doesn't really
> buy us anything.  I removed these loops from a lot of the code this
> weekend.

this will be gone very soon

> >  /* In HTTP/1.1, any method can have a body.  However, most GET handlers
> > 
> > 
> > Off to catch some zzzs...
> 
> 
> Fix the stuff from my comments above, and please commit away.  :-)  Does
> this finish the filtering stuff once and for all, other than bug fixes?  I
> think we are finally there!!!!!!

We're definitely getting close.  Greg Ames and I need to look at the
EBCDIC issues, but I don't think that will change the shape of
anything.

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...

Mime
View raw message