Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Date: Sun, 23 Sep 2001 14:53:14 -0700
From: Greg Stein <gstein@lyra.org>
To: dev@httpd.apache.org, Justin Erenkrantz <jerenkrantz@ebuilt.com>
Subject: Re: [PATCH] Rework httpd buckets/filtering
Message-ID: <20010923145314.Q4050@lyra.org>
References: <20010923041203.H6756@ebuilt.com>
 <20010923153307.AFA6346DFC@koj.rkbloom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20010923153307.AFA6346DFC@koj.rkbloom.net>;
 from rbb@covalent.net on Sun, Sep 23, 2001 at 08:33:07AM -0700

On Sun, Sep 23, 2001 at 08:33:07AM -0700, Ryan Bloom wrote:
> 
> None of this can be committed until it has passed all of the tests in the perl test
> framework. The last time somebody (me) mucked with these filters it broke all of the
> CGI script handling. I would also like to hear from Jeff about this, because when we
> wrote the filters originally, we had input filtering use this design, and Jeff and Greg
> were adamant that filters needed to be able to return more than they were asked for.

Euh... not sure if you typoed. Input filters should *NOT* return more than
they were asked for. Never.

If a filter did that, it could end up passing data up to a filter that is
not responsible for that data. For example, reading past the end of a
request, or reading past the headers yet that portion of the body was
supposed to go into a newly-inserted unzip or dechunk filter.

I can think of more examples, if called upon :-)

The current input filter code *does* return too much, though, and that is
one of its problems. I'm not sure that this change improves upon that (need
to review some more, me thinks).

A filter can return *less* than what was asked for, but never more.

>...
> > --- include/apr_buckets.h	2001/09/22 22:36:07	1.118
> > +++ include/apr_buckets.h	2001/09/23 10:36:12
> > @@ -689,6 +689,17 @@
> >                                               apr_off_t *length);
> >
> >  /**
> > + * create a flat buffer of the elements in a bucket_brigade.
> > + * @param b The bucket brigade to create the buffer from
> > + * @param c The pointer to the returned character string
> > + * @param len The length of the returned character string
> > + * @param p The pool to allocate the character string from.
> > + * Note: You *must* have enough space allocated to store all of the data.
> > + * See apr_brigade_length().
> > + */
> > +APU_DECLARE(apr_status_t) apr_brigade_to_buffer(apr_bucket_brigade *b,
> > char *c); +

Ryan said this should take a reading mode. I don't understand why that
should be necessary. You have a brigade of a defined size, and you're
flattening that much data. You're already past the point of non-blocking
reads being used.

If you had a partial read, then how would you communiate that? And don't say
strlen(c).

No... the function does not need a mode, nor an output length. You give it a
brigade and it flattens it in its entirety. If there is a problem getting
the whole thing flattened, then it returns an error.

Personally, I would pass in a length of the output buffer so the function
can verify that it is flattening the proper amount -- not too little, and
not overwriting.

>...
> > +            case WANT_ENDCHUNK:
> > +                /* We don't care.  Next state please. */
> > +                ctx->state = WANT_TRL;
> > +                break;
> >              case WANT_TRL:
> > -                /* XXX sanity check end chunk here */
> > -                if (strlen(line)) {
> > -                    /* bad trailer */
> > -                }
> > -                if (ctx->chunk_size == 0) { /* we just finished the last chunk? */
> > -                    /* ### woah... ap_http_filter() is doing this, too */
> > -                    /* append eos bucket and get out */
> > -                    b = apr_bucket_eos_create();
> > -                    APR_BRIGADE_INSERT_TAIL(bb, b);

DECHUNK is the guy to insert the EOS. The HTTP filter cannot know how much
data is in the request (if you're chunking, the C-L header is ignored). The
above code should probably stay, and the HTTP filter should stop inserting
EOS.

But then again, if you aren't chunking, then the HTTP filter *does* define
the end of the request. Fun, huh? That is why we discussed combining the two
filters into one. The state stuff is completely wiped out, and we can easily
determine EOS, simplifying everything greatly.

Consider that when somebody asks the (new) HTTP filter for input. It can
zoom through the headers in direct procedural code. Then it gets to the
body, start dechunking if necessary, and places the proper amount of read
data into the passed-brigade.

>...
> > -        /* ### the code below, which moves bytes from one brigade to the
> > -           ### other is probably bogus. presuming the next filter down was
> > -           ### working properly, it should not have returned more than
> > -           ### READBYTES bytes, and we wouldn't have to do any work.
> > -        */

You have to fix the lower code before tossing this stuff.

>...
> > -        /* ### this is a hack. it is saying, "if we have read everything
> > -           ### that was requested, then we are at the end of the request."
> > -           ### it presumes that the next filter up will *only* call us
> > -           ### with readbytes set to the Content-Length of the request.
> > -           ### that may not always be true, and this code is *definitely*
> > -           ### too presumptive of the caller's intent. the point is: this
> > -           ### filter is not the guy that is parsing the headers or the
> > -           ### chunks to determine where the end of the request is, so we
> > -           ### shouldn't be monkeying with EOS buckets.
> > -        */
> > -        if (*readbytes == 0) {
> > -            apr_bucket *eos = apr_bucket_eos_create();
> > -
> > -            APR_BRIGADE_INSERT_TAIL(b, eos);
> > -        }

Somebody has to return an EOS somewhere.

Consider the highest-level input filter. How does it know when it has read
all of the data for the current request? It needs to see an EOS bucket.


I'll wait for the new patch and review again. All the formatting changes
were getting in the way.

But I really think the proper tack is to fix the amount-returned, and to
combine the HTTP_IN and DECHUNK filters.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/