Return-Path: Delivered-To: apmail-httpd-dev-archive@httpd.apache.org Received: (qmail 42659 invoked by uid 500); 23 Sep 2001 21:49:45 -0000 Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm Precedence: bulk Reply-To: dev@httpd.apache.org list-help: list-unsubscribe: list-post: Delivered-To: mailing list dev@httpd.apache.org Received: (qmail 42648 invoked from network); 23 Sep 2001 21:49:44 -0000 X-Authentication-Warning: kurgan.lyra.org: gstein set sender to gstein@lyra.org using -f Date: Sun, 23 Sep 2001 14:53:14 -0700 From: Greg Stein To: dev@httpd.apache.org, Justin Erenkrantz Subject: Re: [PATCH] Rework httpd buckets/filtering Message-ID: <20010923145314.Q4050@lyra.org> References: <20010923041203.H6756@ebuilt.com> <20010923153307.AFA6346DFC@koj.rkbloom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <20010923153307.AFA6346DFC@koj.rkbloom.net>; from rbb@covalent.net on Sun, Sep 23, 2001 at 08:33:07AM -0700 X-URL: http://www.lyra.org/greg/ X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N On Sun, Sep 23, 2001 at 08:33:07AM -0700, Ryan Bloom wrote: > > None of this can be committed until it has passed all of the tests in the perl test > framework. The last time somebody (me) mucked with these filters it broke all of the > CGI script handling. I would also like to hear from Jeff about this, because when we > wrote the filters originally, we had input filtering use this design, and Jeff and Greg > were adamant that filters needed to be able to return more than they were asked for. Euh... not sure if you typoed. Input filters should *NOT* return more than they were asked for. Never. If a filter did that, it could end up passing data up to a filter that is not responsible for that data. For example, reading past the end of a request, or reading past the headers yet that portion of the body was supposed to go into a newly-inserted unzip or dechunk filter. I can think of more examples, if called upon :-) The current input filter code *does* return too much, though, and that is one of its problems. I'm not sure that this change improves upon that (need to review some more, me thinks). A filter can return *less* than what was asked for, but never more. >... > > --- include/apr_buckets.h 2001/09/22 22:36:07 1.118 > > +++ include/apr_buckets.h 2001/09/23 10:36:12 > > @@ -689,6 +689,17 @@ > > apr_off_t *length); > > > > /** > > + * create a flat buffer of the elements in a bucket_brigade. > > + * @param b The bucket brigade to create the buffer from > > + * @param c The pointer to the returned character string > > + * @param len The length of the returned character string > > + * @param p The pool to allocate the character string from. > > + * Note: You *must* have enough space allocated to store all of the data. > > + * See apr_brigade_length(). > > + */ > > +APU_DECLARE(apr_status_t) apr_brigade_to_buffer(apr_bucket_brigade *b, > > char *c); + Ryan said this should take a reading mode. I don't understand why that should be necessary. You have a brigade of a defined size, and you're flattening that much data. You're already past the point of non-blocking reads being used. If you had a partial read, then how would you communiate that? And don't say strlen(c). No... the function does not need a mode, nor an output length. You give it a brigade and it flattens it in its entirety. If there is a problem getting the whole thing flattened, then it returns an error. Personally, I would pass in a length of the output buffer so the function can verify that it is flattening the proper amount -- not too little, and not overwriting. >... > > + case WANT_ENDCHUNK: > > + /* We don't care. Next state please. */ > > + ctx->state = WANT_TRL; > > + break; > > case WANT_TRL: > > - /* XXX sanity check end chunk here */ > > - if (strlen(line)) { > > - /* bad trailer */ > > - } > > - if (ctx->chunk_size == 0) { /* we just finished the last chunk? */ > > - /* ### woah... ap_http_filter() is doing this, too */ > > - /* append eos bucket and get out */ > > - b = apr_bucket_eos_create(); > > - APR_BRIGADE_INSERT_TAIL(bb, b); DECHUNK is the guy to insert the EOS. The HTTP filter cannot know how much data is in the request (if you're chunking, the C-L header is ignored). The above code should probably stay, and the HTTP filter should stop inserting EOS. But then again, if you aren't chunking, then the HTTP filter *does* define the end of the request. Fun, huh? That is why we discussed combining the two filters into one. The state stuff is completely wiped out, and we can easily determine EOS, simplifying everything greatly. Consider that when somebody asks the (new) HTTP filter for input. It can zoom through the headers in direct procedural code. Then it gets to the body, start dechunking if necessary, and places the proper amount of read data into the passed-brigade. >... > > - /* ### the code below, which moves bytes from one brigade to the > > - ### other is probably bogus. presuming the next filter down was > > - ### working properly, it should not have returned more than > > - ### READBYTES bytes, and we wouldn't have to do any work. > > - */ You have to fix the lower code before tossing this stuff. >... > > - /* ### this is a hack. it is saying, "if we have read everything > > - ### that was requested, then we are at the end of the request." > > - ### it presumes that the next filter up will *only* call us > > - ### with readbytes set to the Content-Length of the request. > > - ### that may not always be true, and this code is *definitely* > > - ### too presumptive of the caller's intent. the point is: this > > - ### filter is not the guy that is parsing the headers or the > > - ### chunks to determine where the end of the request is, so we > > - ### shouldn't be monkeying with EOS buckets. > > - */ > > - if (*readbytes == 0) { > > - apr_bucket *eos = apr_bucket_eos_create(); > > - > > - APR_BRIGADE_INSERT_TAIL(b, eos); > > - } Somebody has to return an EOS somewhere. Consider the highest-level input filter. How does it know when it has read all of the data for the current request? It needs to see an EOS bucket. I'll wait for the new patch and review again. All the formatting changes were getting in the way. But I really think the proper tack is to fix the amount-returned, and to combine the HTTP_IN and DECHUNK filters. Cheers, -g -- Greg Stein, http://www.lyra.org/