httpd-dev mailing list archives

From Jeff Trawick <trawi...@bellsouth.net>
Subject Re: [PATCH] reworking getline and friends
Date Wed, 11 Oct 2000 15:42:16 GMT
rbb@covalent.net writes:

> >   bgets() and http_filter() share yet another brigade with raw data
> >   from the network -- c->raw_data; presumably dechunking code would
> >   share c->raw_data too; when bgets() grabs some of the data after the
> >   end of the header, it goes back into c->raw_data for http_filter()
> >   to see; when http_filter() grabs some of the data after the end of
> >   the body, it goes back into c->raw_data for bgets() to see
> 
> I am not standing in the way of this, but this completely violates the
> whole concept of filtering.  You cannot share a brigade between multiple
> filters.  How does each filter know where it is in the brigade?
> 
> Think of it this way, you have a chunking filter.  That chunking filter
> shares the raw_data brigade with the http_filter.  When the chunking
> filter needs to get more data, it sets some data off to the side in the
> raw_data brigade, and calls down to http_filter.  Http_filter then saves
> more data in the same brigade.  This just won't work, and it gets much
> more complex when you start adding more filters.

I don't see the sharing happening quite like that.

Ground zero (today's code): 
  http_filter() knows headers and body-with-content-length

Beyond fixing any bugs in it, we need to add support for dechunking in 
there too.  We may want to be able to throw in support for other
transport encodings in the future.

At this point, we have very different pieces of code implemented in
the same filter.  Those pieces of code have to share leftovers among
themselves, just as I propose that bgets(), the
body-with-content-length filter, and transport-encoding support share
leftovers.
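
To make that sharing concrete, here is a minimal sketch of the
header-side half of it.  It is written against the bucket-brigade API
that apr-util eventually settled on (which postdates this message);
the brigade names and the find_end_of_headers() helper are
hypothetical, and error handling is omitted.  It is an illustration,
not the implementation:

  #include "apr_buckets.h"

  /* Hypothetical helper, not a real httpd/apr function: returns a
   * pointer just past the CRLF CRLF header terminator within
   * [data, data+len), or NULL if the terminator is not in this bucket.
   */
  static const char *find_end_of_headers(const char *data, apr_size_t len);

  /* Move header bytes from the shared "raw" brigade into "headers",
   * leaving anything after the blank line in "raw" as the leftover
   * that the body-reading code will pick up next.
   */
  static apr_status_t read_header_bytes(apr_bucket_brigade *raw,
                                        apr_bucket_brigade *headers)
  {
      while (!APR_BRIGADE_EMPTY(raw)) {
          apr_bucket *e = APR_BRIGADE_FIRST(raw);
          const char *data;
          apr_size_t len;
          const char *end;

          apr_bucket_read(e, &data, &len, APR_BLOCK_READ);

          /* (A terminator split across buckets is not handled here.) */
          end = find_end_of_headers(data, len);
          if (end == NULL) {
              /* The whole bucket is header data; move it across. */
              APR_BUCKET_REMOVE(e);
              APR_BRIGADE_INSERT_TAIL(headers, e);
              continue;
          }

          /* Split at the terminator: the first piece completes the
           * headers, the second piece stays in "raw" for whoever
           * reads the body.
           */
          apr_bucket_split(e, end - data);
          APR_BUCKET_REMOVE(e);
          APR_BRIGADE_INSERT_TAIL(headers, e);
          return APR_SUCCESS;
      }
      return APR_EAGAIN;    /* need more data from the network */
  }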

Benefits to throwing all the support in http_filter(): 
  sharing is cleaner

Detriments to throwing all the support in http_filter(): 
  no extensibility to other transport encodings; if http_filter()
  doesn't understand it, you can't do it; if you add extensibility
  then sharing won't be as clean

  performance penalty when processing headers, as http_filter() has to
  walk through the header since we don't have a way for upper layers
  to give back data if they read too much

  http_filter() is a bigger mess because the one logical entity has to
  understand too many things (isn't that one of the complaints about
  buff?) 


> The whole idea of sharing a brigade between two filters just won't
> work.  This patch makes things look like buff, but one of the problems
> with buff is that the data isn't associated with one filter at a
> time.

Alternatively, an all-knowing http_filter() might be as confusing to
some as buff :)  The sharing is the way out of the all-knowing-ness.

> With this design, we take a real chance of having one filter stomp all
> over another filter's data, or making it impossible to write input
> filters.  It becomes impossible to write filters if the solution is to
> create a separate brigade to save data within a filter, and use the
> c->raw_data brigade to pass data between filters.

c->raw_data is the put-back mechanism for code which understands
what is a header and what is a body.  Without a put-back mechanism,
one piece of code must be all-knowing.  Certainly there are
cleaner possibilities for the put-back mechanism.
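
In the same hypothetical terms as the earlier sketch, the other half
of the put-back could look like the following: the code that
understands body-with-content-length delivers at most the remaining
body bytes, and anything it read beyond that goes back into the shared
raw brigade, where the header code for the next request will find it.
Again the names are illustrative and error handling is omitted:

  /* Deliver at most *remaining body bytes from the shared "raw"
   * brigade into "out"; any excess stays in "raw" as the put-back
   * for the next request's headers.
   */
  static void deliver_body_bytes(apr_bucket_brigade *raw,
                                 apr_bucket_brigade *out,
                                 apr_off_t *remaining)
  {
      while (*remaining > 0 && !APR_BRIGADE_EMPTY(raw)) {
          apr_bucket *e = APR_BRIGADE_FIRST(raw);
          const char *data;
          apr_size_t len;
          apr_off_t take;

          apr_bucket_read(e, &data, &len, APR_BLOCK_READ);

          take = ((apr_off_t)len > *remaining) ? *remaining : (apr_off_t)len;
          if (take < (apr_off_t)len) {
              /* This bucket straddles the end of the body; split it so
               * the tail (start of the next request) stays in "raw".
               */
              apr_bucket_split(e, (apr_size_t)take);
          }

          APR_BUCKET_REMOVE(e);
          APR_BRIGADE_INSERT_TAIL(out, e);
          *remaining -= take;
      }
  }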

> Now, as I said, I am not vetoing this patch, but please take a very close
> look at the general direction things were going last week.  That is the
> correct approach, IMHO.  The idea was that the core_filter just read from
> the network.  The http_filter would take that data, and convert it if we
> were in header mode, or just pass it up if we were in body mode.  Other
> filters would be added on top of those for body data.  When processing
> headers, the top-most filter would always be http_filter.

In this other approach:

 when processing the header:

   no filter above core_input (or ssl_input)

   bgets interacts directly with core_input

 when processing the body:

   either the deliver_content_length_bytes filter or dechunk filter or
   some other transport filter interacts directly with core_input; the
   deliver_content_length_bytes filter is the code in http_filter() today

   above this layer come any filters which play with the request body
   (charset translation, virus scanning, whatever)
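
As a rough sketch of how the body-mode stack might get set up under
this design, once the headers have been read: the example below uses
ap_add_input_filter() from the filter API that httpd 2.0 eventually
shipped, purely as a stand-in for whatever insertion mechanism this
design ends up with, and the filter names echo the ones above and are
illustrative only:

  #include "httpd.h"
  #include "util_filter.h"

  /* Hypothetical sketch: push the appropriate transport filter onto
   * the input stack for the body, directly on top of core_input.
   */
  static void setup_body_filters(request_rec *r)
  {
      const char *te = apr_table_get(r->headers_in, "Transfer-Encoding");

      if (te != NULL) {
          /* A real check would inspect the value ("chunked", etc.). */
          ap_add_input_filter("DECHUNK", NULL, r, r->connection);
      }
      else {
          /* Plain body: the content-length reader sits there instead. */
          ap_add_input_filter("DELIVER_CONTENT_LENGTH_BYTES", NULL, r,
                              r->connection);
      }

      /* Filters that play with the body itself (charset translation,
       * virus scanning, ...) get added above this layer by their own
       * modules.
       */
  }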
   
> I didn't start the input filtering without a design, and I did give it a
> lot of thought.  One of the problems with the input filtering, though, is
> that it is very difficult to determine if we are in header or body mode
> while we are inside filters.  It isn't hard to go from headers to body; it
> is hard to go the other direction.

In body mode, there has to be a piece of core-provided code* that sits
between the network and any filters which muck with the body.  As long
as this is provided, I don't see any problems going from body to
header.

*The exception to this is if/when we allow modules to implement
additional transport encodings.

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...
