httpd-dev mailing list archives

Subject Re: [PATCH] reworking getline and friends
Date Wed, 11 Oct 2000 16:31:48 GMT

> > Think of it this way, you have a chunking filter.  That chunking filter
> > shares the raw_data brigade with the http_filter.  When the chunking
> > filter needs to get more data, it sets some data off to the side in the
> > raw_data brigade, and calls down to http_filter.  Http_filter then saves
> > more data in the same brigade.  This just won't work, and it gets much
> > more complex when you start adding more filters.
> I don't see the sharing happening quite like that.
> Ground zero (today's code): 
>   http_filter() knows headers and body-with-content-length
> Beyond fixing any bugs in it, we need to add support for dechunking in 
> there too.  We may want to be able to throw in support for other
> transport encodings in the future.

Actually, this is very easy to add.  This is one of the design goals: to
be able to add and remove transport encodings quickly and easily.  The
revelation that I had last night (at the airport so I haven't coded it
yet), is that http_filter just cares about two conditions, body or
header.  If we are in header mode, then we must convert \r\n to \n\0 and
send it to the next filter.  If we are in body mode, we must know how many
bytes to send to the next filter, and just send them.  That is controlled
by c->remaining.  This allows for a de-chunking filter very cleanly, and
without adding code to http_filter.  It also allows for a very clean
implementation of other transfer encodings.
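As a rough sketch of the two modes, assuming hypothetical helper names
(hdr_convert_line, body_take) that are illustrative only, not the actual
httpd code:

```c
#include <assert.h>
#include <string.h>

/* Header mode: convert a "\r\n"-terminated line in place to "\n\0",
 * returning the new length (including the '\n').  Illustrative only. */
static int hdr_convert_line(char *line, int len)
{
    if (len >= 2 && line[len - 2] == '\r' && line[len - 1] == '\n') {
        line[len - 2] = '\n';
        line[len - 1] = '\0';
        return len - 1;
    }
    return len;
}

/* Body mode: the filter only needs to know how many bytes it may pass
 * up; c->remaining plays this role in the patch under discussion. */
static int body_take(long *remaining, int requested)
{
    int n = (*remaining < requested) ? (int)*remaining : requested;
    *remaining -= n;
    return n;
}
```

The point of the sketch is that neither mode needs to know anything about
chunking; a dechunking filter above would simply manage `remaining`.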

> At this point, we have very different pieces of code that are
> implemented in the same filter.  These very different pieces of code
> have to share leftovers just as I propose that bgets() and the
> body-with-content-length filter and transport encoding support share
> leftovers.
> Benefits to throwing all the support in http_filter(): 
>   sharing is cleaner
> Detriments to throwing all the support in http_filter(): 
>   no extensibility to other transport encodings; if http_filter()
>   doesn't understand it, you can't do it; if you add extensibility
>   then sharing won't be as clean

See above: sharing will always be clean, because http_filter only
determines whether it is handling headers or a body, and sends data
accordingly.

>   performance penalty when processing headers, as http_filter() has to
>   walk through the header since we don't have a way for upper layers
>   to give back data if they read too much

No matter what you do, some filter needs to walk through the headers.  We
need to do the conversion, and that requires a header walk.  There is no
way around that.

>   http_filter() is a bigger mess because the one logical entity has to
>   understand too many things (isn't that one of the complaints about
>   buff?) 

Take a closer look at my most recent patch.  I took most of the code out
of the header portion of http_filter, and with my next patch, I'll be
removing most of the body handling stuff.

> > The whole idea of sharing a brigade between two filters just won't 
> > work.  This patch makes things look like buff, but one of the problems
> > with buff, is that the data isn't associated with one filter at a 
> > time.

> Alternatively, an all-knowing http_filter() might be as confusing to
> some as buff :)  The sharing is the way out of the all-knowing-ness.

http_filter is NOT all-knowing.  It knows exactly two states: header or
body.  It has special processing for headers; for bodies it just passes
data back.

> > With this design, we take a real chance of having one filter stomp all 
> > over another filter's data, or making it impossible to write input 
> > filters.  It becomes impossible to write filters if the  solution is
> > to create a separate brigade to save data within a  filter, and use
> > the c->raw_data brigade to pass data between filters.
> c->raw_data is the put-back mechanism for code which understands
> what is a header and what is a body.  Without a put-back mechanism,
> then one piece of code must be all-knowing.  Certainly there are
> cleaner possibilities for the put-back mechanism.

That just won't work.  How do you know which filter is currently using the
"put-back" mechanism?  How do I know, as a filter writer, that the data I
push onto c->raw_data isn't going to be clobbered by the next call to
http_filter?  How do multiple filters push back data to a much lower
level?  Say I write a filter that implements bounds checking for forms:
how does it safely push back the data it has over-read?

What I am seeing when I read this, is that two filters (at least) have
access to a single bucket_brigade.  They both MUST check whether the
bucket brigade has data and, if so, remove that data.  Then they are both
allowed to munge that brigade.  Doesn't that sound like a problem waiting
to happen?  It sure does to me.  The filters weren't designed to
communicate by pushing data back to the previous filter.  It can't be done
safely, because you are letting two filters have write access to the same
brigade.  That is not necessary, and it isn't necessary for a filter to be
all-knowing either.  In fact, what you have basically done with the
raw_data brigade, is to create one big morphing filter that is
all-knowing.  The interactions are going to be painful to find and fix,
and it just isn't necessary.
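For contrast, here is a minimal sketch of the per-filter alternative: each
filter keeps its leftovers in its own private context, so no other filter
can clobber them between calls.  The filter_ctx type and both helpers are
illustrative, not part of the actual patch:

```c
#include <assert.h>
#include <string.h>

/* Each filter owns its setaside storage; nothing is shared. */
typedef struct {
    char setaside[256];  /* data this filter read but hasn't consumed */
    int  setaside_len;
} filter_ctx;

/* Save unconsumed bytes in this filter's own context. */
static void filter_setaside(filter_ctx *ctx, const char *data, int len)
{
    memcpy(ctx->setaside + ctx->setaside_len, data, len);
    ctx->setaside_len += len;
}

/* On the next call, drain the private setaside before reading more
 * from the filter below. */
static int filter_drain(filter_ctx *ctx, char *out, int max)
{
    int n = (ctx->setaside_len < max) ? ctx->setaside_len : max;
    memcpy(out, ctx->setaside, n);
    memmove(ctx->setaside, ctx->setaside + n, ctx->setaside_len - n);
    ctx->setaside_len -= n;
    return n;
}
```

Because the setaside lives in the filter's own context, there is no
window in which another filter can read or clobber it.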

> In this other approach:
>  when processing the header:
>    no filter above core_input (or ssl_input)
>    bgets interacts directly with core_input

bgets is a filter.  It is at the very top of the filter stack.  bgets is a
filter the same way that all of the handlers are filters.  Handlers are
content-producing filters; bgets is an example of a content-consuming
filter.

>  when processing the body:
>    either the deliver_content_length_bytes filter or dechunk filter or
>    some other transport filter interacts directly with core_input; the
>    deliver_content_length_bytes filter is the code in http_filter() today
>    above this layer come any filters which play with the request body
>    (charset translation, virus scanning, whatever)

I understand the design.  I have played with it a few times, and on first
inspection, yes, it looks very good.  But what it essentially creates is
a morphing filter that knows everything.  That is just not needed.

> In body mode, there has to be a piece of core-provided code* that sits
> between the network and any filters which muck with the body.  As long
> as this is provided, I don't see any problems going from body to
> header.

How do you inform the rest of the filters what mode you are in?  If we
pass too much data up the filter stack, it is not available to us
again.  Think of it this way.  I have a filter that does charset
translation on input data.  The problem comes with keep-alive requests,
when the filters below my charset-translation filter don't know that we
have stopped getting body data, and instead we are getting header data
now.  There needs to be some way to inform the bottom-most filters of that
transition.  If the bottom-most filters send too much data up the stack,
then it is irretrievable, and we lose the second request.

Your design basically says, let the bottom-most filters send up as much
data as they want, we'll make sure that we have an escape-route in, so
that if they send up too much, we'll hand it back to them.

My design says these filters were written specifically for HTTP.  They
know that when dealing with headers, they may only send up one line at a
time, and the filter takes care of folding continuation lines together
when necessary (not implemented yet).  The filter also knows that in body
mode, it will be told how much data it is allowed to return.
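The body-mode contract described above is what lets a dechunking filter
stay thin: it reads the hex chunk-size line itself and then asks the
lower filter for exactly that many bytes.  A minimal sketch of the size
parse, assuming an illustrative parse_chunk_size() helper (not the actual
httpd code):

```c
#include <ctype.h>

/* Parse the leading hex digits of a chunk-size line, e.g. "1a3\r\n" or
 * "FF;ext=1\r\n", and return the byte count the filter may now pass
 * upward.  Chunk extensions after ';' are simply ignored here. */
static long parse_chunk_size(const char *line)
{
    long size = 0;
    while (isxdigit((unsigned char)*line)) {
        char c = *line++;
        int digit = isdigit((unsigned char)c) ? c - '0'
                  : tolower((unsigned char)c) - 'a' + 10;
        size = size * 16 + digit;
    }
    return size;
}
```

A chunk size of zero would signal the end of the body, which is the point
at which the lower filters need to hear about the body-to-header
transition.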

Your design is a bit safer, because we have had code implemented with
that basic idea for a while.  However, it doesn't fit very cleanly into
the filtering scheme.  It requires some ugly compromises that I would
prefer not to make.

My design is a bit more difficult, because it requires re-implementing a
lot of code that is already working.  However, IMO, it fits into the
filter design a lot more cleanly, and it allows for more flexible filters
to be written.

Again, not standing in the way of any commits, just talking about the
overall design here.  I should have a new patch that begins to implement
body handling in about an hour or so.


Ryan Bloom               
406 29th St.
San Francisco, CA 94131
