httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject Re: [PATCH] dechunking, body filtering (fwd)
Date Sat, 14 Oct 2000 15:51:42 GMT

I wasn't paying enough attention to Jeff's message, and I didn't see that
it was just to me.  This was my response.  Obviously, I'm not going to
reply to his new one, because the comments are the same.  :-)

<Msg begins here>

Please read this message in two parts.  The first section (Everything up
to "SNIP") was written second, but it is the correct part of the
message.  The second part was my original thoughts, until I took a much
closer look at the code.

The problem here is that we are completely wrong with the http_filter.  I
will take complete responsability for that, because I wrote it
initially.  The clean solution, is to move the http_filter from a
connection based filter to a request based filter.  This allows
http_filter to learn the content-length from the actual request.  It also
allows ap_get_client_block to just call ap_get_brigade until it gets an

The chunking filter would go on top of http_filter just to keep
http_filter simpler.  It could be a part of http_filter, but I would
prefer to separate the code a bit.  I will not be writing the de-chunking
filter right now.  I want to re-org http_filter first.  Expect a commit
later today that makes this change.  This should remove a LOT of the
contentious points from the discussion, because there is no need to tell
the http_filter anything, it will be able to just discover it for itself.


On 14 Oct 2000, Jeff Trawick wrote:
> writes:
> > You can't do this.  You can't even garauntee that http_filter will be in
> > the server.  I have every intention of writing an input filter that takes
> > ftp requests and converts them to http.  When I do that, there will be no
> > http_filter in the request, and this will break.  No filter can EVER
> > depend on another filter being in the stack.
> You could put such a filter below HTTP_IN.
> There are other cases where code outside of HTTP_IN will need to know
> what it is up to.
> Example:
>   we're in some buffering logic on output...  we need to know whether
>   there is more data already received on input (the next request)
>   before we decide to sent a small amount of data...  HTTP_IN would be
>   holding onto such data if it has been received...  we need to take a
>   look in its context

This is a bogus example.  We don't do that.  If we want to know if there
is more data, we set the data we have off to the side and wait for an EOS
bucket to be sent.  That is how Apache has always handled this sort of
thing.  If the EOS isn't sent by the handler, we get it because of

> One part of Apache can certainly depend on another part of Apache
> being there.  Filters don't change this.  We can require that *our*
> code be at that level to properly enforce certain parts of HTTP.  We
> don't allow modules to replace just anything.

But http_filter doesn't need to be required, and making it required
actually limits what other input filters we can write, or at least makes
it a bit more annoying.  Imagine a filter that just creates it's own
request (I have already seen this filter in action BTW).  Now, we have to
have the http_filter in the filter chain, even though my module
understands exactly where it is in the protocol, and how to deal with the

The problem is that http_filter has the wrong name now.  When it was
started, it was going to really understand the http protocol.  Now, it is
just a filter that understands how and when to unbuffer data.

> > >   r
> > >  
> > >     Add "void *filtered_input" field; ap_get_client_block() holds onto 
> > >     leftover body in this field.  Ugly that we clutter the r...  Ugly that
> > >     I made it "void *" instead of "ap_bucket_brigade *" (avoids an additional
> > >     #include in "httpd.h" for (probably) no good reason).
> > 
> > This was discussed yesterday and people really disliked it.  If filters
> > are told the maximum amount of data they can return, this is
> > unnecessary.
> Some people really disliked it, sure...  I knew that and went ahead
> anyway because still I think that the length parameter has some issues
> with it.
> . The length parameter means that filter B can ask filter C to return
>   an amount of data which is convenient for filter B.  Any benefit to
>   a such a filter goes away because filter B then has to split at the
>   point convenient to filter A.  It is much simpler when filters can
>   return the amount of data they find appropriate. 

Um, that is the same way that the read function works, the caller tells
the operating system how much data it can comfortably handle, and the
operating system respects that.

> . ap_get_client_block() telling the next filter that it can only
>   accept 8192 bytes breaks certain cases like a pipe bucket.  Do we
>   need to implement some sort of filter to sit below
>   ap_get_client_block() to get everything in memory and return only
>   the desired amount of data?  All this to avoid a field in the r?

Ummm, since there is no filter that returns a pipe bucket on input this
too is a bogus example.  We aren't talking about adding a new filter.  If
your filter creates a pipe bucket, then yes it is your filters
responsability to make sure we don't return too much data.

> . The fact that ap_get_client_block() deals with filtered byte counts
>   but HTTP_IN deals with raw byte counts is also messy.  How does this
>   work, anyway? 

ap_get_client_block doesn't deal with byte counts at all.  Yes, it still
keeps track of them, but it doesn't actually use that
information.  ap_setup_client_block tells http_filter how much data needs
to be read from the network, this is done through the conn_rec, which is
messy, and I tried to avoid it for 8 hours.  It can't be avoided without
some even uglier parameter passing or having one filter modify another
filter's private data.  When ap_get_client block sees an EOS, it stops
reading data.  There is one mistruth in this, which is that we still pass
check the content-length when we tell lower filters how much data
ap_get_client_block can accept.  That is a bug that is there because the
code hasn't really been cleaned yet.


Ryan Bloom               
406 29th St.
San Francisco, CA 94131

View raw message