httpd-dev mailing list archives

From Jeff Trawick <trawi...@bellsouth.net>
Subject Re: input filter commentary
Date Tue, 10 Oct 2000 11:43:29 GMT
Greg Stein <gstein@lyra.org> writes:

> *) chunked input is broken. the http_filter uses conn->remaining as a "flag"
>    to determine whether it is reading headers or the body. This is *very*
>    poor from a readability/maintainability standpoint. A flag in the CTX
>    would be much better.
>    
>    Further, conn->remaining never becomes non-zero when chunked input
>    arrives at the server. Therefore, http_filter continues to do CR/LF
>    translation on the body(!). Not to mention screwing up where it thinks it
>    is in the input handling. Some kind of input state (machine) should be
>    present.

I'm probably missing something big, but I'd like to think that where
http_filter() sits now, we'd have one of several pieces of code
(filters), chosen by the current state and the method of sending the
body:

header state:

     a piece of code that knows [CR]LF and where the header ends

     http_filter() already knows how to do this.

body state where content-length was provided:

     a piece of code that knows when the body ends based on content
     length and number of bytes delivered to the filter above

     http_filter() already knows how to do this.

body state where body is chunked and app doesn't want to see chunks:

     a dechunk filter

body state where body is chunked but app wants chunks passed through:

     a read-chunk filter that reads chunk headers to know where the
     body ends but doesn't remove the chunk headers/trailers

     I don't know if we really need to support this, but 1.3 has this
     functionality.  It breaks as soon as any further filtering is
     performed, except for filtering that preserves the original
     length.

any other state: 

     no body, stay ready to receive the next header

http_filter() could always switch on the state and call the right
piece of code, but I don't see any insurmountable problems associated
with them being separate filters, only one of which is installed at a
time.
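
Something like the following is what I have in mind.  All of the
names here are invented and the filter signature is simplified; this
is only a sketch of the dispatch, not working code:

    /* hypothetical per-connection input state, kept in the filter
     * context instead of overloading conn->remaining as a flag */
    typedef enum {
        BODY_NONE,             /* no body: stay ready for next header */
        BODY_LENGTH,           /* Content-Length given: count down */
        BODY_CHUNKED,          /* chunked: dechunk for the caller */
        BODY_CHUNKED_PASSTHRU  /* chunked: track framing, pass through */
    } input_body_state;

    typedef struct {
        input_body_state state;
        apr_off_t remaining;   /* bytes left in body or current chunk */
    } http_in_ctx;

    /* the separate pieces described above (bodies omitted here) */
    static apr_status_t header_piece(ap_filter_t *f, apr_bucket_brigade *b);
    static apr_status_t length_piece(ap_filter_t *f, apr_bucket_brigade *b);
    static apr_status_t dechunk_piece(ap_filter_t *f, apr_bucket_brigade *b);
    static apr_status_t read_chunk_piece(ap_filter_t *f, apr_bucket_brigade *b);

    static apr_status_t http_filter(ap_filter_t *f, apr_bucket_brigade *b)
    {
        http_in_ctx *ctx = f->ctx;

        switch (ctx->state) {
        case BODY_LENGTH:
            return length_piece(f, b);
        case BODY_CHUNKED:
            return dechunk_piece(f, b);
        case BODY_CHUNKED_PASSTHRU:
            return read_chunk_piece(f, b);
        default:
            return header_piece(f, b);
        }
    }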

As long as http_filter() has all of these responsibilities, playing
with different combinations of transport encodings is messy.
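
For concreteness, the dechunk piece above is mostly a small parser
for the chunk-size line.  Roughly (invented names again, trailers
glossed over):

    typedef enum {
        CHUNK_SIZE,      /* reading the hex chunk-size line */
        CHUNK_DATA,      /* passing 'remaining' body bytes through */
        CHUNK_CRLF,      /* eating the CRLF after the chunk data */
        CHUNK_TRAILERS,  /* zero-size chunk seen: eating trailers */
        CHUNK_DONE
    } chunk_state;

    typedef struct {
        chunk_state state;
        apr_off_t remaining;
        int in_ext;      /* inside a ";ext" chunk extension */
    } chunk_ctx;

    /* feed one byte of the chunk-size line; returns 1 at end of line */
    static int chunk_size_byte(chunk_ctx *ctx, char c)
    {
        if (c == '\n') {
            ctx->in_ext = 0;
            ctx->state = ctx->remaining ? CHUNK_DATA : CHUNK_TRAILERS;
            return 1;
        }
        if (ctx->in_ext || c == '\r')
            return 0;
        if (c == ';')
            ctx->in_ext = 1;
        else if (c >= '0' && c <= '9')
            ctx->remaining = ctx->remaining * 16 + (c - '0');
        else if (c >= 'a' && c <= 'f')
            ctx->remaining = ctx->remaining * 16 + (c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            ctx->remaining = ctx->remaining * 16 + (c - 'A' + 10);
        return 0;
    }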

> *) http_filter cannot guarantee that it is returning a full line for
>    getline(). Thus, getline() must have a loop to fetch data. It scans
>    through the data looking for the LF (and could then do the CR/LF munging
>    for getline callers), so it is unclear why http_filter is doing the
>    munging.

Really, only one piece of code needs to know what a header looks
like.  getline() can solve the whole problem as long as there is a way
for it to put back any data that it has grabbed after the header.  As
long as http_filter() and getline() both have responsibilities
regarding the header, reading the header is going to be less than
optimal.
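
The putback can fall out of the bucket API: scan for the LF, split
the bucket there, and leave the tail on the brigade for the next
caller.  A rough sketch (buffer-length checks omitted, and the bucket
calls may not match the tree's current spellings exactly):

    /* pull one [CR]LF-terminated line out of bb; data after the LF
     * stays on the brigade for the next read */
    static apr_status_t brigade_getline(apr_bucket_brigade *bb,
                                        char *buf, apr_size_t *lenp)
    {
        apr_size_t used = 0;

        while (!APR_BRIGADE_EMPTY(bb)) {
            apr_bucket *e = APR_BRIGADE_FIRST(bb);
            const char *data, *lf;
            apr_size_t len;
            apr_status_t rv;

            rv = apr_bucket_read(e, &data, &len, APR_BLOCK_READ);
            if (rv != APR_SUCCESS)
                return rv;
            lf = memchr(data, '\n', len);
            if (lf != NULL) {
                apr_size_t line = (lf - data) + 1;
                if (line < len)
                    apr_bucket_split(e, line);   /* put back the tail */
                memcpy(buf + used, data, line);  /* CR/LF munging here */
                used += line;
                apr_bucket_delete(e);
                *lenp = used;
                return APR_SUCCESS;
            }
            memcpy(buf + used, data, len);       /* partial line so far */
            used += len;
            apr_bucket_delete(e);
        }
        *lenp = used;
        return APR_EOF;   /* no LF yet: fetch more from below */
    }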

But if you don't buy the idea that we can switch out the http-level
filter then this idea doesn't go very far either :)

> Easy problem first:
> 
> == an input xlate filter can operate based on the current protocol state.
>    whether it enables/disables itself, or inserts/removes
> itself... dunno.

I'm certainly interested in the input xlate filter, but I guess I'm
more interested in nice ways for any filter to get inserted based on
the requested URI.  When that happens, then the input xlate filter
doesn't have to have any special smarts.

At the end of last week, when this issue was touched on, I was not
able to understand what problems keep us from adding filters after we
read the request header.

If we can't add filters particular to the requested URI, then we
cripple input filtering.  It isn't cool to have to build a giant
filter list statically and have filter instances within the list
enable/disable themselves on a per-request basis.  Consider the
possible orderings of even a pair of filters and what each one means
for a static filter list.
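
To make the shape of what I'm asking for concrete: a hook that runs
once the request line and headers have been parsed, so a module can
do something like the following.  Neither the hook nor this form of
ap_add_input_filter() exists today, and the URI prefix and filter
name are made up; this is just the API shape I'd like to see:

    static void insert_xlate_filter(request_rec *r)
    {
        /* insert the input xlate filter for this request only,
         * based on the URI -- no connection-wide registration and
         * no per-request enable/disable flag in the filter itself */
        if (strncmp(r->uri, "/ebcdic/", 8) == 0) {
            ap_add_input_filter("XLATE_IN", NULL, r, r->connection);
        }
    }

    static void register_hooks(void)
    {
        ap_hook_insert_filter(insert_xlate_filter, NULL, NULL,
                              AP_HOOK_MIDDLE);
    }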

> Ideally, we should have a SOCKET bucket (unless a FILE/PIPE bucket could do
> the trick). These buckets are designed to spin off HEAP buckets as they are
> read, which matches up with the suggestion above. Also, a SOCKET bucket
> would completely toss the core_input_filter -- we could just initialize
> conn->input_data with a brigade with the socket bucket in it.

Agreed.  This is what I assumed would happen originally.  The
consumer (e.g., ap_get_client_block()) will destroy buckets as it
consumes them, releasing the IOBUFSIZE buffers to the free list for
immediate reuse.
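
I.e., the consuming loop is just this (bucket spellings approximate):

    /* sketch: copy out each bucket, deleting it as soon as it is
     * drained so its buffer returns to the free list immediately */
    static apr_status_t drain_brigade(apr_bucket_brigade *bb, char *out,
                                      apr_size_t outsize,
                                      apr_size_t *outlen)
    {
        apr_size_t used = 0;

        while (!APR_BRIGADE_EMPTY(bb) && used < outsize) {
            apr_bucket *e = APR_BRIGADE_FIRST(bb);
            const char *data;
            apr_size_t len;
            apr_status_t rv;

            rv = apr_bucket_read(e, &data, &len, APR_BLOCK_READ);
            if (rv != APR_SUCCESS)
                return rv;
            if (len > outsize - used) {
                apr_bucket_split(e, outsize - used);  /* keep the tail */
                len = outsize - used;
            }
            memcpy(out + used, data, len);
            used += len;
            apr_bucket_delete(e);   /* buffer is reusable right now */
        }
        *outlen = used;
        return APR_SUCCESS;
    }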

The file/pipe bucket could probably do the trick for Unix, but then
Certain Folks would have even less happiness.  A genuine socket
bucket, however imperfect, needs to be in place before any such change
is made so that systems where descriptors aren't interchangeable have
a chance of working.

> This splits the input handling into the appropriate levels:
> 
> *) core_input_filter simply talks to the network to fetch data
> 
> *) http_filter steps through a state machine to read HTTP requests
> 
> *) getline/get_client_block read/parses data from the request and returns it
>    in the appropriate form to the caller.

My hope is that ap_get_client_block() doesn't have much work to do.
If this is the case, then a newly-written module should feel free to
call ap_get_brigade() instead of ap_get_client_block().
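
E.g., something like this, if the signature settles down the way I
expect (the mode/blocking arguments are guesses at where the API is
headed; today's tree differs):

    static apr_status_t read_body(request_rec *r)
    {
        apr_bucket_brigade *bb;
        apr_bucket *e;
        int seen_eos = 0;

        bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
        while (!seen_eos) {
            apr_status_t rv = ap_get_brigade(r->input_filters, bb,
                                             AP_MODE_READBYTES,
                                             APR_BLOCK_READ, 8192);
            if (rv != APR_SUCCESS)
                return rv;
            for (e = APR_BRIGADE_FIRST(bb);
                 e != APR_BRIGADE_SENTINEL(bb);
                 e = APR_BUCKET_NEXT(e)) {
                if (APR_BUCKET_IS_EOS(e)) {
                    seen_eos = 1;
                    break;
                }
                /* use the bucket data in place -- no extra copy */
            }
            apr_brigade_cleanup(bb);
        }
        return APR_SUCCESS;
    }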

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...
