httpd-dev mailing list archives

From Justin Erenkrantz <jus...@erenkrantz.com>
Subject Re: [PATCH] Move http header parsing into the http_input_filter
Date Tue, 25 Feb 2003 08:42:04 GMT
--On Monday, February 24, 2003 12:21 PM -0500 Bill Stoddard 
<bill@wstoddard.com> wrote:

> 1. HTTP header parsing moved out of protocol.c and into the HTTP_IN
> filter (ap_http_filter in http_protocol.c)

My biggest problem here is wondering where you are reading the
headers.  If you are using the filters, something has to trigger the
read, but when to issue that read isn't always clear.  I'm afraid you
are going to be reading the request prematurely or too late.  I think
there's a disconnect here, but let me proceed to read the patch...

> 3. A pointer to the current request req is placed in the conn_rec
> for use by ap_http_filter

In the past, people have stated that this is a very bad idea.  I
don't know if I agree, but if we're doing it for one connection-level
filter, I think it should be done for all.

> 4. New function, ap_brigade_getline() replaces
> ap_get_brigade(AP_MODE_GETLINE).

Seems like this should be calling apr_brigade_split_line with some
logic around it for fetching more data.  But, the code duplication
between here and apr_brigade_split_line is worrisome.
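To make that concrete, here's the kind of thing I have in mind (only a
sketch; the helper names and the 'spare' brigade are mine, not anything
in the patch or the tree):

/* Sketch (names are mine): true if the last data bucket in 'bb' ends with
 * the LF we were hunting for. */
static int brigade_ends_with_lf(apr_bucket_brigade *bb)
{
    apr_bucket *e = APR_BRIGADE_LAST(bb);
    const char *data;
    apr_size_t len;

    if (e == APR_BRIGADE_SENTINEL(bb)
        || APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH(e)) {
        return 0;
    }
    if (apr_bucket_read(e, &data, &len, APR_BLOCK_READ) != APR_SUCCESS
        || len == 0) {
        return 0;
    }
    return data[len - 1] == APR_ASCII_LF;
}

/* Sketch: let apr_brigade_split_line() do the LF hunting and only wrap it
 * with the "go fetch more data" loop.  'spare' holds leftover buckets
 * between calls. */
static apr_status_t getline_from_filter(ap_filter_t *next,
                                        apr_bucket_brigade *line,
                                        apr_bucket_brigade *spare)
{
    apr_status_t rv;

    for (;;) {
        if (APR_BRIGADE_EMPTY(spare)) {
            /* Nothing buffered from a previous call; go get more. */
            rv = ap_get_brigade(next, spare, AP_MODE_GETLINE,
                                APR_BLOCK_READ, 0);
            if (rv != APR_SUCCESS) {
                return rv;
            }
            if (APR_BRIGADE_EMPTY(spare)
                || APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(spare))) {
                return APR_EOF;     /* don't spin on EOS */
            }
        }
        /* Move everything up to and including the first LF into 'line';
         * anything after it stays in 'spare' for the next call. */
        rv = apr_brigade_split_line(line, spare, APR_BLOCK_READ,
                                    HUGE_STRING_LEN);
        if (rv != APR_SUCCESS || brigade_ends_with_lf(line)) {
            return rv;
        }
        /* No LF seen yet; loop around and read more from downstream. */
    }
}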

> I generally like the idea of putting protocol parsing in a
> connection level filter. I think this is a more straight forward
> way to use the Apache 2 infrastructure to handle protocols.

As I said before, I disagree because this breaks the ability to
upgrade the protocol via inserting new protocol filters.  Once a
filter sets aside data that it doesn't process, the protocol filters
are broken.

On to the patch...
(As a note, it'd be really nice if you wrapped the code in the right
places.  My mailer is crying at formatting this patch because most of
the lines are over 80 characters...so, if this message gets munged...
I'm trying my best here...)

> Index: include/http_protocol.h
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/include/http_protocol.h,v
> retrieving revision 1.84
> diff -u -r1.84 http_protocol.h
> --- include/http_protocol.h	3 Feb 2003 17:52:53 -0000	1.84
> +++ include/http_protocol.h	24 Feb 2003 17:03:20 -0000
> @@ -717,6 +717,9 @@
>                                                             apr_bucket_brigade *);
>  AP_DECLARE_NONSTD(apr_status_t) ap_old_write_filter(ap_filter_t *f, apr_bucket_brigade *b);
>
> +AP_CORE_DECLARE_NONSTD(apr_status_t) ap_http_header_input_filter(ap_filter_t *f, apr_bucket_brigade *b,
> +                                                                 ap_input_mode_t mode, apr_read_type_e block,
> +                                                                 apr_off_t readbytes);

I realize that all of the other ap_*_filters are currently declared
like this.  But I don't see any reason why this must be.  We
shouldn't be exporting our filters.  Seems like something we should
clean up in 2.1.  Not really specific to this patch, but...

> Index: include/httpd.h
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/include/httpd.h,v
> retrieving revision 1.194
> diff -u -r1.194 httpd.h
> --- include/httpd.h	3 Feb 2003 17:52:53 -0000	1.194
> +++ include/httpd.h	24 Feb 2003 17:03:20 -0000
> @@ -1013,6 +1013,7 @@
>     void *sbh;
>     /** The bucket allocator to use for all bucket/brigade creations */
>     struct apr_bucket_alloc_t *bucket_alloc;
> +    request_rec *r;
>  };

I would think this is the wrong place to store the association
between a connection and the request.  If that association needs to
exist at all (see above), it should live in the filter ctx or in the
f->r field of the ap_filter_t.  I think exposing the request in the
conn_rec is something we should avoid.  Perhaps exposing it via
ap_filter_t's request_rec field *might* be permissible.
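If we went that route, I'd expect something more like this when the
request is created (just a sketch; the filter-chain walk is mine, not
code in the tree):

/* Sketch only: when a new request_rec is created, point the already
 * installed connection-level HTTP_IN filter instance at it via f->r,
 * rather than widening conn_rec. */
static void point_http_in_at_request(request_rec *r)
{
    ap_filter_t *f;

    for (f = r->connection->input_filters; f != NULL; f = f->next) {
        if (f->frec == ap_http_input_filter_handle) {
            f->r = r;   /* ap_filter_t already carries a request_rec field */
            break;
        }
    }
}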

> Index: modules/http/http_core.c
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/modules/http/http_core.c,v
> retrieving revision 1.309
> diff -u -r1.309 http_core.c
> --- modules/http/http_core.c	3 Feb 2003 17:53:03 -0000	1.309
> +++ modules/http/http_core.c	24 Feb 2003 17:03:20 -0000
> @@ -277,11 +277,18 @@
>      int csd_set = 0;
>     apr_socket_t *csd = NULL;
>
> +    /* Add the http_header_input_filter to the filter stack. This filter
> +     * should remain installed for the duration of the connection.
> +     * XXX: should this be added in a pre_connection hook? I think doing
> +     * it here makes the code easier to read and understand.
> +     */
> +    ap_add_input_filter_handle(ap_http_input_filter_handle,
> +                               NULL, NULL, c);
> +

I think it makes more sense to do this in a pre_connection hook
because it allows us to cleanly separate the setup from the actual
reading.  I think pre_connection should be responsible for setting up
the connection.  This moves the setup from pre_connection down into
process_connection.  I'm not sure that is the best way to go.
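i.e. something like this instead (a sketch; the function names are made up):

/* Sketch: install the connection-level HTTP filter from a pre_connection
 * hook rather than at the top of ap_process_http_connection. */
static int http_pre_connection(conn_rec *c, void *csd)
{
    ap_add_input_filter_handle(ap_http_input_filter_handle, NULL, NULL, c);
    return OK;
}

static void http_register_hooks(apr_pool_t *p)
{
    ap_hook_pre_connection(http_pre_connection, NULL, NULL, APR_HOOK_MIDDLE);
}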

> @@ -338,7 +344,7 @@
>      ap_hook_create_request(http_create_request, NULL, NULL, APR_HOOK_REALLY_LAST);
>      ap_http_input_filter_handle =
>          ap_register_input_filter("HTTP_IN", ap_http_filter,
> -                                 NULL, AP_FTYPE_PROTOCOL);
> +                                 NULL, AP_FTYPE_CONNECTION);
>      ap_http_header_filter_handle =
>          ap_register_output_filter("HTTP_HEADER", ap_http_header_filter,
>                                    NULL, AP_FTYPE_PROTOCOL);

Just as a note, if we go your route and always make protocols
connection-based, then this wouldn't be needed.  We'd also have to
somehow address how the proto_output_filters code in the redirects is
handled.  Not sure whether it would need to change or not.

> Index: modules/http/http_protocol.c
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/modules/http/http_protocol.c,v
> retrieving revision 1.465
> diff -u -r1.465 http_protocol.c
> --- modules/http/http_protocol.c	19 Feb 2003 06:50:10 -0000	1.465
> +++ modules/http/http_protocol.c	24 Feb 2003 17:03:21 -0000
> @@ -739,75 +739,526 @@
>      /* it wasn't found in the hash */
>      return NULL;
> }
> +/*
> + * Iterate across bbIn looking for an APR_ASCII_LF. If no LF is found,
> + * attempt to read more bytes from the filter stack and keep searching
> + * until a LF is found or an error condition is encountered. Unused
> + * (unconsumed?) buckets are returned to the caller in bbIn.
> + *
> + */
> +static apr_status_t ap_brigade_getline(ap_filter_t *f,

As I said above, this code somehow needs to be factored in with
apr_brigade_split_line.  Also, this code would completely blow up if
an EOS bucket is read.  It'd probably go into an infinite loop.
The filters are allowed to return an EOS rather than an empty
brigade.
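A check along these lines at the point where more data is fetched
would avoid the spin (sketch; the helper name is mine):

/* Sketch: true if the brigade we just got back from ap_get_brigade()
 * can't yield any more line data, so the getline loop must stop
 * instead of asking downstream again. */
static int input_exhausted(apr_bucket_brigade *bb)
{
    return APR_BRIGADE_EMPTY(bb)
           || APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb));
}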

> +static apr_status_t check_header_fields(request_rec *r)
> +{
> +    const char *expect;
> +    if ((!r->hostname && (r->proto_num >= HTTP_VERSION(1, 1)))
> +        || ((r->proto_num == HTTP_VERSION(1, 1))
> +            && !apr_table_get(r->headers_in, "Host"))) {
> +        /*
> +         * Client sent us an HTTP/1.1 or later request without telling
> +         * us the hostname, either with a full URL or a Host: header. We
> +         * therefore need to (as per the 1.1 spec) send an error.  As a
> +         * special case, HTTP/1.1 mentions twice (S9, S14.23) that a
> +         * request MUST contain a Host: header, and the server MUST
> +         * respond with 400 if it doesn't.
> +         */
> +        r->status = HTTP_BAD_REQUEST;
> +        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
> +                      "client sent HTTP/1.1 request without hostname "
> +                      "(see RFC2616 section 14.23): %s", r->uri);
> +    }
> +
> +    if (r->status != HTTP_OK) {
> +#if 0
> +        ap_send_error_response(r, 0);
> +        ap_run_log_transaction(r);
> +        return r;
> +#endif
> +    }
> +
> +    if (((expect = apr_table_get(r->headers_in, "Expect")) != NULL)
> +        && (expect[0] != '\0')) {

Just as a note, you moved the Expect check to before we call
ap_post_read_request.  It needs to be done after.  Don't really
know how you'd do that in your scheme, but a module needs to be able
to remove the Expect headers.  (The current code allows for this.)
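For comparison, the ordering the current code effectively has (a heavily
simplified sketch, not a literal excerpt from protocol.c):

/* Simplified sketch of the current ordering: run the hook first so a
 * module gets the chance to unset "Expect" before we act on it. */
static int check_expect_after_hook(request_rec *r)
{
    const char *expect;
    int access_status;

    if ((access_status = ap_run_post_read_request(r))) {
        return access_status;       /* modules may delete "Expect" above */
    }

    expect = apr_table_get(r->headers_in, "Expect");
    if (expect != NULL && *expect != '\0') {
        if (strcasecmp(expect, "100-continue") == 0) {
            r->expecting_100 = 1;
        }
        else {
            return HTTP_EXPECTATION_FAILED;
        }
    }
    return OK;
}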

...snip...

> -typedef struct http_filter_ctx {
> +typedef struct http_input_filter_ctx {
> +    apr_bucket_brigade *b;
> +    enum {
> +        HIF_READ_HEADER_FIELDS,
> +        HIF_READ_BODY_LENGTH,
> +        HIF_READ_BODY_CHUNKED_SEND_100,
> +        HIF_READ_BODY_CHUNKED,
> +        HIF_BODY_NONE,
> +        HIF_EOS_SENT
> +    } state;

A problem with this enum is that you are relying upon
HIF_READ_HEADER_FIELDS to be 0.  You need to set it explicitly when
you create the context rather than allowing apr_pcalloc to set it to
zero.  (HIF_READ_HEADER_FIELDS = 0 here might help, but I'd rather
see it explicit.)

>      apr_off_t remaining;
>      apr_off_t limit;
>      apr_off_t limit_used;
> -    enum {
> -        BODY_NONE,
> -        BODY_LENGTH,
> -        BODY_CHUNK
> -    } state;
> -    int eos_sent;
> -} http_ctx_t;
> -
> -/* This is the HTTP_INPUT filter for HTTP requests and responses from
> - * proxied servers (mod_proxy).  It handles chunked and content-length
> - * bodies.  This can only be inserted/used after the headers
> - * are successfully parsed.
> - */
> +    const char *tenc;
> +    const char *lenp;
> +} http_input_filter_ctx_t;
> +static apr_status_t reset_state(void *arg)
> +{
> +    http_input_filter_ctx_t *ctx = (http_input_filter_ctx_t *) arg;
> +    ctx->state = HIF_READ_HEADER_FIELDS;
> +    return APR_SUCCESS;
> +}
>  apr_status_t ap_http_filter(ap_filter_t *f, apr_bucket_brigade *b,
>                              ap_input_mode_t mode, apr_read_type_e block,
>                              apr_off_t readbytes)
>  {
> -    apr_bucket *e;
> -    http_ctx_t *ctx = f->ctx;
>      apr_status_t rv;
> +    http_input_filter_ctx_t *ctx = f->ctx;
> +    apr_bucket *e;
>      apr_off_t totalread;
> -
> -    /* just get out of the way of things we don't want. */
> +    request_rec *r;
> +    r = f->r = f->c->r;
> +

Why must we always set this?  Why not in the !ctx case only?

>      if (mode != AP_MODE_READBYTES && mode != AP_MODE_GETLINE) {
>          return ap_get_brigade(f->next, b, mode, block, readbytes);
>      }
>
>      if (!ctx) {
> -        const char *tenc, *lenp;
> -        f->ctx = ctx = apr_palloc(f->r->pool, sizeof(*ctx));
> -        ctx->state = BODY_NONE;
> -        ctx->remaining = 0;
> -        ctx->limit_used = 0;
> -        ctx->eos_sent = 0;
> +        f->ctx = ctx = apr_pcalloc(f->c->pool, sizeof(*ctx));
> +    }

This is subject to the enum problem discussed above.  I'd also
think the calloc might be too expensive, but I don't know for
sure (depends how many fields must be cleared).
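In any case, the context creation should make the initial state
explicit, something like (sketch, reusing the patch's names):

    if (!ctx) {
        f->ctx = ctx = apr_pcalloc(f->c->pool, sizeof(*ctx));
        /* Don't rely on pcalloc's zeroing happening to match the enum's
         * first member; state the initial state explicitly. */
        ctx->state = HIF_READ_HEADER_FIELDS;
    }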

>
> -        /* LimitRequestBody does not apply to proxied responses.
> -         * Consider implementing this check in its own filter.
> -         * Would adding a directive to limit the size of proxied
> -         * responses be useful?
> +    switch (ctx->state) {
> +    case HIF_READ_HEADER_FIELDS:
> +    {
> +        /* Read and parse the headers.
> +         * Begin by fetching and reading the request line
>           */
> -        if (!f->r->proxyreq) {
> -            ctx->limit = ap_get_limit_req_body(f->r);
> +        rv = ap_get_brigade(f->next, b, mode, block, readbytes);
> +        if (rv != APR_SUCCESS) {
> +            return rv;
>          }

You can't use the mode, block, or readbytes that were passed in.
You are fully expecting this to be a blocking operation, but if
this filter gets called for a non-blocking read, bad things are going
to happen.  Look at how the current filter reads chunks: it always
uses AP_MODE_GETLINE and APR_BLOCK_READ for the protocol code.
You must do the same.
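Roughly, the protocol-level reads should be pinned like this no matter
what the caller asked for (simplified sketch, not a literal excerpt
from the current filter):

/* Sketch: the filter's own protocol reads (request line, header lines,
 * chunk-size lines) always use GETLINE + blocking, independent of the
 * mode/block/readbytes the caller passed in. */
apr_status_t rv;
apr_bucket_brigade *proto_bb = apr_brigade_create(f->c->pool,
                                                  f->c->bucket_alloc);

rv = ap_get_brigade(f->next, proto_bb, AP_MODE_GETLINE, APR_BLOCK_READ, 0);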

> -        else {
> -            ctx->limit = 0;
> +        if (ctx->b && !APR_BRIGADE_EMPTY(ctx->b)) {
> +            APR_BRIGADE_PREPEND(b, ctx->b);
> +        }
> +        rv = read_request_line(f->next, r, b, mode);
> +        if (rv != APR_SUCCESS) {
> +            return rv;
> +        }
> +        rv = read_header_fields(f->next, r, b, mode);
> +        if (rv != APR_SUCCESS) {
> +#if 0
> +            /* Handle errors at the top of the filter chain? */
> +            if (r->status != HTTP_REQUEST_TIME_OUT) {
> +                ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
> +                              "request failed: error reading the headers");
> +                ap_send_error_response(r, 0);
> +                ap_run_log_transaction(r);
> +            }
> +#endif

I believe the strategy we have taken is that protocol errors
should be passed to the output filters directly.  The application
(the caller) of this filter has no idea what to do with an error.
I think maintaining that approach would be a good thing.
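Concretely, something along these lines inside the filter (a sketch;
it assumes r is valid at that point):

/* Sketch: hand the protocol error straight to the output filter chain as
 * an error bucket followed by EOS, instead of returning the failure
 * upward and hoping the caller knows what to do with it. */
apr_bucket_brigade *eb = apr_brigade_create(r->pool, f->c->bucket_alloc);

APR_BRIGADE_INSERT_TAIL(eb,
    ap_bucket_error_create(HTTP_BAD_REQUEST, NULL, r->pool,
                           f->c->bucket_alloc));
APR_BRIGADE_INSERT_TAIL(eb, apr_bucket_eos_create(f->c->bucket_alloc));
(void) ap_pass_brigade(r->output_filters, eb);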

> +    case HIF_READ_BODY_CHUNKED_SEND_100:

What exactly is this state for?  At your line 1340 you send the 100,
so I think this is much more appropriately called
HIF_READ_BODY_CHUNK_LENGTH or something.  I don't get where the
SEND_100 comes in; it's already done by the time this state is entered.

> +        if (!ctx->remaining) {
> +            /* Handle trailers by calling ap_get_mime_headers again! */
> +            ctx->state = HIF_BODY_NONE;
> +/*             ap_get_mime_headers(f->r); This is broken */
> +            e = apr_bucket_eos_create(f->c->bucket_alloc);
> +            APR_BRIGADE_INSERT_TAIL(b, e);
> +            ctx->state = HIF_EOS_SENT;
> +            return APR_SUCCESS;

This should enter the READ_HEADER_FIELDS state again, correct?
Or, call read_header_fields or something.

>          }
> +        ctx->state = HIF_READ_BODY_CHUNKED;
> +        /* The break; is intentionally left out */
>      }
> +    case HIF_READ_BODY_CHUNKED:
> +    {
> +        /* This is still broken.... */
> +        if (!ctx->remaining) {
> +            char line[30];
> +            apr_bucket_brigade *bb;
> +            apr_size_t len = 30;
> +            apr_off_t brigade_length;

To me, this seems like we should now enter
HIF_READ_BODY_CHUNKED_SEND_100 (or whatever it should be called).

> Index: server/protocol.c
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/server/protocol.c,v
> retrieving revision 1.127
> diff -u -r1.127 protocol.c
> --- server/protocol.c	3 Feb 2003 17:53:19 -0000	1.127
> +++ server/protocol.c	24 Feb 2003 17:03:21 -0000
> @@ -923,117 +924,16 @@
> ...
> +    r->connection->r = r;
> +    bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
> +    rv = ap_get_brigade(r->input_filters, bb, AP_MODE_READBYTES,
> +                        APR_BLOCK_READ, HUGE_STRING_LEN);
> +    apr_brigade_destroy(bb);

Um, what?  I think this is how you are trying to get around the
question of when to call the input filter.  This is extremely bogus.
You can't read data and then throw it away.  And, no, setaside isn't
the answer (sorry, OtherBill).

In order to even contemplate merging this, I would like to see
a clear plan for how to support 'Upgrade' in HTTP/1.1.  I imagine
that in the lifetime of 2.1/2.2, we will have a new HTTP protocol.  If
your solution doesn't allow for Upgrade, it'll have to be thrown
away later.  -- justin
