httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brian Pane <bri...@apache.org>
Subject Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c
Date Tue, 03 Jan 2006 09:21:50 GMT

On Jan 2, 2006, at 3:41 PM, Justin Erenkrantz wrote:

>> +static apr_status_t process_request_line(request_rec *r, char *line,
>> +                                         int skip_first)
>> +{
>> +    if (!skip_first && (r->the_request == NULL)) {
>> +        /* This is the first line of the request */
>> +        if ((line == NULL) || (*line == '\0')) {
>> +            /* We skip empty lines because browsers have to tack  
>> a CRLF
>> on to the end +             * of POSTs to support old CERN  
>> webservers.
>> +             */
>> +            int max_blank_lines = r->server->limit_req_fields;
>> +            if (max_blank_lines <= 0) {
>> +                max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
>> +            }
>> +            r->num_blank_lines++;
>> +            if (r->num_blank_lines < max_blank_lines) {
>> +                return APR_SUCCESS;
>> +            }
>> +        }
>> +        set_the_request(r, line);
>> +    }
>
> This will cause a segfault if line is null and we are at or above  
> max_blank_lines.  Perhaps you meant for an else clause here?

Yes, thanks for catching that.

>
>> +    else {
>> +        /* We've already read the first line of the request.   
>> This is
>> either +         * a header field or the blank line terminating  
>> the header
>> +         */
>> +        if ((line == NULL) || (*line == '\0')) {
>> +            if (r->pending_header_line != NULL) {
>> +                apr_status_t rv = set_mime_header(r,
>> r->pending_header_line); +                if (rv != APR_SUCCESS) {
>> +                    return rv;
>> +                }
>> +                r->pending_header_line = NULL;
>> +            }
>> +            if (r->status == HTTP_REQUEST_TIME_OUT) {
>> +                apr_table_compress(r->headers_in,
>> APR_OVERLAP_TABLES_MERGE);
>> +                r->status = HTTP_OK;
>
> Say what?  ...looks at rest of file...
>
> Is this because r->status is unset and we're saying that's it's  
> 'okay' to proceed with the request.  If so, this *really* needs a  
> comment to that effect.  It makes no sense whatsoever otherwise.   
> (We should probably remove the hack to set it to  
> HTTP_REQUEST_TIME_OUT initially as part of these changes.)

Yeah, setting r->status to HTTP_OK is done here solely to make it work
with the existing logic about HTTP_REQUEST_TIME_OUT meaning "still
reading the request header."

+1 for of removing the HTTP_REQUEST_TIME_OUT hack.  I was trying
to be conservative by preserving that part of the original logic, but  
now
that you mention it, we might as well replace it with something less  
subtle
in 2.3.  This will break any 3rd party modules that depend upon the
HTTP_REQUEST_TIME_OUT convention, but for a major release
like 2.4 or 3.0 that's a defensible choice.

>> +            }
>> +        }
>> +        else {
>> +            if ((*line == ' ') || (*line == '\t')) {
>> +                /* This line is a continuation of the previous  
>> one */
>> +                if (r->pending_header_line == NULL) {
>> +                    r->pending_header_line = line;
>> +                    r->pending_header_size = 0;
>> +                }
>> +                else {
>> +                    apr_size_t pending_len =
>> strlen(r->pending_header_line);
>> +                    apr_size_t fold_len = strlen(line);
>
> This seems really expensive.  You shouldn't need to recount  
> pending_len each time.

Good point; I'll add something to keep track of the length.

>> +            }
>> +            break;
>
> If I understand your direction, it'd bail out here if it ever got  
> EAGAIN?

Yes indeed.  That's what the version on the async-read-dev branch does.

>> +request_rec *ap_read_request(conn_rec *conn)
>> +{
>> +    request_rec *r;
>> +    apr_status_t rv;
>> +
>> +    r = init_request(conn);
>> +
>> +    rv = read_partial_request(r);
>> +    /* TODO poll if EAGAIN */
>> +    if (r->status != HTTP_OK) {
>> +        ap_send_error_response(r, 0);
>> +        ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
>> +        ap_run_log_transaction(r);
>> +        return NULL;
>> +    }
>
> Obviously, this can't be the case if you want to do real polling.   
> This would be the wrong place to poll.  You have to exit out of  
> ap_read_request and go back up to an 'event thread' that waits  
> until there's data to read on any incoming sockets.  Then, you'd  
> have to call ap_read_request again to 'restart' the parser whenever  
> there is more data to read.
>
> In my opinion, this code isn't even close to being async.

I'm relieved to hear that it's not async, since you're looking at the  
blocking
version.  :-)  See ap_read_async_request() (still on the async-read- 
dev branch).

>   So, I wonder why it was even merged to trunk right now.  You'd  
> have to deal with partial lines and the state the header parser is  
> in when it gets the next chunk of data - which this code doesn't  
> even try to do.  The current code is just going to bail when it  
> doesn't have a 'complete' line.
>
> My hunch is that you plan to build up pending_header_line and delay  
> parsing until you have the line terminators; but that's not going  
> to work really well with non-blocking sockets as you have to store  
> the data you just read non-blocking'ly if you actually read  
> multiple MIME header lines or, icky, even part of the request body  
> or the next request.  This just lends more credence to the argument  
> that request_rec is the wrong place for the buffer.  It's clearly  
> connection or connection-filter oriented...

The async impl addesses this problem by only grabbing a line at a  
time from
the input filter chain.  The filters are responsible for buffering  
anything that's been
read from the socket but not consumed by the current request.  If you  
have a
chance to review the async code (the version on the branch), I'm  
eager to get
some feedback on that part of the design.

> (IIRC, another point that'll kill httpd is that httpd mis-uses  
> EAGAIN; serf has a definition that EAGAIN means some data is read  
> *and* you need to ask for more.  httpd's filters don't have that.   
> One more issue that httpd got wrong...)
>
> To me, it sounds like we need a way for the MPM (or whatever) to  
> redefine the accept logic instead of relying upon the core.  Oh,  
> yah, we do that already (heh) by allowing them to call something  
> other than ap_process_connection: such as with  
> ap_process_http_async_connection(). So, why are we even touching  
> ap_read_request() at all?  This is going to necessitate an  
> ap_async_read_request() anyway...

Agreed on the need for async_read_request(); I purposely created that  
as a separate function,
following the pattern set by ap_process_http_async_connection(), so  
that things depending on
the semantics of the original ap_read_request() wouldn't break.  The  
bulk of the refactoring
that I've recently checked into the trunk is designed to allow the  
sync and async versions of
ap_read_request() to share code.  Given that the mime-folding code  
has been a source of
security vulnerabilities in the past, I'm trying to stick with a  
single impl of the header parsing
code that the sync and async code can both use.

Brian


Mime
View raw message