httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Ho <and...@tellme.com>
Subject Apache 1.3.x race condition causes gratuitous 3-second CGI delay
Date Wed, 14 Feb 2001 21:24:00 GMT
Hello,

We've discovered an apparent race condition in Apache 1.3.x CGI handling
which results in occasional unnecessary 3-second delays resulting from a
pause between the closing of the CGI child process' pipe and that process'
subsequent exit.

Specifically, the code in mod_cgi.c reads from its child process until the
child process breaks the pipe. The cleanup code in alloc.c then calls
waitpid with WNOHANG to check to see if its child process has died; if its
pid is not waiting, Apache assumes that the process has hung. It sends a
SIGTERM, waits 3 seconds, then sends a SIGKILL. The relevant code is in
free_proc_chain() in alloc.c.

The former assumption (if the child pid is not waiting to be reaped, the
child process must have hung and should be killed) appears to be erroneous
on at least some configurations. Specifically, imagine that the CGI child
process exits 10ms after the cleanup code in alloc.c is run. In this case,
the Apache process sleeps 3 seconds, when really it didn't need to.

This problem is only client-visible with HTTP/1.1 keep-alive, Apache
running as a single process, or bad luck where the client talks to the
same child more than once. The user-visible symptom is then a 3-second
delay following a CGI request.

On Solaris x86 on a dual-processor box, we see this behavior perhaps 1/5
of the time for any particular child. On most other systems we tested,
you have to explicitly try to trigger the bug. This CGI will do it:

    #!/usr/local/bin/perl
    print "Content-type: text/plain\n\n";
    print "Hello, world.\n";
    close STDOUT;
    sleep 1;  # or anything else that does a system call, really
    exit 0;

Instead of delaying 1 second, the child displays a 3 second delay after
running this CGI. Note that on our dual-processor Solaris x86 boxes, even
just the simple Hello, world (no premature closing of STDOUT or sleeping)
exhibits the 3 second delay regularly. I'm not sure why this is; perhaps
Solaris is doing some cleanup that Linux is not, or there is some child
reaping issue with the multiple processors.

Here are the configurations we tested. Patched Apaches (with mod_perl
or mod_ssl capabilities) had the same behaviors as straight out-of-the-box
configurations; having DSOs enabled was likewise irrelevant.

    * Solaris x86, dual processor Intel boxes, Apache 1.3.9, 1.3.1[247]
        * On Apache 1.3.14, mod_perl and mod_ssl and non-DSO variants
        * All configurations display sporadic 3-second CGI delays
          even in a simple Hello, world CGI.
    * Linux, single processor Intel boxes, Apache 1.3.12, 1.3.14
        * Without explicitly closing STDOUT, the bug doesn't appear,
          but if you close STDOUT and do really anything at all
          (including just a timing loop), the bug appears
    * Solaris on a single processor Sparc box, Apache 1.3.12
    * FreeBSD, dual processor Intel box, Apache 1.3.12
    * OpenBSD, single process Intel box, Apache 1.3.12

I've appended my test script below, it is a simple Perl script that opens
a socket connection to a webserver and does repeated HTTP/1.1 Keep-Alive
requests, timing each trial.

This bug may be the same as PR 6961 (repeated requests for a simple cgi
invoke delay of Apache) and is related loosely to PR 6226 (closing STDOUT
doesn't end session to allow background processing of code). What's the
best thing for me to do--file a new PR with this information?

Thanks in advance--

Humbly,

Andrew

----------------------------------------------------------------------
Andrew Ho               http://www.tellme.com/       andrew@tellme.com
Engineer                   info@tellme.com          Voice 650-930-9062
Tellme Networks, Inc.       1-800-555-TELL            Fax 650-930-9101
----------------------------------------------------------------------




#!/usr/local/bin/perl -w
# ========================================================================
# kahit - hit an HTTP/1.1 webserver over and over again in Keep-Alive mode
# Andrew Ho (andrew@tellme.com)
#
# This script opens a network connection to a webserver, and issues an
# HTTP request that specifies Keep-Alive. It issues that request over and
# over again on the same connection, exiting when the server side closes
# the connection explicitly. It understands both Content-Length and
# chunked Transfer-Encoding, and respects the Connection header.
#
# Last modified February 14, 2001
# ========================================================================
use strict;

use Socket;
use Time::HiRes qw(gettimeofday tv_interval);

use vars qw($HOST $PORT $PATH);
$HOST = 'sparky.tellme.com';
$PORT = 80;
$PATH = '/~andrew/break.cgi';

use vars qw($CRLF);
$CRLF = "\015\012";

use vars qw($VERBOSE);
$VERBOSE = 0;


# ------------------------------------------------------------------------
# Set up network connection

my $proto = getprotobyname('tcp');

socket(CLIENT, PF_INET, SOCK_STREAM, $proto) || die "socket error: $!";
my $addr = inet_aton($HOST)                  || die "could not resolve $HOST";
my $paddr = sockaddr_in($PORT, $addr);

connect(CLIENT, $paddr)                      || die "connect error: $!";


# ------------------------------------------------------------------------
# Main loop

$SIG{INT} = $SIG{TERM} = sub { exit };    # trigger END block on TERM
printf "----started, %s----\n", scalar localtime;

my $eof;
while (!$eof) {
    printf "----begin, %s----\n", scalar localtime;
    my $begin = [ gettimeofday ];

    # Output HTTP request line and Connection header
    print CLIENT 'GET ', $PATH, ' HTTP/1.1', $CRLF;
    print        'GET ', $PATH, ' HTTP/1.1', "\n" if $VERBOSE;
    print CLIENT 'Host: ', $HOST, $CRLF;
    print        'Host: ', $HOST, "\n" if $VERBOSE;
    print CLIENT 'Connection: keep-alive', $CRLF;
    print        'Connection: keep-alive', "\n" if $VERBOSE;
    print CLIENT $CRLF;
    print "\n" if $VERBOSE;

    # Flush output to both server and terminal
    my $fh = select CLIENT;
    local $| = 1;
    select $fh;
    local $| = 1 if $VERBOSE;

    # Read response headers
    my($content_length, $chunked);
    while(<CLIENT>) {
        # Respect EOF and Connection: close header
        do { $eof = 1; last } if !defined $_ || /^connection\:.*close/i;
        $content_length = $1 if /^content\-length\:\s*(\d+)\s*$/i;
        $chunked = 1 if /^transfer\-encoding\:.*chunked/i;
        print if $VERBOSE;
        last if /^\s*$/;
    }

    # Read response entity body
    if(defined $content_length) {
        # Got a Content-Length, so we can just read once

        # Content-length can be 0, in which case don't read
        if($content_length) {
            my $buffer;
            if($content_length != read CLIENT, $buffer, $content_length) {
                warn "did not read $content_length bytes";
            }
            print $buffer if $VERBOSE;
        }

    } elsif($chunked) {
        # Use chunked Transfer-Encoding (see RFC 2616, 19.4.6)

        # Read first chunk size
        my $chunk_size = <CLIENT>;
        do { $eof = 1; last } unless defined $chunk_size;
        print $chunk_size if $VERBOSE;
              $chunk_size =~ s/\s+$//;
              $chunk_size = hex $chunk_size;

        while($chunk_size > 0) {
            my $buffer;
            if($chunk_size != read CLIENT, $buffer, $chunk_size) {
                warn "did not read $chunk_size bytes";
            }
            print $buffer if $VERBOSE;

            # Read next chunk size (identical to code above)
            $chunk_size = <CLIENT>;
            do { $eof = 1; last } unless defined $chunk_size;
            print $chunk_size if $VERBOSE;
                  $chunk_size =~ s/\s+$//;
                  $chunk_size = hex $chunk_size;
        }

        # There can be more headers, or a CRLF here
        while(<CLIENT>) {
            do { $eof = 1; last } unless defined $_;
            print if $VERBOSE;
            last if /^\s*$/;
        }

    } else {
        # Hrm. Don't know how to handle this case

        warn "no Content-Length, and not chunked Transfer-Encoding\n";
        $eof = 1;

    }

    my $elapsed = tv_interval($begin, [ gettimeofday ]);
    printf "----end, %0.3f ms elapsed----\n", $elapsed * 1000;
}


# ------------------------------------------------------------------------
# Clean up

END {
    $SIG{INT} = $SIG{TERM} = 'IGNORE';
    close CLIENT;
    printf "----terminated, %s----\n", scalar localtime;
    exit 0;
}


# ========================================================================
__END__


Mime
View raw message