httpd-bugs mailing list archives

Subject DO NOT REPLY [Bug 7617] New: - Apache 1.3.x race condition causes gratuitous 3-second CGI delay
Date Fri, 29 Mar 2002 19:58:37 GMT

Apache 1.3.x race condition causes gratuitous 3-second CGI delay

           Summary: Apache 1.3.x race condition causes gratuitous 3-second
                    CGI delay
           Product: Apache httpd-1.3
           Version: 1.3.24
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: mod_cgi

This is a repost of a bug that I reported to the list in 2001.
Since that forum is usually concerned primarily with development of Apache 2, I
am opening this as a Bugzilla bug. The bug never made it into the former Apache
bug-tracking system, although it did have some similarities to some (VERY old)
existing bug reports for various architectures.

There is an apparent race condition in Apache 1.3.x CGI handling that results
in occasional, unnecessary 3-second delays. The delay comes from a pause between
when a CGI child process closes its output pipe and when that process
subsequently exits. Under normal circumstances, it appears that only Solaris x86
is significantly affected.

Specifically, the code in mod_cgi.c reads from its child process until the
child closes its end of the pipe. The cleanup code in alloc.c then calls
waitpid() with WNOHANG to check whether the child process has exited; if the
child is not yet waiting to be reaped, Apache assumes that the process has hung.
It sends a SIGTERM, waits 3 seconds, then sends a SIGKILL. The relevant code is
in free_proc_chain() in alloc.c.

This assumption (that if the child pid is not waiting to be reaped, the child
process must have hung and should be killed) appears to be erroneous on at
least some configurations. For example, suppose the CGI child process exits
10 ms after the cleanup code in alloc.c runs. In that case, the Apache process
sleeps for 3 seconds even though it did not need to.

This problem is visible to a client only when the delayed Apache child is the
one that serves that client's next request. That happens with HTTP/1.1
keep-alive, with Apache running as a single process, or by bad luck when a later
connection lands on the same child. The user-visible symptom is then a 3-second
delay following a CGI request, before the next request is serviced.

To try to reproduce the problem:

    * Build Apache "out of the box" with a straight configure

    * Enable .cgi processing. Here is the diff between the default config
      file and the one with .cgi processing enabled:

          <     Options Indexes FollowSymLinks MultiViews
          >     Options Indexes FollowSymLinks MultiViews ExecCGI
          <     #AddHandler cgi-script .cgi
          >     AddHandler cgi-script .cgi

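    * Put a test CGI under the default DocumentRoot (and make it
      executable). Here is one that explicitly triggers the bug:

          #!/usr/bin/perl
          # break.cgi - triggers the 3-second delay on any system
          print "Content-Type: text/plain\n\n";
          print "Hello, world.\n";
          close STDOUT;   # the pipe to Apache is now closed...
          sleep 1;        # ...but the process has not exited yet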

      And here is one that does not deliberately trigger the bug, but still
      exhibits the delay on our Solaris x86 systems:

          #!/usr/bin/perl
          # test.cgi - on Solaris x86, sometimes exhibits 3-second delay
          print "Content-Type: text/plain\n\n";
          print "Hello, world.\n";

    * Connect to the HTTP server via telnet and make a Keep-Alive request.
      Repeat the request after getting a response. With break.cgi, you should
      see a 3-second delay after every response. With test.cgi on an affected
      system, the 3-second delay occurs frequently but not on every request.

On Solaris x86 on a dual-processor box, we see this behavior perhaps 10-20% of 
the time for any particular child (using the test.cgi case above). On most 
other systems we tested, you have to explicitly try to trigger the bug (for 
example, using the break.cgi above).

We're not sure why Solaris x86 exhibits the delay even without a forced delay
between pipe closing and process exit. Perhaps Solaris does some cleanup that
Linux does not, or there is some child-reaping issue with the multiple
processors.
Here are the configurations we tested. Patched Apache builds (with mod_perl or
mod_ssl) behaved the same as straight out-of-the-box configurations; having
DSOs enabled was likewise irrelevant.

    * Solaris x86, dual-processor Intel boxes, Apache 1.3.9, 1.3.12, 1.3.14,
      1.3.17, 1.3.24
        * On Apache 1.3.14: mod_perl, mod_ssl, and non-DSO variants
        * All configurations display sporadic 3-second CGI delays,
          even with a simple Hello, world CGI.
    * Solaris on a single-processor Sparc box, Apache 1.3.12, 1.3.24;
      Linux, single-processor Intel boxes, Apache 1.3.12, 1.3.14;
      FreeBSD, dual-processor Intel box, Apache 1.3.12;
      OpenBSD, single-processor Intel box, Apache 1.3.12
        * Without explicitly closing STDOUT, the bug doesn't appear,
          but if the CGI closes STDOUT and then does anything at all
          before exiting (even just a timing loop), the bug appears.

I will attach my test script to this bug. It is a simple Perl script that opens
a socket connection to a web server and makes repeated HTTP/1.1 Keep-Alive
requests, timing each trial. It greatly simplifies the last step of the
reproduction recipe above.

This bug may be the same as PR 6961 (repeated requests for a simple cgi invoke
delay of Apache) and is loosely related to PR 6226 (closing STDOUT doesn't end
session to allow background processing of code). I also originally sent an
e-mail about this, which drew a couple of follow-ups.
The URL to that in the archive is here:

There was a very short discussion (apparently this problem has a bit of a 
history!) but no resolution.
