Delivered-To: apmail-httpd-bugs-archive@httpd.apache.org
Received: (qmail 67503 invoked by uid 500); 29 Mar 2002 19:58:33 -0000
Mailing-List: contact bugs-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Apache HTTPD Bugs Notification List"
Delivered-To: mailing list bugs@httpd.apache.org
Received: (qmail 67492 invoked from network); 29 Mar 2002 19:58:33 -0000
Date: 29 Mar 2002 19:58:37 -0000
Message-ID: <20020329195837.27341.qmail@nagoya.betaversion.org>
From: bugzilla@apache.org
To: bugs@httpd.apache.org
Subject: DO NOT REPLY [Bug 7617] New: - Apache 1.3.x race condition causes gratuitous 3-second CGI delay
X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS
THROUGH THE WEB INTERFACE AVAILABLE AT . ANY REPLY MADE TO THIS MESSAGE
WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7617

Apache 1.3.x race condition causes gratuitous 3-second CGI delay

           Summary: Apache 1.3.x race condition causes gratuitous 3-second
                    CGI delay
           Product: Apache httpd-1.3
           Version: 1.3.24
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: mod_cgi
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: andrew@tellme.com

This is a repost of a bug that I reported to the dev@httpd.apache.org
list in 2001. Since that forum is primarily concerned with development
of Apache 2, I am opening this as a Bugzilla bug. The bug never made it
into the former Apache bug-tracking system, although it did have some
similarities to some (VERY old) existing bug reports for various
architectures.

There is an apparent race condition in Apache 1.3.x CGI handling which
results in occasional unnecessary 3-second delays. The delays result
from a pause between when a CGI child process closes its output pipe and
when that process subsequently exits.
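As a rough illustration only (this is not Apache's code), the race can be reproduced in miniature on any Unix-like system. The sketch below is written in Python for brevity; the function name child_still_running_after_eof and the linger_seconds parameter are made up for this example. A forked child writes its output, closes the pipe, lingers briefly, and exits, while the parent reads to end-of-file and then polls once with a non-blocking waitpid().

```python
import os
import time


def child_still_running_after_eof(linger_seconds: float) -> bool:
    """Fork a child that closes its output pipe, lingers, then exits.

    Returns True if a non-blocking waitpid() issued right after the
    parent sees end-of-file finds the child still running.
    (Hypothetical helper for illustration; Unix-only because of fork.)
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the CGI script.
        os.close(read_fd)
        os.write(write_fd, b"Hello, world.\n")
        os.close(write_fd)          # the parent now sees EOF on the pipe
        time.sleep(linger_seconds)  # but the process has not exited yet
        os._exit(0)

    # Parent: plays the role of the server reading the CGI output.
    os.close(write_fd)
    while os.read(read_fd, 4096):
        pass                        # drain the pipe until EOF
    os.close(read_fd)

    # Non-blocking poll: a returned pid of 0 means "not exited yet".
    reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
    still_running = reaped_pid == 0

    if still_running:
        os.waitpid(pid, 0)          # reap the child normally
    return still_running
```

Calling child_still_running_after_eof(0.5) should report that the child is still alive at the moment of the poll, even though it is perfectly healthy and exits on its own half a second later. With a linger of zero the outcome depends on scheduling, which is the nature of the race.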
Under normal circumstances, it appears that only Solaris x86 is
significantly affected.

Specifically, the code in mod_cgi.c reads from its child process until
the child process closes its end of the pipe. The cleanup code in
alloc.c then calls waitpid() with WNOHANG to check whether the child
process has died; if the child's pid is not waiting to be reaped, Apache
assumes that the process has hung. It sends a SIGTERM, waits 3 seconds,
then sends a SIGKILL. The relevant code is in free_proc_chain() in
alloc.c.

That assumption (if the child pid is not waiting to be reaped, the child
process must have hung and should be killed) appears to be erroneous on
at least some configurations. For example, imagine that the CGI child
process exits 10ms after the cleanup code in alloc.c is run. In this
case, the Apache process sleeps for 3 seconds when it didn't need to.

This problem is only client-visible with HTTP/1.1 keep-alive, with
Apache running as a single process, or with bad luck where the client
talks to the same child more than once. The user-visible symptom is then
a 3-second delay following a CGI request, before the next request is
serviced.

To try to reproduce the problem:

* Build Apache "out of the box" with a straight configure

* Enable .cgi processing. Here is the diff between the default config
  file and the one with .cgi processing enabled:

    317c317
    <     Options Indexes FollowSymLinks MultiViews
    ---
    >     Options Indexes FollowSymLinks MultiViews ExecCGI
    784c784
    <     #AddHandler cgi-script .cgi
    ---
    >     AddHandler cgi-script .cgi

* Put a test CGI under the default DocumentRoot.
  Here is one that explicitly triggers the bug:

    #!/usr/local/bin/perl
    # break.cgi - triggers the 3-second delay on any system
    print "Content-Type: text/plain\n\n";
    print "Hello, world.\n";
    close STDOUT;
    sleep 1;

  And here is one that is not meant to trigger the bug, but still
  exhibits problems on our Solaris x86 systems:

    #!/usr/local/bin/perl
    # test.cgi - on Solaris x86, sometimes exhibits 3-second delay
    print "Content-Type: text/plain\n\n";
    print "Hello, world.\n";

* Connect to the HTTP server via telnet, and make a Keep-Alive request.
  Repeat the request after getting a response.

With break.cgi, you should see a 3-second delay after every response.
With test.cgi on an affected system, the 3-second delay occurs
frequently but sporadically.

On Solaris x86 on a dual-processor box, we see this behavior perhaps
10-20% of the time for any particular child (using the test.cgi case
above). On most other systems we tested, you have to explicitly try to
trigger the bug (for example, using the break.cgi above).

We're not sure why Solaris x86 exhibits the delay even without a forced
delay between pipe closing and process exit. Perhaps Solaris is doing
some cleanup that Linux is not, or there is some child-reaping issue
with the multiple processors.

Here are the configurations we tested. Patched Apaches (with mod_perl or
mod_ssl capabilities) had the same behaviors as straight out-of-the-box
configurations; having DSOs enabled was likewise irrelevant.

* Solaris x86, dual processor Intel boxes, Apache 1.3.9, 1.3.1[247],
  1.3.24
    * On Apache 1.3.14, mod_perl and mod_ssl and non-DSO variants
    * All configurations display sporadic 3-second CGI delays even in a
      simple Hello, world CGI.
* Solaris on a single processor Sparc box, Apache 1.3.12, 1.3.24;
  Linux, single processor Intel boxes, Apache 1.3.12, 1.3.14;
  FreeBSD, dual processor Intel box, Apache 1.3.12;
  OpenBSD, single processor Intel box, Apache 1.3.12
    * Without explicitly closing STDOUT, the bug doesn't appear, but if
      you close STDOUT and then do anything at all (including just a
      timing loop), the bug appears

I will attach my test script to this bug. It is a simple Perl script
that opens a socket connection to a webserver and does repeated HTTP/1.1
Keep-Alive requests, timing each trial. It vastly simplifies the last
step in the repro case above.

This bug may be the same as PR 6961 (repeated requests for a simple cgi
invoke delay of Apache) and is loosely related to PR 6226 (closing
STDOUT doesn't end session to allow background processing of code).

I also originally sent an e-mail to dev@httpd.apache.org about this,
which drew a couple of follow-ups. The URL to that in the archive is
here:

http://groups.yahoo.com/group/new-httpd/message/19853

There was a very short discussion (apparently this problem has a bit of
a history!) but no resolution.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org