From: "MATHIHALLI,MADHUSUDAN (HP-Cupertino,ex1)"
To: "'test-dev@httpd.apache.org'"
Cc: dev@httpd.apache.org
Subject: RE: Regarding Apache 2.0.48 and specweb99
Date: Fri, 21 Nov 2003 14:05:31 -0500
Message-ID: <304BDB72275BBB4DA590832B55A2029147277B@xsun04.ptp.hp.com>

>-----Original Message-----
>From: gregames@apache.org [mailto:gregames@apache.org]
[SNIP]
>cgid should _never_ exit without something in the error log.
>That makes it sound like a core problem, i.e. ap_process_child_status() or a
>signal handler is fubar, in addition to whatever made the cgi daemon die.
>
>But if that is in fact happening, I would trace syscalls & signals for the
>cgid process. (can't remember what the HPUX trace program is called, but you
>want something similar to truss/strace)

To give some background, here's what I did:

1. Use a large timeout and keepalive timeout, and 100 threads / process.
2. Use HTTP/1.0, as the SPECweb99 client seems to have some problem with
   HTTP/1.1 (not much work done there).
3. Start SPECweb99 run - enabled all the different dynamic tests -
   DYNAMIC_CONTENT, DYNAMIC_POST, DYNAMIC_CAD_GET, DYNAMIC_CGI_GET.
   - The run came back with a lot of "Can't connect" errors.
     It's probably okay because some config was probably screwed up.
4. The stipulated 20 min. warmup and the 20 min. run happen.
   The results are NOT posted even after 30 minutes.
5. I get suspicious, and I try to do a simple GET to Apache - realized that
   Apache was hung. (telnet localhost 80... GET /foo etc. stuff)
6. Attached gdb to each of the processes - and found that a couple of
   processes were processing do_post (in mod_specweb99) and NO cgid process
   (YES - I backported Jeff's patch to restart cgid) !!

I tried attaching tusc to the cgi daemon - but since the daemon dies at a
random time, my log file was getting too full, and I had to just stop it. I
tried resetting the log a couple of times - but then I got diverted and
started thinking from a different angle: were the timeouts too long, was the
system running out of sockets, etc.

I can reproduce the problem on every single run of SPECweb99 (history: 2.0.43
ran just fine). I'll try to get the tusc output for cgid when it dies - to see
if it helps.

BTW, one more thing I noticed: there's some problem with keeping the sockets
alive for a long time. The SPECweb99 client logs an error on close socket
(EBADF) when the keepalive times out.

-Madhu
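
The manual check in step 5 (telnet to port 80, type a GET, see whether anything
comes back) could be scripted roughly as below. This is only a sketch: the
function name probe(), the 5 second default timeout and the result strings are
made up for illustration, and /foo is just the placeholder path from step 5.

```python
import socket


def probe(host="localhost", port=80, path="/foo", timeout=5.0):
    """Send a bare HTTP/1.0 GET and classify what the server does.

    Returns the status line on success, or one of the hypothetical
    labels "hung", "unreachable" or "closed".
    """
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    try:
        # The timeout applies to connect() and to every later send/recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            first = sock.recv(4096)
    except socket.timeout:
        # Connected (or tried to) but nothing came back in time.
        return "hung"
    except OSError:
        # Connection refused, host unreachable, etc.
        return "unreachable"
    if not first:
        # Server closed the connection without sending anything.
        return "closed"
    # First line of the response, e.g. the HTTP status line.
    return first.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    print(probe())
```

Running this in a loop during the benchmark would at least timestamp the moment
the server stops answering, which might help narrow down which part of the
tusc log is worth keeping.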