Return-Path:
Delivered-To: new-httpd-archive@hyperreal.org
Received: (qmail 4230 invoked by uid 6000); 12 Apr 1999 19:58:32 -0000
Received: (qmail 4147 invoked from network); 12 Apr 1999 19:58:30 -0000
Received: from www.freepornpost.com (HELO hades.riverstyx.net) (unknown@216.94.42.241)
  by taz.hyperreal.org with SMTP; 12 Apr 1999 19:58:30 -0000
Received: from localhost (unknown@localhost)
  by hades.riverstyx.net (8.9.3/8.9.3) with ESMTP id MAA23180
  for ; Mon, 12 Apr 1999 12:58:46 -0700
Date: Mon, 12 Apr 1999 12:58:46 -0700 (PDT)
From:
To: new-httpd@apache.org
Subject: Re: apache-apr
In-Reply-To: <19990412025949.A14370@io.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: new-httpd-owner@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

I dropped the MaxRequestsPerChild down to 5 so the server'd die even
quicker. I went and found the process that was in the state you
described, and here's what I got:

[root@hades apache-apr]# ps fax|grep httpd|awk '{print $1}'|perl -ne 'chop; system "lsof -p $_";'|grep wW
httpd-apr 15359 root 53wW REG 3,5 0 122969 /usr/local/apache-apr/logs/accept.lock.1 (deleted)
[root@hades apache-apr]# gdb httpd-apr 15359
GNU gdb 4.17.0.4 with Linux/x86 hardware watchpoint and FPU support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

/usr/local/apache-apr/15359: No such file or directory.
Attaching to program `/usr/local/apache-apr/httpd-apr', Pid 15359
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libdb.so.2...done.
Reading symbols from /lib/libpthread.so.0...wdone.
Reading symbols from /lib/libc.so.6...hdone.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.1...edone.
Reading symbols from /lib/libnss_nis.so.1...rdone.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libnss_dns.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
0x400e55c2 in __libc_accept ()
(gdb) where
#0  0x400e55c2 in __libc_accept ()
#1  0x4006c0c1 in accept (fd=16, addr={__sockaddr__ = 0xbd5ffd48,
      __sockaddr_at__ = 0xbd5ffd48, __sockaddr_ax25__ = 0xbd5ffd48,
      __sockaddr_dl__ = 0xbd5ffd48, __sockaddr_eon__ = 0xbd5ffd48,
      __sockaddr_in__ = 0xbd5ffd48, __sockaddr_in6__ = 0xbd5ffd48,
      __sockaddr_inarp__ = 0xbd5ffd48, __sockaddr_ipx__ = 0xbd5ffd48,
      __sockaddr_iso__ = 0xbd5ffd48, __sockaddr_ns__ = 0xbd5ffd48,
      __sockaddr_un__ = 0xbd5ffd48, __sockaddr_x25__ = 0xbd5ffd48},
    addr_len=0xbd5ffd44) at wrapsyscall.c:146
#2  0x80768ed in accept_thread ()
#3  0x40069357 in pthread_start_thread (arg=0xbd5ffea4) at manager.c:192
(gdb) quit

After that, I checked with lsof again and found two threads in that
condition. A second later, the older of the two threads terminated, but
I had time to check the remaining thread... it was in the exact same
condition. I kept checking, got the same results...

Here's a backtrace from another thread in its process. Not sure if I
caught it while it was still stuck tho... I'm pretty sure I did, 'coz
this is the same result I got from checking four other threads, which
just previously I had verified as having a stuck thread.

(gdb) where
#0  0x400e5164 in __syscall_sigsuspend ()
#1  0x401096cc in __DTOR_END__ ()
#2  0x40068513 in pthread_cond_wait (cond=0x80a0c44, mutex=0x80a0c2c)
    at restart.h:49
#3  0x808724f in queue_pop ()
#4  0x8076a2e in worker_thread ()
#5  0x40069357 in pthread_start_thread (arg=0xbf7ffea4) at manager.c:192

---
tani hosokawa
river styx internet

On Mon, 12 Apr 1999, Manoj Kasichainula wrote:

> I'm not surprised that you're seeing a process in this fcntl lock
> condition.
> As Ryan noted, this is a known problem which needs to be
> fixed. But, I don't think it should show up if every port the server
> listens to is hit by a client on a regular basis. Is this the case?
>
> I would be surprised if all of the processes were in this fcntl
> locking state. The reason that this process is still waiting on the
> fcntl lock is that another process has the fcntl lock. And that
> process with the lock is supposed to be accepting. Are you actually
> seeing every thread in one of the states you described? If not, don't
> check yet, that's too much work.
>
> If you have lsof, do the following (which can probably be automated):
>
> 1. Get lsof.
> 2. run lsof on the main thread of each process, and look under the
> "FD" column for a capital W after a lower-case w, like this:
>
> COMMAND   PID USER    FD   TYPE DEVICE SIZE/OFF  NODE NAME
> (...)
> httpd     568 manoj 17wW REG     3,6        0 16411 /home (/dev/hda6)
>                        ^
> The important part_____|
>
> Just running lsof on every httpd thread and grepping for a "W" or "wW"
> should make this easy.
>
> 3. pull up a gdb backtrace of this thread, assuming there is one. It
> is supposed to be accepting requests, but it is probably not. What is
> it doing instead?
>
> 4. If you can get a backtrace of the other threads of that process
> (pstree -p will tell you which other threads are in that process),
> that would be great, too.
>
> Thanks for your help.
>
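
The lsof check in steps 1 and 2 of the quoted message can probably be automated along these lines. This is only a rough sketch in Python, not anything from the Apache tree: the names find_write_locked and LockedFd are made up for illustration, and it assumes lsof's default column layout as shown in the quoted example (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME), with the FD column holding a descriptor number, an access mode letter, and a trailing "W" when the process holds a write lock on the whole file.

```python
import re
from typing import List, NamedTuple


class LockedFd(NamedTuple):
    """One lsof row whose FD column carries the whole-file write-lock flag."""
    command: str
    pid: int
    fd: str
    name: str


# FD column: descriptor number, access mode (r, w or u), then lock flag "W".
_FD_WRITE_LOCK = re.compile(r"^\d+[rwu]W$")


def find_write_locked(lsof_output: str) -> List[LockedFd]:
    """Return the rows of lsof text output that show a "W" lock flag.

    Assumes the default nine-column layout; rows with fewer fields
    (and the header row) are skipped rather than guessed at.
    """
    hits = []
    for line in lsof_output.splitlines():
        # Split into at most 9 fields so NAME keeps its embedded spaces.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        if _FD_WRITE_LOCK.match(fields[3]):
            hits.append(LockedFd(fields[0], int(fields[1]), fields[3], fields[8]))
    return hits
```

Feeding it the captured output of lsof -p for each httpd PID would give back the PIDs to attach gdb to for step 3. Rows where lsof leaves a column blank will not line up with this simple split, so treat a miss as "not checked" rather than "not locked".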