httpd-dev mailing list archives

From Rob Hartill <>
Subject Re: (fwd) httpd(Apache) lockup prob. *please* help? (fwd)
Date Wed, 11 Sep 1996 09:35:15 GMT

interesting observation...

----- Forwarded message from Homer W. Smith -----

Newsgroups: comp.infosystems.www.servers.unix
Date: Tue, 10 Sep 1996 21:29:10 -0400 (EDT)
From: "Homer W. Smith" <>
To: Max Parke <>
Subject: Re: (fwd) httpd(Apache) lockup prob. *please* help?
In-Reply-To: <>
Message-ID: <>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

    Running SunOS 4.1.4 Stronghold 1.1.1

    Turning off Listen 80 stopped the load spiking.

    Child httpds are being put permanently to sleep by a number
of events, one of which is a TCP/IP connection that is opened to a
child which then never receives a GET command.  It is very easy to
replicate this with a perl script that opens TCP/IP connections
to the server and simply does not send a GET.
    The child goes to sleep and may or may not wake up, depending
on the timeout that is set.
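
    The idle-connection case can be reproduced safely against a toy
server on localhost.  The following is only a sketch in Python, not
the perl script mentioned above and not Apache code: serve_one,
idle_connection_demo and read_timeout are hypothetical names, and the
single worker thread stands in for one child httpd.  The client
connects and sends nothing, so the worker stays blocked until its
read timeout fires.

```python
import socket
import threading
import time


def serve_one(listener, read_timeout, result):
    """Accept one connection and wait for a request, like a single child."""
    conn, _addr = listener.accept()
    conn.settimeout(read_timeout)       # assumed stand-in for the server timeout
    started = time.monotonic()
    try:
        data = conn.recv(1024)          # blocks until the client sends something
        result["outcome"] = "request" if data else "closed"
    except socket.timeout:
        result["outcome"] = "timeout"   # the client never sent a GET
    finally:
        result["held_for"] = time.monotonic() - started
        conn.close()


def idle_connection_demo(read_timeout=0.5):
    """Connect to a local toy server, send nothing, report what the worker saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    result = {}
    worker = threading.Thread(target=serve_one,
                              args=(listener, read_timeout, result))
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    worker.join()                       # the worker is tied up until the timeout
    client.close()
    listener.close()
    return result


if __name__ == "__main__":
    print(idle_connection_demo())
```

    With a short timeout the worker is released quickly; with a long
or missing timeout it would stay tied up for as long as the client
keeps the connection open.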

    The primary question I see is this.
    I believe we understand now why a child hangs and goes to sleep.
    One reason is someone opens a connection to a child and fails
to send a GET.  There are other reasons less clear.

    What we don't understand is why the root process stops spawning
children.  It doesn't matter how many children go to sleep forever: as
long as the root server continues to spawn new children, the swap will
fill up but the server will continue to work.

    However, when the root server hangs and all extant children go to
sleep, whether from a client's failure to send a GET as described above
or for other reasons, new hits coming in are not responded to.

    It is very possible for a user who is hanging children to
hang all available children once new children stop being
spawned.  He just keeps hitting the server with the hanger application,
and the children get hung one at a time until none are left.
    I am not sure how it works: does the child process pick up the
incoming hit and tell the root process that it is now busy so the root
process can spawn another child for the next hit, or does the root process
pick up the incoming hit, pass it off to the child and spawn another child
for the next possible hit?   In either case, children stop being
spawned and all available children go to sleep.
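
    Max's post quoted below says that each child does its own
accept() while the parent sits in wait_or_timeout(), which points to
the first of these two models.  The following is a rough sketch of
that pre-forking pattern in Python, not Apache's actual code:
child_loop and prefork_demo are hypothetical names, it needs a
Unix-like system for os.fork(), and it leaves out the parent's logic
for spawning replacement children.  The parent creates the listening
socket and forks; every child then blocks in accept() on the shared
socket, and the kernel hands each incoming connection to one of them.

```python
import os
import signal
import socket


def child_loop(listener):
    """Each child blocks in its own accept() on the shared listening socket."""
    while True:
        conn, _addr = listener.accept()
        conn.recv(1024)                          # read the request (ignored here)
        conn.sendall(str(os.getpid()).encode())  # reply with the serving pid
        conn.close()


def prefork_demo(num_children=3, num_requests=5):
    """Fork children that accept() for themselves; the parent never accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))              # port 0: OS picks a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    children = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:                             # in the child
            try:
                child_loop(listener)
            finally:
                os._exit(0)
        children.append(pid)                     # in the parent

    served_by = []
    for _ in range(num_requests):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET / HTTP/1.0\r\n\r\n")
            served_by.append(int(client.recv(64).decode()))

    for pid in children:                         # tidy up the demo children
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    listener.close()
    return children, served_by


if __name__ == "__main__":
    print(prefork_demo())
```

    In this model the parent's only job is to keep enough children
alive, so if it stops forking, the pool can only shrink as children
get tied up.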

    Why does the Apache Status page show that the root server is
serving pages, and why are there multiple entries for the root server?

Homer Wilson Smith     News, Web, Telnet      Art Matrix - Lightlink
(607) 277-0959         SunOS 4.1.4 Sparc 20   Internet Access, Ithaca NY

On Tue, 10 Sep 1996, Max Parke wrote:

> Hopefully some kind soul will help us..............
> Path:!light!mhp
> From: (Max Parke)
> Newsgroups: comp.unix.programmer
> Subject: httpd(Apache) lockup prob. *please* help?
> Date: 10 Sep 1996 23:16:03 GMT
> Organization: ART MATRIX - LIGHTLINK
> Lines: 41
> Message-ID: <514srj$>
> NNTP-Posting-Host:
> X-Newsreader: TIN [version 1.2 PL2]
> Running Apache 1.1.1 under SunOS 4.1.4.  System runs fine for several days but
> then "hangs"  as follows.  *Any* tips or help would be _much_ appreciated!!
> When the hang occurs all of the child processes show on ``top'' as swapped 
> out.  (I.e., 0K Resident Set Size, RSS)
> Anyone attempting to connect to the server at that point just sits there 
> waiting.   Can see no obvious error messages in the Apache logfiles or kernel
> logs.  
> In one case the system "freed up" (by itself) after nine minutes.
> On today's hit, we gcore'd all of the child processes as well as the main
> (parent) process.  All of the child processes (except one) were waiting on
> the accept() call.  The other was waiting on a read() - I don't think this
> is significant.   The parent process was in its usual wait_or_timeout()
> routine.
> Apache contains an optional "feature" (SERIALIZED_ACCEPT - in two flavors) 
> which we are _not_ using.  So, each of the children is doing its 
> own accept().
> My question is, what would cause the kernel to swap out several children,
> each waiting on an accept(), for nine minutes, when we know that there are
> incoming connects waiting to be serviced???
> Is there something perhaps about all of the processes belonging to the same
> process group or some other factor which all share in common that causes
> them all to be "put to sleep" at once?
> Why would everything "magically" free up after several minutes??
> Is there anything else I should be looking for?
> Is there any other tracing/dumping I can do [we don't have the kernel
> source code to look at, unfortunately, or I would dump the kernel and
> see why IT thinks the processes are swapped.]
> Hope I haven't left anything out.  Thanks for reading this far
> Max <>

----- End of forwarded message from Homer W. Smith -----

Rob Hartill (  ... why wait for a clear night to see the stars?.
