httpd-dev mailing list archives

From Ben Laurie <...@gonzo.ben.algroup.co.uk>
Subject Re: WWW Form Bug Report: "server hangs frequently" on NeXT (fwd)
Date Thu, 12 Dec 1996 09:26:08 GMT
Randy Terbush wrote:
> 
> ------- Blind-Carbon-Copy
> 
> To: gtf@cirp.org
> Subject: Re: WWW Form Bug Report: "server hangs frequently" on NeXT (fwd) 
> In-reply-to: robh's message of Thu, 12 Dec 1996 00:56:31 +0000.
>          <199612120056.AAA08257> 
> X-uri: http://www.zyzzyva.com/
> Mime-Version: 1.0
> Content-Type: text/plain; charset=us-ascii
> Date: Wed, 11 Dec 1996 19:49:44 -0600
> From: Randy Terbush <randy@sierra.zyzzyva.com>
> 
> Geoffrey,
> 
> Looks like the waitpid() that Apache is supplying may not be
> sophisticated enough to be used in this new code. You might
> try changing the #if at line 967 of http_main.c to 0, or at the
> very least, pull the wait_or_timeout() function out of the
> section below.
> 
> This looks like some of Ben's code. Perhaps he could comment
> on it. I quickly glanced at the FreeBSD code to see if I could
> find a better waitpid() but struck out.

The main server is not "stuck in a loop" - that's where it is supposed to be.
However, the waitpid() replacement in util.c clearly doesn't work when pid is
set to -1 (as it is in wait_or_timeout()). BUT! NeXT doesn't use it - only
Amdahl UTS does. NeXT uses something called wait4(), for which I have no docco.
I'd suggest that an interim solution would be to switch on BROKEN_WAIT for NeXT.
I'd like to see the docco for wait4(), though.
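
For illustration only, here is a minimal sketch of what a waitpid() shim over
wait4() could look like. It assumes wait4() follows the usual BSD signature
(pid, status, options, rusage) with the same meaning for pid == -1 as
waitpid(); that is an assumption, not something confirmed for NeXT. The names
waitpid_via_wait4 and spawn_and_reap are hypothetical and do not appear in
Apache.

```c
#define _DEFAULT_SOURCE /* glibc: expose the BSD wait4() prototype */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical shim: waitpid() expressed in terms of wait4().
 * Assumes the BSD signature; passing NULL discards resource usage. */
pid_t waitpid_via_wait4(pid_t pid, int *status, int options)
{
    return wait4(pid, status, options, NULL);
}

/* Hypothetical helper: fork a child that exits with `code`, reap it
 * with pid == -1 ("any child"), and return its exit code, or -1 on
 * any failure. */
int spawn_and_reap(int code)
{
    int status = 0;
    pid_t reaped;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(code);

    /* Retry if a signal interrupts the blocking wait. */
    do {
        reaped = waitpid_via_wait4(-1, &status, 0);
    } while (reaped == -1 && errno == EINTR);

    if (reaped != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With no children left, a call with pid == -1 returns -1 and sets errno to
ECHILD, which matches the failure described in the bug report.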

We should probably document the fact that Amdahl UTS is busted, too. Or switch
on BROKEN_WAIT for that, too - with appropriate comments!

Note that the use of waitpid() in wait_or_timeout was introduced to avoid a
race condition, if I remember correctly, so don't be tempted to switch back to
a wait().
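
To make the shape of that call concrete, here is a rough sketch of a
non-blocking reap with pid == -1 followed by a short pause. It is not the code
in http_main.c; the name poll_children_once and the timeout_usec parameter are
hypothetical, and using select() for the pause is an assumption made for the
example.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch: reap one exited child if there is one.
 * Returns the child's pid, or -1 if nothing was reaped (after pausing
 * for up to timeout_usec microseconds so the caller's loop does not
 * spin). */
pid_t poll_children_once(int *status, long timeout_usec)
{
    struct timeval tv;
    pid_t ret = waitpid(-1, status, WNOHANG);

    if (ret > 0)
        return ret;     /* a child exited and has been reaped */
    if (ret == -1 && errno == EINTR)
        return -1;      /* interrupted; let the caller retry */

    /* ret == 0: children exist but none has exited yet.
     * ret == -1 with ECHILD: there are no children at all.
     * Either way, pause before reporting "nothing reaped". */
    tv.tv_sec = timeout_usec / 1000000L;
    tv.tv_usec = timeout_usec % 1000000L;
    select(0, NULL, NULL, NULL, &tv);
    return -1;
}
```

A caller that loops on this sees -1 both when children are merely still
running and when none are left, so it has to check errno for ECHILD itself if
it wants to tell the two cases apart.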

Ahem. A small snag, though - I've just noticed that BROKEN_WAIT is no longer
honoured. It may never have really been needed, though (it would compensate
for the race condition mentioned above).

OK. Here's the deal. Systems that don't have waitpid() probably need to use
something like the BROKEN_WAIT code, but slightly different. I'll see about
producing a patch later. I'd still like to see the wait4() docco, though.

Cheers,

Ben.

> 
> 
> > ----- Forwarded message from Geoffrey T. Falk -----
> > 
> > Message-Id: <199612120007.TAA01218@theorem.math.rochester.edu>
> > Content-Type: text/plain
> > MIME-Version: 1.0 (NeXT Mail 3.3 v118.2)
> > From: "Geoffrey T. Falk" <gtf@cirp.org>
> > Date: Wed, 11 Dec 96 19:07:57 -0500
> > To: Rob Hartill <robh@imdb.com>
> > Subject: Re: WWW Form Bug Report: "server hangs frequently" on NeXT
> > References: <199612110258.CAA16340>
> > 
> > Rob,
> > 
> > This relates to my earlier bug report. I have made some more progress in  
> > analyzing the problem.
> > 
> > The child processes seem to be dying. When they are all dead, the server  
> > cannot respond to requests.
> > 
> > According to GDB, the parent process gets stuck in a loop, in the function  
> > wait_or_timeout(). It is waiting for a valid return value from waitpid()  
> > (file http_main.c line 973). However this keeps returning -1, even if there  
> > are connection requests. I looked at the code for waitpid() in util.c. It is  
> > returning failure because there are no children left.
> > 
> > I have no idea why the children are dying off. As a workaround, I have  
> > written a program to periodically check to see if the server is responding;  
> > and if not, send it a HUP. (Graceful restarts have their own problems).
> > 
> > I am not compiling with any unusual options except -DNEXT.
> > 
> > Please help me fix this bug. It is a critical problem.
> > 
> > Thanks
> > g.
> > 
> > ----- End of forwarded message from Geoffrey T. Falk -----
> > 
> > -- 
> > Rob Hartill.       Internet Movie Database Ltd.    http://www.imdb.com/  
> 
> 
> 
> 
> ------- End of Blind-Carbon-Copy

-- 
Ben Laurie                Phone: +44 (181) 994 6435  Email: ben@algroup.co.uk
Freelance Consultant and  Fax:   +44 (181) 994 6472
Technical Director        URL: http://www.algroup.co.uk/Apache-SSL
A.L. Digital Ltd,         Apache Group member (http://www.apache.org)
London, England.          Apache-SSL author
