httpd-dev mailing list archives

From Joe Orton <jor...@redhat.com>
Subject Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes
Date Fri, 18 Jan 2008 23:05:30 GMT
On Fri, Jan 04, 2008 at 02:42:05PM +0100, Stefan Fritsch wrote:
> this bug can be quite annoying because of the resources used by the hung
> processes. It happens e.g. under Linux when epoll is used.
> 
> The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
> has been in Debian unstable/Ubuntu hardy for several weeks and there have
> not been any complaints.

I've been looking into this in more detail; excuse the length of this 
mail.  The symptom in question is described as "children hang after 
graceful restart/stop in 2.2.x".

I mentioned in the bug that the signal handler could cause undefined 
behaviour, but I'm not sure now whether that is true.  On Linux I can 
reproduce some cases where this will happen, which are all due to 
well-defined behaviour:

1) with some (default on Linux) accept mutex types, 
apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked 
waiting for the mutex do "hang" until the mutex is released.  Fixing 
this would need some APR work, new interfaces, and so on.

2) prefork's retry-on-EINTR loop around apr_pollset_poll() was not 
checking die_now; the child holding the mutex will not die immediately 
if poll fails with EINTR, and will hence appear to "hang" until a new 
connection is received.  Fixed by http://svn.apache.org/viewvc?rev=613260&view=rev
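
For illustration, case (2) can be sketched with plain POSIX poll() rather 
than APR.  This is not the code from the revision linked above; it is a 
minimal sketch that assumes a die_now-style flag set from a signal handler. 
The names stop_handler, wait_for_connection and demo_interrupted_poll are 
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the signal handler; stands in for prefork's die_now flag. */
static volatile sig_atomic_t die_now = 0;

static void stop_handler(int sig)
{
    (void)sig;
    die_now = 1;
}

/* Hypothetical helper: block until one of the fds is ready.
 * Returns 1 if an fd is ready, 0 if we were asked to die,
 * -1 on any error other than EINTR. */
static int wait_for_connection(struct pollfd *fds, nfds_t nfds)
{
    for (;;) {
        int rc = poll(fds, nfds, -1);
        if (rc > 0)
            return 1;
        if (rc < 0 && errno != EINTR)
            return -1;
        /* Interrupted by a signal: retry only if nobody asked us to
         * exit.  Without this check the loop goes straight back into
         * poll() and blocks until the next connection arrives. */
        if (die_now)
            return 0;
    }
}

/* Demo: poll an idle pipe and let SIGALRM play the role of the
 * graceful-stop signal.  Returns what wait_for_connection() returned. */
static int demo_interrupted_poll(void)
{
    int p[2];
    int rc;
    struct pollfd pfd;
    struct sigaction sa;

    if (pipe(p) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = stop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    die_now = 0;
    alarm(1);                 /* signal arrives while poll() is blocked */

    pfd.fd = p[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = wait_for_connection(&pfd, 1);

    close(p[0]);
    close(p[1]);
    return rc;
}
```

Nothing is ever written to the pipe in the demo, so the only way out of 
the loop is the die_now check after EINTR.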

I can also reproduce a third case, but I'm not sure about the cause:

3) apr_pollset_poll() is blocking despite the fact that the listening 
fds are supposedly already closed before entering the syscall.

I vaguely recall some issue with epoll being mentioned before in the 
context of graceful stop, but I can't find a reference.  Colm?

A very tempting explanation for (3) would be the fact that prefork only 
polls for POLLIN events, not POLLHUP or POLLERR, or indeed that it does 
not check that the returned event really is a POLLIN event; POSIX says 
on poll:

" ... poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
 revents if the condition is true, even if the application did not set
 the corresponding bit in events."

and there's even a comment in the prefork poll code to the effect that 
maybe checking the returned event type would be a good idea.  But from a 
brief play around here, fixing the poll code to DTRT doesn't help.  I 
think more investigation is needed to understand exactly what is going 
on here.
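
To make the POSIX wording above concrete, here is a small sketch, again 
with plain poll() and not the prefork code, that requests only POLLIN and 
then inspects revents.  It uses a pipe whose write end has been closed as 
a stand-in for a dead descriptor; classify_revents, demo_hangup_reported 
and the FD_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

enum fd_state { FD_IDLE = 0, FD_READABLE = 1, FD_DEAD = 2 };

/* Hypothetical helper: decide what a returned revents value means.
 * Error and hangup bits are checked first because poll() may report
 * them even though the caller asked only for POLLIN.  A real server
 * might still want to drain pending data when POLLIN is set as well. */
static enum fd_state classify_revents(short revents)
{
    if (revents & (POLLERR | POLLHUP | POLLNVAL))
        return FD_DEAD;
    if (revents & POLLIN)
        return FD_READABLE;
    return FD_IDLE;
}

/* Demo: poll the read end of a pipe whose write end is already closed,
 * asking only for POLLIN, and classify what comes back. */
static int demo_hangup_reported(void)
{
    int p[2];
    int rc;
    struct pollfd pfd;

    if (pipe(p) != 0)
        return -1;
    close(p[1]);              /* no writer left: the read end hangs up */

    pfd.fd = p[0];
    pfd.events = POLLIN;      /* POLLHUP is deliberately not requested */
    pfd.revents = 0;
    rc = poll(&pfd, 1, 1000);
    close(p[0]);

    if (rc < 0)
        return -1;
    return (int)classify_revents(pfd.revents);
}
```

A caller that assumes any returned event is POLLIN would treat this 
descriptor as ready for accept(); checking revents first avoids that.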

(Also, just to note: I can reproduce (3) even with my patch to dup2 
against the listener fds.)

joe
