Subject: Re: Unclean process shutdown in event MPM?
From: Greg Ames <ames.greg@gmail.com>
To: dev@httpd.apache.org
Date: Thu, 29 Apr 2010 12:06:37 -0400

In 2.2, it is expected behavior. The RFC allows the server to close
keepalive connections whenever it wants.

The last time I checked, trunk had a related bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=43359 . Connections
waiting for network writes can also be handled as poll events, but
Event's process management wasn't updated to take into account that
connections might be blocked on network I/O with no current worker
thread. So those connections waiting for network writes can also be
dropped when the parent thinks there are too many processes around.
I did a quick scan of the attached patch a while back but didn't commit
it, because I thought it should be changed to keep the number of
Event-handled connections (i.e., connections with no worker thread) and
the kind of event they are waiting on in the scoreboard, to facilitate
a mod_status display enhancement. But no Round TUITs for years. I will
look at the patch again and forget the mod_status bells and whistles
for now.
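
(An aside on that scoreboard idea: a minimal sketch of what the extra
bookkeeping might look like. The struct and field names are invented
for illustration and are not part of any existing patch; apr_uint32_t
comes from apr.h.)

    /* Per-child counts of connections parked in the event loop with no
     * worker thread, broken down by the event they are waiting on, so
     * that mod_status could display them. */
    typedef struct {
        apr_uint32_t keepalive;         /* waiting for next request (POLLIN) */
        apr_uint32_t write_completion;  /* waiting to flush response (POLLOUT) */
    } event_conn_counts;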

On Sun, Apr 25, 2010 at 2:07 PM, Rainer Jung <rainer.jung@kippdata.de> wrote:
> On 23.03.2010 15:30, Jeff Trawick wrote:
>
>> On Tue, Mar 23, 2010 at 10:04 AM, Rainer Jung <rainer.jung@kippdata.de> wrote:
>>
>>> On 23.03.2010 13:34, Jeff Trawick wrote:
>>>
>>>> On Tue, Mar 23, 2010 at 7:19 AM, Rainer Jung <rainer.jung@kippdata.de> wrote:
>>>>
>>>>> I can currently reproduce the following problem with 2.2.15 event
>>>>> MPM under high load:
>>>>>
>>>>> When an httpd child process gets closed due to the max spare
>>>>> threads rule and it holds established client connections for which
>>>>> it has fully received a keep-alive request but has not yet sent any
>>>>> part of the response, it will simply close that connection.
>>>>>
>>>>> Is that expected behaviour? It doesn't seem reproducible with the
>>>>> worker MPM. The behaviour has been observed using extreme spare
>>>>> rules in order to make processes shut down often, but it still
>>>>> seems not right.
>>>>>
>>>> Is this the currently-unhandled situation discussed in this thread?
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3Ccc67648e0711130530h45c2a28ctcd743b2160e22914@mail.gmail.com%3E
>>>>
>>>> Perhaps Event's special handling for keepalive connections results
>>>> in the window being encountered more often?
>>>>
>>> I'd say yes. I know from the packet trace that the previous response
>>> on the same connection got "Connection: Keep-Alive". But from the
>>> time gap of about 0.5 seconds between receiving the next request and
>>> sending the FIN, I guess that the child was not already in the
>>> process of shutting down when the previous "Connection: Keep-Alive"
>>> response was sent.
>>>
>>> So for me the question is: if the web server has already acknowledged
>>> the next request (in our case a GET request, acknowledged with a TCP
>>> ACK), should it wait with shutting down the child until the request
>>> has been processed and the response has been sent (in this case with
>>> "Connection: close" included)?
>>>
>> Since the ACK is out of our control, that situation is potentially
>> within the race condition.
>>
>>> For the connections which do not have another request pending, I see
>>> no problem in closing them - although there could be a race
>>> condition. When there's a race (the client sends the next request
>>> while the server sends a FIN), the client doesn't expect the server
>>> to handle the request (that can always happen when a keep-alive
>>> connection times out). In the situation observed, it is annoying that
>>> the server already accepted the next request and nevertheless closes
>>> the connection without handling it.
>>>
>> All we can know is whether or not the socket is readable at the point
>> where we want to gracefully exit the process. In keepalive state we'd
>> wait for {timeout, readability, shutdown-event}, and if readable at
>> wakeup then try to process it unless
>> !c->base_server->keep_alive_while_exiting &&
>> ap_graceful_stop_signalled().
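
(A minimal sketch in C of that wakeup check, assuming the per-vhost
keep_alive_while_exiting flag from the patch exists on server_rec;
push_to_worker() is a made-up stand-in for handing the connection off
to a worker thread:)

    #include "httpd.h"
    #include "http_connection.h"  /* ap_lingering_close() */
    #include "ap_mpm.h"           /* ap_graceful_stop_signalled() */

    /* Poll woke us up for a connection sitting in keepalive state. */
    static void on_keepalive_wakeup(conn_rec *c, apr_int16_t revents)
    {
        if ((revents & APR_POLLIN)
            && (c->base_server->keep_alive_while_exiting
                || !ap_graceful_stop_signalled())) {
            push_to_worker(c);    /* readable and allowed: process it */
        }
        else {
            /* keepalive timeout, shutdown event, or exiting without
             * the vhost opting in: close the connection instead. */
            ap_lingering_close(c);
        }
    }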

>>> I will do some testing around your patch:
>>>
>>> http://people.apache.org/~trawick/keepalive.txt
>>>
>> I don't think the patch will cover Event. It modifies
>> ap_process_http_connection(); ap_process_http_async_connection() is
>> used with Event unless there are "clogging input filters." I guess
>> the analogous point of processing is inside Event itself.
>>
>> I guess if KeepAliveWhileExiting is enabled (whoops, that's
>> vhost-specific) then Event would have substantially different
>> shutdown logic.

> I could now take a second look at it. Directly porting your patch to
> trunk and event is straightforward. There remains a hard problem
> though: the listener thread has a big loop of the form

>     while (!listener_may_exit) {
>         apr_pollset_poll(...)
>         while (HANDLE_EVENTS) {
>             if (READABLE_SOCKET)
>                 ...
>             else if (ACCEPT)
>                 ...
>         }
>         HANDLE_KEEPALIVE_TIMEOUTS
>         HANDLE_WRITE_COMPLETION_TIMEOUTS
>     }

> Obviously, if we want to respect any previously returned "Connection:
> Keep-Alive" headers, we can't terminate the loop on listener_may_exit.
> As a first try, I switched to:

>     while (1) {
>         if (listener_may_exit)
>             ap_close_listeners();
>         apr_pollset_poll(...);
>         REMOVE_LISTENERS_FROM_POLLSET
>         while (HANDLE_EVENTS) {
>             if (READABLE_SOCKET)
>                 ...
>             else if (ACCEPT)
>                 ...
>         }
>         HANDLE_KEEPALIVE_TIMEOUTS
>         HANDLE_WRITE_COMPLETION_TIMEOUTS
>     }

> Now the listeners get closed, and in combination with your patch the
> connections will not be dropped but will instead receive a
> "Connection: close" during the next request.
>
> Now the while loop lacks a correct break criterion. It would need to
> stop when the pollset is empty (listeners removed, other connections
> closed due to end of keep-alive or timeout). Unfortunately there is no
> API function for checking whether there are still sockets in the
> pollset, and it isn't straightforward how to do that.
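
One workaround for the missing query: shadow the pollset with an atomic
counter, route all adds/removes through wrappers, and break out of the
loop once the listeners are gone and the counter hits zero. A sketch
(the wrapper and counter names are invented for illustration):

    #include "apr_poll.h"
    #include "apr_atomic.h"

    static volatile apr_uint32_t num_pollfds; /* sockets currently in the pollset */

    static apr_status_t pollset_add_counted(apr_pollset_t *ps,
                                            const apr_pollfd_t *pfd)
    {
        apr_status_t rv = apr_pollset_add(ps, pfd);
        if (rv == APR_SUCCESS)
            apr_atomic_inc32(&num_pollfds);
        return rv;
    }

    static apr_status_t pollset_remove_counted(apr_pollset_t *ps,
                                               const apr_pollfd_t *pfd)
    {
        apr_status_t rv = apr_pollset_remove(ps, pfd);
        if (rv == APR_SUCCESS)
            apr_atomic_dec32(&num_pollfds);
        return rv;
    }

    /* in the while (1) loop, after handling events: */
    if (listener_may_exit && apr_atomic_read32(&num_pollfds) == 0)
        break;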

> Another possibility would be to wait for a maximum of the vhost
> keepalive timeouts. But that seems to be a bit too much.
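
(If that route were taken anyway, the bound is cheap to compute; a
sketch walking the vhost list from ap_server_conf, using the
keep_alive_timeout that server_rec already carries, in microseconds:)

    apr_interval_time_t max_ka = 0;
    apr_time_t drain_deadline;
    server_rec *s;

    for (s = ap_server_conf; s != NULL; s = s->next) {
        if (s->keep_alive_timeout > max_ka)
            max_ka = s->keep_alive_timeout;
    }
    drain_deadline = apr_time_now() + max_ka;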

> Any ideas or comments?
>
> Regards,
>
> Rainer
