Subject: Re: Unclean process shutdown in event MPM?
From: Greg Ames <ames.greg@gmail.com>
To: dev@httpd.apache.org
Date: Thu, 29 Apr 2010 12:06:37 -0400

In 2.2, it is expected behavior. The RFC allows the server to close
keepalive connections whenever it wants.

The last time I checked, trunk had a related bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=43359 . Connections
waiting for network writes can also be handled as poll events, but
Event's process management wasn't updated to take into account that
connections might be blocked on network I/O with no current worker
thread. So those connections waiting for network writes can also be
dropped when the parent thinks there are too many processes around.
I did a quick scan of the attached patch a while back but didn't commit
it, because I thought it should be changed to keep the number of
Event-handled connections (i.e., connections with no worker thread) and
the kind of event they are waiting on in the scoreboard, to facilitate
a mod_status display enhancement. But no Round TUITs for years. I will
look at the patch again and forget the mod_status bells and whistles
for now.
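
(An aside on that scoreboard idea: a minimal sketch of what the extra
bookkeeping might look like. The struct and field names are invented
for illustration and are not part of any existing patch; apr_uint32_t
comes from apr.h.)

    /* Per-child counts of connections parked in the event loop with no
     * worker thread, broken down by the event they are waiting on, so
     * that mod_status could display them. */
    typedef struct {
        apr_uint32_t keepalive;         /* waiting for next request (POLLIN) */
        apr_uint32_t write_completion;  /* waiting to flush response (POLLOUT) */
    } event_conn_counts;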

On Sun, Apr 25, 2010 at 2:07 PM, Rainer Jung <rainer.jung@kippdata.de> wrote:
> On 23.03.2010 15:30, Jeff Trawick wrote:
>
>> On Tue, Mar 23, 2010 at 10:04 AM, Rainer Jung <rainer.jung@kippdata.de> wrote:
>>
>>> On 23.03.2010 13:34, Jeff Trawick wrote:
>>>
>>>> On Tue, Mar 23, 2010 at 7:19 AM, Rainer Jung <rainer.jung@kippdata.de> wrote:
>>>>
>>>>> I can currently reproduce the following problem with 2.2.15 event
>>>>> MPM under high load:
>>>>>
>>>>> When an httpd child process gets closed due to the max spare
>>>>> threads rule and it holds established client connections for which
>>>>> it has fully received a keep-alive request but has not yet sent any
>>>>> part of the response, it will simply close that connection.
>>>>>
>>>>> Is that expected behaviour? It doesn't seem reproducible with the
>>>>> worker MPM. The behaviour has been observed using extreme spare
>>>>> rules in order to make processes shut down often, but it still
>>>>> seems not right.
>>>>>
>>>> Is this the currently-unhandled situation discussed in this thread?
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3Ccc67648e0711130530h45c2a28ctcd743b2160e22914@mail.gmail.com%3E
>>>>
>>>> Perhaps Event's special handling for keepalive connections results
>>>> in the window being encountered more often?
>>>>
>>> I'd say yes. I know from the packet trace that the previous response
>>> on the same connection got "Connection: Keep-Alive". But from the
>>> time gap of about 0.5 seconds between receiving the next request and
>>> sending the FIN, I guess that the child was not already in the
>>> process of shutting down when the previous "Connection: Keep-Alive"
>>> response was sent.
>>>
>>> So for me the question is: if the web server has already acknowledged
>>> the next request (in our case a GET request, acknowledged with a TCP
>>> ACK), should it wait with shutting down the child until the request
>>> has been processed and the response has been sent (in this case with
>>> "Connection: close" included)?
>>>
>> Since the ACK is out of our control, that situation is potentially
>> within the race condition.
>>
>>> For the connections which do not have another request pending, I see
>>> no problem in closing them - although there could be a race
>>> condition. When there's a race (the client sends the next request
>>> while the server sends a FIN), the client doesn't expect the server
>>> to handle the request (that can always happen when a keep-alive
>>> connection times out). In the situation observed, it is annoying that
>>> the server already accepted the next request and nevertheless closes
>>> the connection without handling it.
>>>
>> All we can know is whether or not the socket is readable at the point
>> where we want to gracefully exit the process. In keepalive state we'd
>> wait for {timeout, readability, shutdown-event}, and if readable at
>> wakeup then try to process it unless
>> !c->base_server->keep_alive_while_exiting &&
>> ap_graceful_stop_signalled().
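
(A minimal sketch in C of that wakeup check, assuming the per-vhost
keep_alive_while_exiting flag from the patch exists on server_rec;
push_to_worker() is a made-up stand-in for handing the connection off
to a worker thread:)

    #include "httpd.h"
    #include "http_connection.h"  /* ap_lingering_close() */
    #include "ap_mpm.h"           /* ap_graceful_stop_signalled() */

    /* Poll woke us up for a connection sitting in keepalive state. */
    static void on_keepalive_wakeup(conn_rec *c, apr_int16_t revents)
    {
        if ((revents & APR_POLLIN)
            && (c->base_server->keep_alive_while_exiting
                || !ap_graceful_stop_signalled())) {
            push_to_worker(c);    /* readable and allowed: process it */
        }
        else {
            /* keepalive timeout, shutdown event, or exiting without
             * the vhost opting in: close the connection instead. */
            ap_lingering_close(c);
        }
    }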

>>> I will do some testing around your patch:
>>>
>>> http://people.apache.org/~trawick/keepalive.txt
>>>
>> I don't think the patch will cover Event. It modifies
>> ap_process_http_connection(); ap_process_http_async_connection() is
>> used with Event unless there are "clogging input filters." I guess
>> the analogous point of processing is inside Event itself.
>>
>> I guess if KeepAliveWhileExiting is enabled (whoops, that's
>> vhost-specific) then Event would have substantially different
>> shutdown logic.

> I could now take a second look at it. Directly porting your patch to
> trunk and event is straightforward. There remains a hard problem
> though: the listener thread has a big loop of the form

>     while (!listener_may_exit) {
>         apr_pollset_poll(...)
>         while (HANDLE_EVENTS) {
>             if (READABLE_SOCKET)
>                 ...
>             else if (ACCEPT)
>                 ...
>         }
>         HANDLE_KEEPALIVE_TIMEOUTS
>         HANDLE_WRITE_COMPLETION_TIMEOUTS
>     }

> Obviously, if we want to respect any previously returned "Connection:
> Keep-Alive" headers, we can't terminate the loop on listener_may_exit.
> As a first try, I switched to:

>     while (1) {
>         if (listener_may_exit)
>             ap_close_listeners();
>         apr_pollset_poll(...);
>         REMOVE_LISTENERS_FROM_POLLSET
>         while (HANDLE_EVENTS) {
>             if (READABLE_SOCKET)
>                 ...
>             else if (ACCEPT)
>                 ...
>         }
>         HANDLE_KEEPALIVE_TIMEOUTS
>         HANDLE_WRITE_COMPLETION_TIMEOUTS
>     }

> Now the listeners get closed, and in combination with your patch the
> connections will not be dropped but will instead receive a
> "Connection: close" during the next request.
>
> Now the while loop lacks a correct break criterion. It would need to
> stop when the pollset is empty (listeners removed, other connections
> closed due to end of keep-alive or timeout). Unfortunately there is no
> API function for checking whether there are still sockets in the
> pollset, and it isn't straightforward how to do that.
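
One workaround for the missing query: shadow the pollset with an atomic
counter, route all adds/removes through wrappers, and break out of the
loop once the listeners are gone and the counter hits zero. A sketch
(the wrapper and counter names are invented for illustration):

    #include "apr_poll.h"
    #include "apr_atomic.h"

    static volatile apr_uint32_t num_pollfds; /* sockets currently in the pollset */

    static apr_status_t pollset_add_counted(apr_pollset_t *ps,
                                            const apr_pollfd_t *pfd)
    {
        apr_status_t rv = apr_pollset_add(ps, pfd);
        if (rv == APR_SUCCESS)
            apr_atomic_inc32(&num_pollfds);
        return rv;
    }

    static apr_status_t pollset_remove_counted(apr_pollset_t *ps,
                                               const apr_pollfd_t *pfd)
    {
        apr_status_t rv = apr_pollset_remove(ps, pfd);
        if (rv == APR_SUCCESS)
            apr_atomic_dec32(&num_pollfds);
        return rv;
    }

    /* in the while (1) loop, after handling events: */
    if (listener_may_exit && apr_atomic_read32(&num_pollfds) == 0)
        break;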

> Another possibility would be to wait for a maximum of the vhost
> keepalive timeouts. But that seems to be a bit too much.
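
(If that route were taken anyway, the bound is cheap to compute; a
sketch walking the vhost list from ap_server_conf, using the
keep_alive_timeout that server_rec already carries, in microseconds:)

    apr_interval_time_t max_ka = 0;
    apr_time_t drain_deadline;
    server_rec *s;

    for (s = ap_server_conf; s != NULL; s = s->next) {
        if (s->keep_alive_timeout > max_ka)
            max_ka = s->keep_alive_timeout;
    }
    drain_deadline = apr_time_now() + max_ka;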

> Any ideas or comments?
>
> Regards,
>
> Rainer
