httpd-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eran Tromer <e...@tromer.org>
Subject [users@httpd] Missing TimeOut SIGALRM with mod_php
Date Thu, 17 Jul 2003 20:25:46 GMT
Greetings,

My Apache processes occasionally get stuck on one request and stop doing
anything useful. There's no progress on that request (as judged by
tracing ethernet traffic or system calls), yet the TimeOut directive
does not kick in.

This happens only with processes serving requests using mod_php. When 
serving plain files, the connection is properly aborted by a SIGALRM at 
the times prescribed by the TimeOut directive and all

I've tested this with two configurations:
1. Debian's apache 1.3.26-0woody3 (including Debian's patches)
    + stock PHP 4.3.0
2. Stock Apache 2.0.47 with preform MPM,
    + stock PHP 4.3.2, configured with either "--with-apxs2" or
      "--with-apxs2filter"
both on Debian GNU/Linux "woody" with Debian's kernel 2.4.18-8.

The hang can be reproduced by making an HTTP request and ceasing to
accept new packets in mid-transfer. For example, run
   "wget -O - http://www.haayal.co.il/story_222"
and press Ctrl-S to pause it after a second or two (the full response is 
1.3MB).

Here are the sys calltraces (obtained via strace) of the Apache process 
handling the request.

For Apache 1.3.26-0woody3, everything looks normal until:
---------------------------
21:04:29.511392 writev(5, [{"<a name=\"5121\"></a>", 19}, {"<table 
width=\"610\" border=\"0\" ce"..., 8537}], 2 <unfinished ...>
21:14:47.479291 <... writev resumed> ) = 4650
21:14:47.482244 writev(5, [{"\340 \344\356\354 *\341\345\372\353\372*
\344\371\367\341"..., 3906}], 1) = -1 ECONNRESET (Connection reset by peer)
---------------------------
Note the 10 minutes gap. Indeed, I manually closed the connection on the
client-side after ten minutes.

For Apache 2.0.47, everything's normal until:
---------------------------
21:59:43.515026 writev(9, [{"<a name=\"9542\"></a>", 19}, {"<table
width=\"610\" border=\"0\" ce"..., 2320}, {"<a name=\"5061\"></a>", 19}, 
{"<table width=\"610\" border=\"0\" ce"..., 3429}, {"<a 
name=\"5092\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 
4876}], 6) = 9354
21:59:43.515305 poll( <unfinished ...>
22:00:43.524236 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
22:00:43.524367 gettimeofday({1058468443, 526207}, NULL) = 0
22:00:43.526313 write(6, "[Thu Jul 17 22:00:43 2003] [info] (70007)The 
timeout specified has expired: core_output_filter: writing data to the 
network", 124) = 124
[[[ communication with local SQL server snipped ]]]
22:00:43.754018 brk(0x82fe000)    = 0x82fe000
22:00:43.755570 writev(9, [{"<a name=\"5094\"></a>", 19}, {"<table 
width=\"610\" border=\"0\" ce"..., 3685}, {"<a name=\"5120\"></a>", 19}, 
{"<table width=\"610\" border=\"0\" ce"..., 5699}], 4) = -1 EAGAIN 
(Resource temporarily unavailable
)
22:00:43.755727 poll( <unfinished ...>
22:01:43.766199 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
22:01:43.766306 gettimeofday({1058468503, 766323}, NULL) = 0
22:01:43.766389 write(6, "[Thu Jul 17 22:01:43 2003] [info] (70007)The 
timeout specified has expired: core_output_filter: writing data to the 
network", 124) = 124
[[[ communication with local SQL server snipped ]]]
22:01:43.859266 writev(9, [{"<a name=\"5121\"></a>", 19}, {"<table 
width=\"610\" border=\"0\" ce"..., 8537}], 2) = -1 EAGAIN (Resource 
temporarily unavailable)
22:01:43.859416 poll( <unfinished ...>
22:02:43.868136 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
---------------------------
and so on, until I closed the connection from the client side after 14 
minutes. Note that poll() returns after exactly 60 seconds, which 
happens to be my TimeOut value.

Tracing the TCP connection using Ethereal shows that once the client 
stops accepting packets (e.g., due to Ctrl-S on "wget -O -"), a pair of 
ACK packets is exchanged every exactly 120 seconds. Thus, the above 
behavior is inconsistent with the description of the TimeOut directive, 
which is supposed to limit "The amount of time between ACKs on 
transmissions of TCP packets in responses" to 60 seconds.

The full straces show that in both cases, the parent Apache process 
never sends the SIGALRM it's supposed to send (also, they show that 
mod_php does *not* mask SIGALRM or change its handler). By contrast, for 
an identical test with a plain file rather than a PHP script, TimeOut 
kicks in, a SIGALRM is sent (by the parent Apache process) and the 
connection is closed:

---------------------------
21:17:54.797220 write(5, "MMYDATADUMMYDATADUMMYDATADUMMYDA"..., 8192 
<unfinished ...>
21:18:55.082141 <... write resumed> ) = 2745
21:18:55.082167 --- SIGALRM (Alarm clock) ---
21:18:55.082480 time(NULL)        = 1058465935
21:18:55.082631 write(15, "[Thu Jul 17 21:18:55 2003] [info] [client 
192.117.108.16] send body timed out", 78) = 78
---------------------------
(that's with apache 1.3.26-0woody3).


For whatever reason, such hung connections occur rather frequently and 
sometimes persist for several hours or more. Thus my Apache is bleeding 
useful processes, and needs to be killed and restarted several times a 
day -- rather inconvenient. Help would be greatly appreciated.


Regards,
   Eran



---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Mime
View raw message