httpd-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ruler...@optonline.net
Subject Re: [users@httpd] Missing TimeOut SIGALRM with mod_php
Date Thu, 17 Jul 2003 20:28:03 GMT
Hello Eran,

Thursday, July 17, 2003, 4:25:46 PM, you wrote:

ET> Greetings,

ET> My Apache processes occasionally get stuck on one request and stop doing
ET> anything useful. There's no progress on that request (as judged by
ET> tracing ethernet traffic or system calls), yet the TimeOut directive
ET> does not kick in.

ET> This happens only with processes serving requests using mod_php. When 
ET> serving plain files, the connection is properly aborted by a SIGALRM at 
ET> the times prescribed by the TimeOut directive and all

ET> I've tested this with two configurations:
ET> 1. Debian's apache 1.3.26-0woody3 (including Debian's patches)
ET>     + stock PHP 4.3.0
ET> 2. Stock Apache 2.0.47 with preform MPM,
ET>     + stock PHP 4.3.2, configured with either "--with-apxs2" or
ET>       "--with-apxs2filter"
ET> both on Debian GNU/Linux "woody" with Debian's kernel 2.4.18-8.

ET> The hang can be reproduced by making an HTTP request and ceasing to
ET> accept new packets in mid-transfer. For example, run
ET>    "wget -O - http://www.haayal.co.il/story_222"
ET> and press Ctrl-S to pause it after a second or two (the full response is 
ET> 1.3MB).

ET> Here are the sys calltraces (obtained via strace) of the Apache process 
ET> handling the request.

ET> For Apache 1.3.26-0woody3, everything looks normal until:
ET> ---------------------------
ET> 21:04:29.511392 writev(5, [{"<a name=\"5121\"></a>", 19}, {"<table 
ET> width=\"610\" border=\"0\" ce"..., 8537}], 2 <unfinished ...>
ET> 21:14:47.479291 <... writev resumed> ) = 4650
ET> 21:14:47.482244 writev(5, [{"\340 \344\356\354 *\341\345\372\353\372*
ET> \344\371\367\341"..., 3906}], 1) = -1 ECONNRESET (Connection reset by peer)
ET> ---------------------------
ET> Note the 10 minutes gap. Indeed, I manually closed the connection on the
ET> client-side after ten minutes.

ET> For Apache 2.0.47, everything's normal until:
ET> ---------------------------
ET> 21:59:43.515026 writev(9, [{"<a name=\"9542\"></a>", 19}, {"<table
ET> width=\"610\" border=\"0\" ce"..., 2320}, {"<a name=\"5061\"></a>", 19},

ET> {"<table width=\"610\" border=\"0\" ce"..., 3429}, {"<a 
name=\"5092\">></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 
ET> 4876}], 6) = 9354
ET> 21:59:43.515305 poll( <unfinished ...>
ET> 22:00:43.524236 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
ET> 22:00:43.524367 gettimeofday({1058468443, 526207}, NULL) = 0
ET> 22:00:43.526313 write(6, "[Thu Jul 17 22:00:43 2003] [info] (70007)The 
ET> timeout specified has expired: core_output_filter: writing data to the 
ET> network", 124) = 124
ET> [[[ communication with local SQL server snipped ]]]
ET> 22:00:43.754018 brk(0x82fe000)    = 0x82fe000
ET> 22:00:43.755570 writev(9, [{"<a name=\"5094\"></a>", 19}, {"<table 
ET> width=\"610\" border=\"0\" ce"..., 3685}, {"<a name=\"5120\"></a>", 19},

ET> {"<table width=\"610\" border=\"0\" ce"..., 5699}], 4) = -1 EAGAIN 
ET> (Resource temporarily unavailable
ET> )
ET> 22:00:43.755727 poll( <unfinished ...>
ET> 22:01:43.766199 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
ET> 22:01:43.766306 gettimeofday({1058468503, 766323}, NULL) = 0
ET> 22:01:43.766389 write(6, "[Thu Jul 17 22:01:43 2003] [info] (70007)The 
ET> timeout specified has expired: core_output_filter: writing data to the 
ET> network", 124) = 124
ET> [[[ communication with local SQL server snipped ]]]
ET> 22:01:43.859266 writev(9, [{"<a name=\"5121\"></a>", 19}, {"<table 
ET> width=\"610\" border=\"0\" ce"..., 8537}], 2) = -1 EAGAIN (Resource 
ET> temporarily unavailable)
ET> 22:01:43.859416 poll( <unfinished ...>
ET> 22:02:43.868136 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
ET> ---------------------------
ET> and so on, until I closed the connection from the client side after 14 
ET> minutes. Note that poll() returns after exactly 60 seconds, which 
ET> happens to be my TimeOut value.

ET> Tracing the TCP connection using Ethereal shows that once the client 
ET> stops accepting packets (e.g., due to Ctrl-S on "wget -O -"), a pair of 
ET> ACK packets is exchanged every exactly 120 seconds. Thus, the above 
ET> behavior is inconsistent with the description of the TimeOut directive, 
ET> which is supposed to limit "The amount of time between ACKs on 
ET> transmissions of TCP packets in responses" to 60 seconds.

ET> The full straces show that in both cases, the parent Apache process 
ET> never sends the SIGALRM it's supposed to send (also, they show that 
ET> mod_php does *not* mask SIGALRM or change its handler). By contrast, for 
ET> an identical test with a plain file rather than a PHP script, TimeOut 
ET> kicks in, a SIGALRM is sent (by the parent Apache process) and the 
ET> connection is closed:

ET> ---------------------------
ET> 21:17:54.797220 write(5, "MMYDATADUMMYDATADUMMYDATADUMMYDA"..., 8192 
ET> <unfinished ...>
ET> 21:18:55.082141 <... write resumed> ) = 2745
ET> 21:18:55.082167 --- SIGALRM (Alarm clock) ---
ET> 21:18:55.082480 time(NULL)        = 1058465935
ET> 21:18:55.082631 write(15, "[Thu Jul 17 21:18:55 2003] [info] [client 
ET> 192.117.108.16] send body timed out", 78) = 78
ET> ---------------------------
ET> (that's with apache 1.3.26-0woody3).


ET> For whatever reason, such hung connections occur rather frequently and 
ET> sometimes persist for several hours or more. Thus my Apache is bleeding 
ET> useful processes, and needs to be killed and restarted several times a 
ET> day -- rather inconvenient. Help would be greatly appreciated.


ET> Regards,
ET>    Eran



ET> ---------------------------------------------------------------------
ET> The official User-To-User support forum of the Apache HTTP Server Project.
ET> See <URL:http://httpd.apache.org/userslist.html> for more info.
ET> To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
ET>    "   from the digest: users-digest-unsubscribe@httpd.apache.org
ET> For additional commands, e-mail: users-help@httpd.apache.org

Thank you for posting such an in-depth analysis of the problem.  After
a quick overview, this is exactly what happens to my machine.
Handling millions of hits per day, apache needs to be restarted on all
the servers 3-4 times daily (I have a cron set to do it now because
it's painful to my server after a few hours)

I actually posted to this list to see if anyone knew why, but people
thought I was a bit crazy in my assumptions.

I hope you get a useful response from this,

Thanks


-- 
Best regards,
 rulerpen                            mailto:rulerpen@optonline.net


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Mime
View raw message