httpd-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 10266] - apache hangs after some hours of running
Date Thu, 15 Aug 2002 15:13:33 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10266>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10266

apache hangs after some hours of running





------- Additional Comments From dcook@cookware.com  2002-08-15 15:13 -------
Another clue to the problem:

After I split the system into 5 running Apaches, we had no hangs for 2 weeks. Then on
Saturday (as I previously reported) one of the servers hung once. There were no
problems the rest of the weekend, and then this past Monday morning all he** broke loose.

Monday morning the system hung and then basically went nuts. We would kill and
restart it, and it would hang again within 15 seconds, 15 minutes or about 1 hour, depending
on the restart (throughout the day it kept hanging; the longest it went without hanging was
1 hour). Not only did the Apache we had suspected of hanging hang, but so did the other
one we had split off, which had not hung in 2 weeks.

By Monday night (10:30 PM Hawaii time, so pretty late) it was back to not hanging.

So what happened Monday? I posted a URL in a slashdot.org column and we got a lot of
hits because of it. I STRONGLY suspect that the number of hits contributed to the hang.
The INTERESTING thing is that of the two Apaches that were hanging, one hosted the
domain being slashdotted and the other didn't (btw, the server was able to serve up the
pages with no problem... it was just a lot of hits, but no major load or anything).

So... this leads me to believe that the problem is related to traffic. It is possible that it
is related to the restarting of a child after it has served its maximum number of requests
(the MaxRequestsPerChild limit).

I also discovered that two of my earlier reports were wrong:

1)  I reported earlier that HUPing the hung server did no good. This is not true. HUPing
it appears to work. It takes up to a minute to free up, and sometimes requires a second
HUP before it frees up. Only OCCASIONALLY, after two HUP attempts, would it still not
free up, so that we had to kill and restart it.

2) I reported earlier that my remote alarms that try to detect the hang would also hang on
the open and never recover. While this is true, it was because I used signal() instead of
sigaction(): signal() automatically sets the restart flag, which tells the socket calls to
retry the operation after the signal... thus it only *appeared* to be stuck forever. So... I
am able to detect it (we've since written a program that detects it on the server and
automatically re-HUPs or restarts the server, depending on what it sees).
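A minimal C (POSIX) sketch of the signal() versus sigaction() point in item 2. It assumes the remote alarm is a simple TCP probe that sends a HEAD request and waits for any reply; `probe_server` and `on_alarm` are hypothetical names, not part of Apache. The SIGALRM handler is installed with sigaction() and sa_flags left at 0 (no SA_RESTART), so when alarm() fires, a blocked connect(), send() or recv() returns -1 with errno set to EINTR instead of being silently restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt a blocked system call. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical probe (not part of Apache). Returns 0 if the server sent any
 * reply, 1 if the timeout expired first (server looks hung), -1 on any other
 * error such as "connection refused". */
int probe_server(const char *ip, unsigned short port, unsigned timeout_sec)
{
    static const char req[] = "HEAD / HTTP/1.0\r\n\r\n";
    struct sigaction sa, old;
    struct sockaddr_in addr;
    char buf[64];
    ssize_t n;
    int fd = -1, result = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* deliberately NOT SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        goto out;
    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        goto out;

    alarm(timeout_sec);           /* one deadline for connect + send + recv */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        send(fd, req, sizeof req - 1, 0) < 0) {
        result = (errno == EINTR) ? 1 : -1;
        goto out;
    }
    n = recv(fd, buf, sizeof buf, 0);
    if (n > 0)
        result = 0;               /* got at least one byte back */
    else if (n < 0 && errno == EINTR)
        result = 1;               /* SIGALRM interrupted the wait */

out:
    alarm(0);                     /* cancel any pending alarm */
    if (fd >= 0)
        close(fd);
    sigaction(SIGALRM, &old, NULL);  /* restore the caller's handler */
    return result;
}
```

For example, probe_server("127.0.0.1", 80, 10) should return 1 if the server accepts connections but never answers, or if the open itself blocks. Setting sa_flags to SA_RESTART reproduces the original symptom: the interrupted call is restarted after the single alarm, so the probe blocks indefinitely.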


So... all of this leads me to believe it's the rollover (restart) of the children. Note that if
one hangs, all other virtual hosts on the same server also hang (i.e., no virtual host
assigned to the stuck server will respond until we re-HUP it).
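The re-HUP/restart program mentioned in item 2 is not shown in this report. As a hedged illustration only, the escalation described in item 1 (HUP, allow up to about a minute, HUP again, and restart only if that also fails) could be encoded as a small decision function in C. `wd_next`, `struct wd_state`, `grace_checks` and `read_httpd_pid` are hypothetical names, and the pid-file path passed in is an assumption about how the server is configured.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/* What the watchdog should do after one probe. */
enum wd_action { WD_NONE, WD_HUP, WD_RESTART };

struct wd_state {
    int hups_sent;        /* HUPs sent since the server last answered */
    int checks_since_hup; /* failed probes since the last HUP */
};

/* Hypothetical escalation policy: HUP on the first failure, wait
 * grace_checks more failed probes, HUP a second time, and only return
 * WD_RESTART if the server is still hung after the second grace period. */
enum wd_action wd_next(struct wd_state *st, int server_hung, int grace_checks)
{
    if (!server_hung) {               /* healthy: reset the escalation */
        st->hups_sent = 0;
        st->checks_since_hup = 0;
        return WD_NONE;
    }
    if (st->hups_sent > 0 && st->checks_since_hup < grace_checks) {
        st->checks_since_hup++;       /* still inside the grace period */
        return WD_NONE;
    }
    if (st->hups_sent < 2) {
        st->hups_sent++;
        st->checks_since_hup = 0;
        return WD_HUP;
    }
    st->hups_sent = 0;                /* two HUPs failed: restart */
    st->checks_since_hup = 0;
    return WD_RESTART;
}

/* Read the parent httpd PID from a pid file; returns -1 on failure. */
pid_t read_httpd_pid(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid = -1;
    if (f) {
        if (fscanf(f, "%ld", &pid) != 1)
            pid = -1;
        fclose(f);
    }
    return (pid_t)pid;
}
```

A caller would probe the server periodically, feed the result to wd_next(), and on WD_HUP send kill(pid, SIGHUP) to the PID returned by read_httpd_pid(); how WD_RESTART is carried out (kill plus a fresh start) is left to the site's own scripts.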

The only other possibility, I think, is some type of exploit that hangs Apache in this way...
but I think that is remote.

One last thing... when we were having the problems on Monday, I tried to roll Apache
back to httpd-2.0.36, but the same problem occurred, so I brought it back to
httpd-2.0.40. (So this problem appears to be in all versions SINCE and INCLUDING
2.0.36.)

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

