www-apache-bugdb mailing list archives

From Matt Braithwaite <...@alink.net>
Subject general/1787: accept loops on ENOTSOCK, filling up logfile
Date Tue, 10 Feb 1998 01:00:37 GMT

>Number:         1787
>Category:       general
>Synopsis:       accept loops on ENOTSOCK, filling up logfile
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Mon Feb  9 17:10:00 PST 1998
>Originator:     mab@alink.net
>Release:        1.2.1
BSDI 3.1, gcc2, apache 1.2.1
BSD/OS fg.alink.net 3.1 BSDI BSD/OS 3.1 Kernel #3: Mon Dec  8 14:32:38 PST 1997     mab@fg.alink.net:/usr/src/sys/compile/ALINK

we do graceful restarts hourly.  on rare occasions, right after a graceful
restart, the logfile fills up with repeated lines of the form
	accept: (client socket): socket operation on non-socket

from examining the code, it appears that a socket descriptor in one or more
children's lists becomes invalid: select consistently reports it as
read-ready, and accept consistently rejects it with ENOTSOCK.  i have no idea
how the child wound up with a bogus socket descriptor in the first place,
though.  i am curious to know what graceful restarts have to do with it.
could anything done by the parent cause the child's socket to become invalid?
i ask because of the race condition with graceful restarts: at first glance,
it looks as though a child could miss a SIGUSR1.
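
the symptom described above can be reproduced in isolation.  the following is
a minimal sketch, not apache code: it assumes that a descriptor which is not a
socket but is always "readable" behaves like the bogus descriptor in the
child.  the read end of a pipe whose write end has been closed is used as the
stand-in, and the names probe_descriptor and demo_enotsock are made up for
this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: poll fd with select(), then try accept() on it.
 * Returns the errno left by a failed accept(), or 0 if select() did not
 * report fd as readable or accept() unexpectedly succeeded. */
static int probe_descriptor(int fd)
{
    fd_set readable;
    struct timeval no_wait = {0, 0};   /* poll without blocking */
    int client;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    if (select(fd + 1, &readable, NULL, NULL, &no_wait) <= 0)
        return 0;
    if (!FD_ISSET(fd, &readable))
        return 0;

    client = accept(fd, NULL, NULL);
    if (client >= 0) {
        close(client);
        return 0;
    }
    return errno;
}

/* Hypothetical demo: build a non-socket descriptor that select() always
 * reports as readable (a pipe at end-of-file) and probe it.
 * Returns the errno from accept(), or -1 if the pipe cannot be created. */
static int demo_enotsock(void)
{
    int ends[2];
    int err;

    if (pipe(ends) != 0)
        return -1;
    close(ends[1]);                    /* EOF makes the read end readable */
    err = probe_descriptor(ends[0]);
    close(ends[0]);
    return err;
}
```

calling demo_enotsock() from a test program should yield ENOTSOCK.  a loop
that simply retries after such a failure never makes progress, because the
descriptor stays readable and accept keeps failing in the same way.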

the problem is not reproducible on demand.  it has happened a handful of times.

i have found a number of tickets that sound a lot like this, but none that are
exactly the same.  there are perhaps 3-6 of the form ``accept/select in
http_main.c loops on some error, filling up the logfile.''  in those tickets
you have been able to suggest specific fixes that go to the root of the problem.
however, i think this loop could be more defensive, given that accept or
select getting stuck on a particular error seems not entirely uncommon.
one possibility:  count failed accept()s against the maximum number of
connections permitted for a child process, so that under conditions like this
the child uses up its allowance and the logs won't fill up.
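
the suggestion above could look roughly like the sketch below.  this is not
the loop from http_main.c: serve_with_budget, listen_fd and max_connections
are hypothetical names, closing the client stands in for handling a request,
and treating EINTR as "retry without charging the budget" is an assumption
rather than something taken from the apache source.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical accept loop in which every accept() attempt, successful or
 * not, is charged against the per-child connection budget.  A descriptor
 * that fails forever therefore produces at most max_connections log lines.
 * Returns the number of failed accept() calls. */
static int serve_with_budget(int listen_fd, int max_connections)
{
    int used = 0;       /* attempts charged against the budget */
    int failures = 0;   /* how many of those attempts failed */

    while (used < max_connections) {
        int client = accept(listen_fd, NULL, NULL);

        if (client < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: retry, not charged */
            fprintf(stderr, "accept: (client socket): %s\n", strerror(errno));
            used++;
            failures++;
            continue;
        }

        used++;
        close(client);                 /* stand-in for serving the request */
    }
    return failures;
}
```

with a descriptor that always fails, such as the read end of a pipe, a call
like serve_with_budget(fd, 5) logs five errors and returns, after which the
child could exit and be replaced by the parent.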
[In order for any reply to be added to the PR database, ]
[you need to include <apbugs@Apache.Org> in the Cc line ]
[and leave the subject line UNCHANGED.  This is not done]
[automatically because of the potential for mail loops. ]
