httpd-dev mailing list archives

From Amol Dev <deva...@yahoo.com>
Subject Re: mod_cgid and accept() loop
Date Sun, 18 Mar 2007 16:53:08 GMT
I did not notice any unusual activity in the access log, or any problem in syslog, during or
before the time these errors were logged. It could well be a kernel issue. I will raise this
problem with the HP Apache support team.

The problem had not happened for a long time, and I am not sure what is triggering it. We might
end up making a local modification in mod_cgid.c to check for ECONNABORTED before I put the
mod_cgid module back in. I just have to make sure the daemon is relaunched and takes on
requests without problems if that happens.

Thanks,
Amol

----- Original Message ----
From: Jeff Trawick <trawick@gmail.com>
To: dev@httpd.apache.org
Sent: Sunday, March 18, 2007 6:05:33 AM
Subject: Re: mod_cgid and accept() loop


On 3/17/07, Amol Dev <devamol@yahoo.com> wrote:
> After running the Apache-2.0.58 server with mod_cgid on HPUX B.11.23 PA for 3-4 days, all
> of a sudden I see the following errors in error_log.
>
> "[Fri Mar 16 07:23:53 2007] [error] (231)Software caused connection abort: Error accepting
> on cgid socket"
>
> There were 18 million such entries in 30 minutes, which means the cgid daemon was in an
> infinite loop.

        len = sizeof(unix_addr);
        sd2 = accept(sd, (struct sockaddr *)&unix_addr, &len);
        if (sd2 < 0) {
            if (errno != EINTR) {
                ap_log_error(APLOG_MARK, APLOG_ERR, errno,
                             (server_rec *)data,
                             "Error accepting on cgid socket");
            }
            continue;
        }

> Error '231' is ECONNABORTED, which is not handled by mod_cgid and puts the
> accept() into an infinite loop.

No, ECONNABORTED will generate a log message and go back into accept()
to wait for a new connection. It takes an infinite number of such
connections (or the kernel acting as if there were) to create an
infinite loop there.

Perhaps the kernel is confused? Maybe some unknown glitch caused a
connection to be aborted once, and the kernel has left it on an internal
queue even after accept() is called?

> Not sure why this socket would be shutdown() by anything. But if it does get
> ECONNABORTED, how should mod_cgid handle it?

It handles it correctly today IMHO.

Without information on root cause of the kernel acting like there is
an endless number of aborted connections to the mod_cgid socket, I
wouldn't suggest any change to Apache.

> Should we handle this error by setting daemon_should_exit++? Does that respawn
> a new daemon without interruption?

You may wish to make a local modification to have the cgid process
exit if, for example, 10 consecutive calls to accept() return
-1/ECONNABORTED.

You may first want to try to catch it happening again and use tusc to
see if the child process(es) handling requests are repeatedly trying to
connect to mod_cgid's socket.  If they're not doing anything wrong,
look into applicable kernel patches.

If by chance you're using HP's Apache-based server and have support
for it, give them a call.  If anybody has heard of this before they
would likely be in the know.


 