httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Trawick" <traw...@gmail.com>
Subject Re: mod_cgid and accept() loop
Date Sun, 18 Mar 2007 13:05:33 GMT
On 3/17/07, Amol Dev <devamol@yahoo.com> wrote:
> After running the Apache-2.0.58 server on mod_cgid on HPUX B.11.23 PA for 3-4 days all
of sudden I see the following errors in error_log.
>
> "[Fri Mar 16 07:23:53 2007] [error] (231)Software caused connection abort: Error accepting
on cgid socket"
>
> There were 18 millons such entries in 30 minutes which mean the cgid daemon was under
infinite loop.

        len = sizeof(unix_addr);
        sd2 = accept(sd, (struct sockaddr *)&unix_addr, &len);
        if (sd2 < 0) {
            if (errno != EINTR) {
                ap_log_error(APLOG_MARK, APLOG_ERR, errno,
                             (server_rec *)data,
                             "Error accepting on cgid socket");
            }
            continue;
        }

>  Error '231'  is ECONNABORTED, which is not handled by mod_cgid and puts the
>accept() into infinite loop.

no, ECONNABORTED will generate a log message and go back into accept
and wait for a new connection; it takes an infinite number of such
connections (or kernel acting like there is) to create an infinite
loop there

perhaps the kernel is confused?  some unknown glitch caused a
connection to be aborted once, and kernel has left it on an internal
queue even after accept() is called?

> Not sure why would this socket be shutdown() by anything. But if it does get
>ECONNABORTED how should mod_cgid handle it?

It handles it correctly today IMHO.

Without information on root cause of the kernel acting like there is
an endless number of aborted connections to the mod_cgid socket, I
wouldn't suggest any change to Apache.

>  Should we handle this error by setting daemon_should_exit++? Does that respawn
>new daemon without interruption?

You may wish to make a local modification to have the cgid process
exit if, for example, 10 consecutive calls to accept() return
-1/ECONNABORTED.

You may first want to try to catch it happening again and use tusc to
see if child process(es) handling request are repeatedly trying to
connect to mod_cgid's socket.  If they're not doing anything wrong,
see about applicable kernel patches.

If by chance you're using HP's Apache-based server and have support
for it, give them a call.  If anybody has heard of this before they
would likely be in the know.

Mime
View raw message