httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ruediger Pluem <rpl...@apache.org>
Subject Re: error_time reset in proxy_util.c
Date Thu, 05 Mar 2015 17:00:02 GMT


On 03/05/2015 05:29 PM, Eric Covener wrote:
> On Thu, Mar 5, 2015 at 11:14 AM, Ruediger Pluem <rpluem@apache.org> wrote:
>> I suspect that the worker was already set to error by a parallel thread / process
and hence
>> PROXY_WORKER_IS_USABLE(worker) is false and causes worker->s->error_time to
be reset which causes the worker to be open
>> for retry immediately. This has been the case since r104624
>> (http://svn.apache.org/viewvc?view=revision&revision=r104624) 10,5 years ago
and the commit messages offers no hint at
>> least to be why we reset these values.
>> Can anybody think of a good reason why we do this?
> 
> that sequence sounds plausible

So do you see any reason why we do this, or should we just stop doing so?

Another thing:

    else {
        if (worker->s->retries) {
            /*
             * A worker came back. So here is where we need to
             * either reset all params to initial conditions or
             * apply some sort of aging
             */
        }

The comment makes a wrong assumption as we might be here despite the connection failed because
the worker was already in
error. So we might need to split into two ifs above and separate the conditions where the
connection failed and where we
need to do something with the worker because it wasn't in error yet. How about:

Index: proxy_util.c
===================================================================
--- proxy_util.c        (revision 1664261)
+++ proxy_util.c        (working copy)
@@ -2887,33 +2887,48 @@

         connected    = 1;
     }
-    /*
-     * Put the entire worker to error state if
-     * the PROXY_WORKER_IGNORE_ERRORS flag is not set.
-     * Altrough some connections may be alive
-     * no further connections to the worker could be made
-     */
-    if (!connected && PROXY_WORKER_IS_USABLE(worker) &&
-        !(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
-        worker->s->error_time = apr_time_now();
-        worker->s->status |= PROXY_WORKER_IN_ERROR;
-        ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(00959)
-            "ap_proxy_connect_backend disabling worker for (%s) for %"
-            APR_TIME_T_FMT "s",
-            worker->s->hostname, apr_time_sec(worker->s->retry));
+    if (PROXY_WORKER_IS_USABLE(worker)) {
+        /*
+         * Put the entire worker to error state if
+         * the PROXY_WORKER_IGNORE_ERRORS flag is not set.
+         * Altrough some connections may be alive
+         * no further connections to the worker could be made
+         */
+        if (!connected) {
+            if (!(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
+                worker->s->error_time = apr_time_now();
+                worker->s->status |= PROXY_WORKER_IN_ERROR;
+                ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(00959)
+                    "ap_proxy_connect_backend disabling worker for (%s) for %"
+                    APR_TIME_T_FMT "s",
+                    worker->s->hostname, apr_time_sec(worker->s->retry));
+            }
+        }
+        else {
+            if (worker->s->retries) {
+                /*
+                 * A worker came back. So here is where we need to
+                 * either reset all params to initial conditions or
+                 * apply some sort of aging
+                 */
+            }
+            worker->s->error_time = 0;
+            worker->s->retries = 0;
+        }
+        return connected ? OK : DECLINED;
     }
     else {
-        if (worker->s->retries) {
-            /*
-             * A worker came back. So here is where we need to
-             * either reset all params to initial conditions or
-             * apply some sort of aging
-             */
+        /*
+         * The worker is in error likely done by a different thread / processi
+         * e.g. for a timeout or bad status. We should respect this and should
+         * not continue with a connection via this worker even if we got one.
+         */
+        if (connected) {
+            apr_socket_close(conn->sock );
+            conn->sock = NULL;
         }
-        worker->s->error_time = 0;
-        worker->s->retries = 0;
+        return DECLINED;
     }
-    return connected ? OK : DECLINED;
 }

 static apr_status_t connection_shutdown(void *theconn)


> 
>> Another question is if we shouldn't do
>>
>> worker->s->error_time = apr_time_now();
>>
>> also in case the worker is already in error state to restart the retry clock as we
just faced an error with connecting
>> to the backend.
> 
> Isn't this already handled by the thread that put the worker in error
> during the race? If there's no race, we already have this line of code
> for the main path.
> 

Valid point, but we might have an error situation here that is more recent then the one by
the thread that put it into
error. OTOH as I suspect a race the difference is likely to be rather slim, so it might not
be worth the effort.

Regards

RĂ¼diger

Mime
View raw message