From Ruediger Pluem
Subject error_time reset in proxy_util.c
Date Thu, 05 Mar 2015 16:14:42 GMT
I am currently hunting down an issue where a balancer member that has been put into error
state is reused before its retry time runs out.
I think the reason is a race condition around line 2900 in proxy_util.c:

    /*
     * Put the entire worker to error state if
     * the PROXY_WORKER_IGNORE_ERRORS flag is not set.
     * Altrough some connections may be alive
     * no further connections to the worker could be made
     */
    if (!connected && PROXY_WORKER_IS_USABLE(worker) &&
        !(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
        worker->s->error_time = apr_time_now();
        worker->s->status |= PROXY_WORKER_IN_ERROR;
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(00959)
            "ap_proxy_connect_backend disabling worker for (%s) for %"
            APR_TIME_T_FMT "s",
            worker->s->hostname, apr_time_sec(worker->s->retry));
    }
    else {
        if (worker->s->retries) {
            /*
             * A worker came back. So here is where we need to
             * either reset all params to initial conditions or
             * apply some sort of aging
             */
        }
        worker->s->error_time = 0;
        worker->s->retries = 0;
    }

I suspect that the worker was already set to error by a parallel thread / process, so
PROXY_WORKER_IS_USABLE(worker) is false. The failed connect then falls into the else branch,
which resets worker->s->error_time to 0 and opens the worker for retry immediately.
This has been the case since r104624 (10.5 years ago), and the commit message offers no hint,
at least to me, as to why we reset these values.
Can anybody think of a good reason why we do this?
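
To make the suspected sequence concrete, here is a minimal standalone model of the
if/else above. It is not httpd code: model_worker, the MODEL_* flags and model_is_usable()
are hypothetical stand-ins, the usability check is reduced to the error bit only, and the
current time is passed in as a plain integer instead of calling apr_time_now().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the httpd flags; values are arbitrary. */
#define MODEL_IN_ERROR      0x1u
#define MODEL_IGNORE_ERRORS 0x2u

typedef struct {
    unsigned status;
    int64_t  error_time;   /* 0 means "no error recorded" */
    int      retries;
} model_worker;

/* Simplified: the real macro checks more than the error bit. */
static bool model_is_usable(const model_worker *w)
{
    return (w->status & MODEL_IN_ERROR) == 0;
}

/* Same shape as the quoted if/else, with "now" supplied by the caller. */
static void model_connect_result(model_worker *w, bool connected, int64_t now)
{
    if (!connected && model_is_usable(w) &&
        !(w->status & MODEL_IGNORE_ERRORS)) {
        w->error_time = now;
        w->status |= MODEL_IN_ERROR;
    }
    else {
        /* Also reached when !connected but the worker is already in error. */
        w->error_time = 0;
        w->retries = 0;
    }
}

/* Two failed connects in a row, as two threads might report them. */
static void model_demo(void)
{
    model_worker w = { 0, 0, 0 };

    model_connect_result(&w, false, 1000);   /* first failure */
    assert(w.error_time == 1000);

    model_connect_result(&w, false, 1001);   /* second failure */
    assert(w.status & MODEL_IN_ERROR);       /* still marked in error */
    assert(w.error_time == 0);               /* but the retry clock is gone */
}
```

In this model the second failure leaves the error bit set while clearing error_time, which
matches the behaviour I am seeing. Whether the real code path is hit the same way still
needs to be confirmed.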
Another question is whether we shouldn't also do

worker->s->error_time = apr_time_now();

when the worker is already in error state, to restart the retry clock, since we have just
faced an error connecting to the backend.
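
One possible shape for that, again as a standalone model and not a tested patch:
sketch_worker and the SKETCH_* flags are hypothetical stand-ins, the usability check is
dropped entirely, and the time is passed in as a plain integer. The idea is that only a
successful connect clears the bookkeeping, while every failed connect (re)starts the clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the httpd flags; values are arbitrary. */
#define SKETCH_IN_ERROR      0x1u
#define SKETCH_IGNORE_ERRORS 0x2u

typedef struct {
    unsigned status;
    int64_t  error_time;   /* 0 means "no error recorded" */
    int      retries;
} sketch_worker;

static void sketch_connect_result(sketch_worker *w, bool connected, int64_t now)
{
    if (connected) {
        /* Only a successful connect resets the error bookkeeping. */
        w->error_time = 0;
        w->retries = 0;
    }
    else if (!(w->status & SKETCH_IGNORE_ERRORS)) {
        /* Failed connect: (re)start the retry clock, whether or not
         * someone else already marked the worker as in error. */
        w->error_time = now;
        w->status |= SKETCH_IN_ERROR;
    }
    /* Failed connect with errors ignored: leave everything untouched. */
}

static void sketch_demo(void)
{
    sketch_worker w = { 0, 0, 0 };

    sketch_connect_result(&w, false, 1000);  /* first failure */
    sketch_connect_result(&w, false, 1001);  /* second failure */
    assert(w.error_time == 1001);            /* clock restarted, not cleared */

    sketch_connect_result(&w, true, 1002);   /* backend is reachable again */
    assert(w.error_time == 0);
}
```

Note that this differs from the current code in one more way: a failed connect with
PROXY_WORKER_IGNORE_ERRORS set no longer resets error_time and retries. I do not know
whether anything depends on that.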



