httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ruediger Pluem <rpl...@apache.org>
Subject Re: svn commit: r711886 - /httpd/httpd/branches/2.2.x/STATUS
Date Thu, 06 Nov 2008 20:58:52 GMT


On 11/06/2008 05:07 PM, jim@apache.org wrote:
> Author: jim
> Date: Thu Nov  6 08:06:54 2008
> New Revision: 711886
> 
> URL: http://svn.apache.org/viewvc?rev=711886&view=rev
> Log:
> Hold off until I learn more from Ruediger
> 
> Modified:
>     httpd/httpd/branches/2.2.x/STATUS
> 
> Modified: httpd/httpd/branches/2.2.x/STATUS
> URL: http://svn.apache.org/viewvc/httpd/httpd/branches/2.2.x/STATUS?rev=711886&r1=711885&r2=711886&view=diff
> ==============================================================================
> --- httpd/httpd/branches/2.2.x/STATUS (original)
> +++ httpd/httpd/branches/2.2.x/STATUS Thu Nov  6 08:06:54 2008
> @@ -135,7 +135,7 @@
>          http://svn.apache.org/viewvc?rev=706921&view=rev
>       Backport version for 2.2.x of patch:
>          Trunk version of patch works
> -     +1: jim
> +     +1: 
>       rpluem: This code still has bad edge cases that lead to segfaults. I have
>       a better version in the pipe that currently undergoes testing and will
>       be in trunk soon.

After analysing some crashes on one of my systems I came to the conclusion that the
current approach to solve the problem still has open edge cases of which some IMHO
could not be fixed this way. Furthermore the already complex flushing code would become
even more complex and harder to understand. So I chose a new approach to solve the problem.

What is the problem at all?

mod_proxy_http uses a a conn_rec to communicate with the backend. It somehow reverses
the meaning of input and output filters and uses them to send the request and receive
the response. In order to use persistent SSL connections to the backend it is needed
to keep this conn_rec (and thus its bucket allocator) alive and tied to the backend
connection. Thus it is no longer destroyed at the end of a client request / connection.
So the conn_rec and especially the bucket allocator for the backend have a lifecycle
that is disjoint from the one to the client. Especially it can happen that due to e.g.
a faulty backend connection the conn_rec to the backend *lives* shorter than
buckets that are still buffered somewhere in the output filter chain to the client
(see comments in removed code below). This causes segfaults when these buckets are
accessed then or if brigades that contain them get cleaned up later on due to a parent
pool cleanup.

The new approach to solve the problem is to transform the lifetime of the buckets
by shifting them to a different allocator. The basic approach is to iterate over
the brigade fetched from the backend connection and

for metabuckets recreate them
for data buckets read them and put the read data in a transient bucket.

Currently only HEAP, TRANSIENT, POOL and IMMORTAL data buckets are expected from the backend.
In this case apr_bucket_read is a trivial operation that only returns a pointer to the
data stored in the bucket. Once we pass the buckets down the chain filters are responsible
to set the buckets aside if they do not consume them in this filter pass. Setting aside
a transient bucket causes the data to be moved into a freshly allocated memory area that
is allocated from the bucket allocator, in this case the correct one as it is the one from
the connection to the client. This way effort intensive memory copying only happens, when
really needed. A big plus of this approach is that it opens the door again for returning
the backend connection back to the connection pool as soon as we have received all data
from the backend instead of the need to wait until all data is flushed to the client.
I guess this will make some people happy here :-).

Well here we go:


Index: modules/proxy/proxy_util.c
===================================================================
--- modules/proxy/proxy_util.c	(revision 711224)
+++ modules/proxy/proxy_util.c	(working copy)
@@ -1628,9 +1628,6 @@
 {
     proxy_conn_rec *conn = (proxy_conn_rec *)theconn;
     proxy_worker *worker = conn->worker;
-    apr_bucket_brigade *bb;
-    conn_rec *c;
-    request_rec *r;

     /*
      * If the connection pool is NULL the worker
@@ -1651,112 +1648,6 @@
     }
 #endif

-    r = conn->r;
-    if (conn->need_flush && r) {
-        request_rec *main_request;
-
-        /*
-         * We need to get to the main request as subrequests do not count any
-         * sent bytes.
-         */
-        main_request = r;
-        while (main_request->main) {
-            main_request = main_request->main;
-        }
-        if (main_request->bytes_sent || r->eos_sent) {
-            ap_filter_t *flush_filter;
-
-            /*
-             * We need to ensure that buckets that may have been buffered in the
-             * network filters get flushed to the network. This is needed since
-             * these buckets have been created with the bucket allocator of the
-             * backend connection. This allocator either gets destroyed if
-             * conn->close is set or the worker address is not reusable which
-             * causes the connection to the backend to be closed or it will be
-             * used again by another frontend connection that wants to recycle
-             * the backend connection.
-             * In this case we could run into nasty race conditions (e.g. if the
-             * next user of the backend connection destroys the allocator before
-             * we sent the buckets to the network).
-             *
-             * Remark 1: Only do this if buckets were sent down the chain before
-             * that could still be buffered in the network filter. This is the
-             * case if we have sent an EOS bucket or if we actually sent buckets
-             * with data down the chain. In all other cases we either have not
-             * sent any buckets at all down the chain or we only sent meta
-             * buckets that are not EOS buckets down the chain. The only meta
-             * bucket that remains in this case is the flush bucket which would
-             * have removed all possibly buffered buckets in the network filter.
-             * If we sent a flush bucket in the case where not ANY buckets were
-             * sent down the chain, we break error handling which happens AFTER
-             * us.
-             *
-             * Remark 2: Doing a setaside does not help here as the bucketsi
-             * remain created by the wrong allocator in this case.
-             *
-             * Remark 3: Yes, this creates a possible performance penalty in the
-             * case of pipelined requests as we may send only a small amount of
-             * data over the wire.
-             *
-             * XXX: There remains a nasty edge case with detecting whether data
-             * was sent down the chain: If the data sent down the chain was
-             * set aside by some filter in the chain r->bytes_sent will still
-             * be zero albeit there are buckets in the chain that we would
-             * need to flush. Adding our own filter at the begining of the
-             * chain for detecting the passage of data buckets does not help
-             * as well because calling ap_set_content_type can cause such
-             * filters to be added *before* our filter in the chain and thus
-             * have it cut off from this information.
-             */
-            c = r->connection;
-            bb = apr_brigade_create(r->pool, c->bucket_alloc);
-            if (r->eos_sent) {
-                if (r->main) {
-                    /*
-                     * If we are a subrequest the EOS bucket went up the filter
-                     * chain up to the SUBREQ_CORE filter which drops it
-                     * silently. So we need to sent our flush bucket through
-                     * all the filters which haven't seen the EOS bucket. These
-                     * are the filters after the SUBREQ_CORE filter.
-                     * All filters after the SUBREQ_CORE filter are
-                     *
-                     * Either not connection oriented, so filter->r == NULL or
-                     *
-                     * filter->r != r since they belong to a different request
-                     * (our parent request).
-                     */
-                    flush_filter = r->output_filters;
-                    /*
-                     * We do not need to check if flush_filter->next != NULL
-                     * because there is at least the core output filter which
-                     * is a network filter and thus flush_filter->r == NULL
-                     * which is != r.
-                     */
-                    while (flush_filter->r == r) {
-                        flush_filter = flush_filter->next;
-                    }
-                }
-                else {
-                    /*
-                     * If we are a main request and if we have already sent an
-                     * EOS bucket send directly to the connection based
-                     * filters. We just want to flush the buckets
-                     * if something hasn't been sent to the network yet and
-                     * the request based filters in the chain may not work
-                     * any longer once they have seen an EOS bucket.
-                     */
-                    flush_filter = c->output_filters;
-                }
-            }
-            else {
-                flush_filter = r->output_filters;
-            }
-            ap_fflush(flush_filter, bb);
-            apr_brigade_destroy(bb);
-            conn->r = NULL;
-        }
-    }
-
     /* determine if the connection need to be closed */
     if (conn->close || !worker->is_address_reusable || worker->disablereuse) {
         apr_pool_t *p = conn->pool;
@@ -2138,8 +2029,6 @@
     apr_status_t err = APR_SUCCESS;
     apr_status_t uerr = APR_SUCCESS;

-    conn->r = r;
-
     /*
      * Break up the URL to determine the host to connect to
      */
@@ -2478,12 +2367,6 @@
         return OK;
     }

-    /*
-     * We need to flush the buckets before we return the connection to the
-     * connection pool. See comment in connection_cleanup for why this is
-     * needed.
-     */
-    conn->need_flush = 1;
     bucket_alloc = apr_bucket_alloc_create(conn->scpool);
     /*
      * The socket is now open, create a new backend server connection
@@ -2576,3 +2459,58 @@
     e = apr_bucket_eos_create(c->bucket_alloc);
     APR_BRIGADE_INSERT_TAIL(brigade, e);
 }
+
+/*
+ * Transform buckets from one bucket allocator to another one by creating a
+ * transient bucket for each data bucket and let it use the data read from
+ * the old bucket. Metabuckets are transformed by just recreating them.
+ * Attention: Currently only the following bucket types are handled:
+ *
+ * HEAP
+ * TRANSIENT
+ * POOL
+ * IMMORTAL
+ * FLUSH
+ * EOS
+ *
+ * If an other bucket type is found its type is logged as a debug message
+ * and APR_EGENERAL is returned.
+ */
+PROXY_DECLARE(apr_status_t)
+ap_proxy_buckets_lifetime_transform(request_rec *r, apr_bucket_brigade *from,
+                                    apr_bucket_brigade *to)
+{
+    apr_bucket *e;
+    apr_bucket *new;
+    const char *data;
+    apr_size_t bytes;
+    apr_status_t rv = APR_SUCCESS;
+
+    apr_brigade_cleanup(to);
+    for (e = APR_BRIGADE_FIRST(from);
+         e != APR_BRIGADE_SENTINEL(from);
+         e = APR_BUCKET_NEXT(e)) {
+        if (APR_BUCKET_IS_HEAP(e) || APR_BUCKET_IS_TRANSIENT(e)
+            || APR_BUCKET_IS_IMMORTAL(e) || APR_BUCKET_IS_POOL(e)) {
+            apr_bucket_read(e, &data, &bytes, APR_BLOCK_READ);
+            new = apr_bucket_transient_create(data, bytes, r->connection->bucket_alloc);
+            APR_BRIGADE_INSERT_TAIL(to, new);
+        }
+        else if (APR_BUCKET_IS_FLUSH(e)) {
+            new = apr_bucket_flush_create(r->connection->bucket_alloc);
+            APR_BRIGADE_INSERT_TAIL(to, new);
+        }
+        else if (APR_BUCKET_IS_EOS(e)) {
+            new = apr_bucket_eos_create(r->connection->bucket_alloc);
+            APR_BRIGADE_INSERT_TAIL(to, new);
+        }
+        else {
+            ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r,
+                          "proxy: Unhandled bucket type of type %s in"
+                          " ap_proxy_buckets_lifetime_transform", e->type->name);
+            rv = APR_EGENERAL;
+        }
+    }
+    return rv;
+}
+
Index: modules/proxy/mod_proxy_http.c
===================================================================
--- modules/proxy/mod_proxy_http.c	(revision 711224)
+++ modules/proxy/mod_proxy_http.c	(working copy)
@@ -1335,6 +1335,7 @@
     request_rec *rp;
     apr_bucket *e;
     apr_bucket_brigade *bb, *tmp_bb;
+    apr_bucket_brigade *pass_bb;
     int len, backasswards;
     int interim_response = 0; /* non-zero whilst interim 1xx responses
                                * are being read. */
@@ -1347,6 +1348,7 @@
     const char *te = NULL;

     bb = apr_brigade_create(p, c->bucket_alloc);
+    pass_bb = apr_brigade_create(p, c->bucket_alloc);

     /* Get response from the remote server, and pass it up the
      * filter chain
@@ -1765,6 +1767,9 @@
                         break;
                     }

+                    /* Switch the allocator lifetime of the buckets */
+                    ap_proxy_buckets_lifetime_transform(r, bb, pass_bb);
+
                     /* found the last brigade? */
                     if (APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb))) {
                         /* signal that we must leave */
@@ -1772,7 +1777,7 @@
                     }

                     /* try send what we read */
-                    if (ap_pass_brigade(r->output_filters, bb) != APR_SUCCESS
+                    if (ap_pass_brigade(r->output_filters, pass_bb) != APR_SUCCESS
                         || c->aborted) {
                         /* Ack! Phbtt! Die! User aborted! */
                         backend->close = 1;  /* this causes socket close below */
@@ -1781,6 +1786,7 @@

                     /* make sure we always clean up after ourselves */
                     apr_brigade_cleanup(bb);
+                    apr_brigade_cleanup(pass_bb);

                 } while (!finish);
             }
Index: modules/proxy/mod_proxy.h
===================================================================
--- modules/proxy/mod_proxy.h	(revision 711224)
+++ modules/proxy/mod_proxy.h	(working copy)
@@ -741,6 +737,32 @@
 PROXY_DECLARE(void) ap_proxy_backend_broke(request_rec *r,
                                            apr_bucket_brigade *brigade);

+/**
+ * Transform buckets from one bucket allocator to another one by creating a
+ * transient bucket for each data bucket and let it use the data read from
+ * the old bucket. Metabuckets are transformed by just recreating them.
+ * Attention: Currently only the following bucket types are handled:
+ *
+ * HEAP
+ * TRANSIENT
+ * POOL
+ * IMMORTAL
+ * FLUSH
+ * EOS
+ *
+ * If an other bucket type is found its type is logged as a debug message
+ * and APR_EGENERAL is returned.
+ * @param r    current request record of client request. Only used for logging
+ *             purposes
+ * @param from the brigade that contains the buckets to transform
+ * @param to   the brigade that will receive the transformed buckets
+ * @return     APR_SUCCESS if all buckets could be transformed APR_EGENERAL
+ *             otherwise
+ */
+PROXY_DECLARE(apr_status_t)
+ap_proxy_buckets_lifetime_transform(request_rec *r, apr_bucket_brigade *from,
+                                        apr_bucket_brigade *to);
+
 #define PROXY_LBMETHOD "proxylbmethod"

 /* The number of dynamic workers that can be added when reconfiguring.
Index: modules/proxy/mod_proxy_ftp.c
===================================================================
--- modules/proxy/mod_proxy_ftp.c	(revision 711224)
+++ modules/proxy/mod_proxy_ftp.c	(working copy)
@@ -969,7 +969,6 @@
         }
         /* TODO: see if ftp could use determine_connection */
         backend->addr = connect_addr;
-        backend->r = r;
         ap_set_module_config(c->conn_config, &proxy_ftp_module, backend);
     }

Index: include/ap_mmn.h
===================================================================
--- include/ap_mmn.h	(revision 711224)
+++ include/ap_mmn.h	(working copy)
@@ -174,13 +174,16 @@
  * 20081101.0 (2.3.0-dev)  Remove unused AUTHZ_GROUP_NOTE define.
  * 20081102.0 (2.3.0-dev)  Remove authz_provider_list, authz_request_state,
  *                         and AUTHZ_ACCESS_PASSED_NOTE.
+ * 20081104.0 (2.3.0-dev)  Remove r and need_flush fields from proxy_conn_rec
+ *                         as they are no longer used and add
+ *                         ap_proxy_buckets_lifetime_transform to mod_proxy.h
  *
  */

 #define MODULE_MAGIC_COOKIE 0x41503234UL /* "AP24" */

 #ifndef MODULE_MAGIC_NUMBER_MAJOR
-#define MODULE_MAGIC_NUMBER_MAJOR 20081102
+#define MODULE_MAGIC_NUMBER_MAJOR 20081104
 #endif
 #define MODULE_MAGIC_NUMBER_MINOR 0                     /* 0...n */



Regards

RĂ¼diger





Mime
View raw message