httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ruediger Pluem <rpl...@apache.org>
Subject Re: mod_proxy and balancer problems
Date Sat, 07 Oct 2006 20:11:13 GMT


On 10/05/2006 11:07 AM, Mathias Herberts wrote:
> As I mentioned a few days ago here are my patches for mod_proxy.
> 
> They modify mod_proxy behaviour in several ways to better fit our
> production environment where we heavily use Apache and Tomcat.
> 
> Just so you understand our context, we use a pool of Apache servers as
> front-ends to Tomcat application servers. Each Apache server serves
> several virtual hosts which connect (using AJP 1.3) to backend Tomcat
> servers.
> 
> The version of Apache we use is 2.2 and the version of Tomcat 5.5.

Thank you very much for submitting the patches. Some general remarks:

1. Please create your patches against trunk. All patches need to be applied
   to trunk first and can be backported later to the stable branch(es).
   There are only very rare conditions where the code in trunk has changed
   that much that a change cannot be applied to trunk first before it
   gets applied to a branch.
   This also helps you noticing what things have already been done. E.g.
   the "Retry a worker in error state before using the redirect worker" is
   already implemented in trunk. It is even already backported to the 2.2.x
   branch and will be part of 2.2.4.
   If you detect a change in trunk that has not been backported and that you
   consider useful for a stable branch you can ask for backporting it here on
   the list.

2. Please split the patches in a way that each patch contains only one logical
   change / feature / thing. E.g. the balancer.patch file contains two things:

   1. Allow load factor 0
   2. Retry a worker in error state before using the redirect worker

   This has the benefit that patches are easier to review and things that are
   undisputed can be applied sooner and independent from other things.

3. Please consider the code style guide (http://httpd.apache.org/dev/styleguide.html)
   when writing patches. Your patches violate this guide especially by using tabs
   instead of spaces and some other smaller things (some missing spaces, wrong indentations).

4. Don't feel discouraged by the comments above :-). Although these may look like some
   stupid things to torture new contributors in fact they are not. As the developers with
   commit access are all volunteers, time is one of the most valuable resources here.
   The priority with which developers review patches also depends heavily on the ease of
   a possible review and commit of this patch. As both sides (the project) and the contributor
   benefit from a contemporary review it makes sense to prepare this review from contributors
   side in a way that it is effective.

Now some more specific comments:

ajp.patch:

1.

+    else if (!strcasecmp(key,"forceclose")) {
+»······/* Do not reuse backend connections */
+       if (!strcasecmp(val, "on"))
+           worker->force_close = 1;
+       else if (!strcasecmp(val, "off"))
+           worker->sticky_force = 0;

Guess that should be worker->force_close = 0 above.

2.

+++ ./modules/proxy/mod_proxy.h»2006-09-13 17:58:45.000000000 +0200

@@ -289,6 +292,9 @@
                                  * may be available while exceeding the soft limit */
     apr_interval_time_t timeout; /* connection timeout */
     char                timeout_set;
+    apr_interval_time_t connect_timeout; /* connection timeout */
+    char                connect_timeout_set;
+    int »······»·······force_close; /* discard backend connection after use
*/
     apr_interval_time_t acquire; /* acquire timeout when the maximum number of connections
is exceeded */
     char                acquire_set;
     apr_size_t          recv_buffer_size;
@@ -323,6 +329,7 @@
     apr_array_header_t *workers; /* array of proxy_workers */
     const char *name;            /* name of the load balancer */
     const char *sticky;          /* sticky session identifier */
+    int»·······»·······sticky_case;»··· /* Ignore case when looking for
sticky */
     int         sticky_force;    /* Disable failover for sticky sessions */
     apr_interval_time_t timeout; /* Timeout for waiting on free connection */
     int                 max_attempts; /* Number of attempts before failing */


As already mentioned by someone else: If you add new members to a struct that is part of
the public API, always add them to the end of this struct. Although this leads to a
separation of related struct members over the time this is required for compatibility.
Within a stable branch binary compatibilty is guaranteed. Changes that break this compatibility
cannot be backported from trunk to a stable branch.
There is a versioning scheme for this: The MODULE_MAGIC_NUMBER_MAJOR and the
MODULE_MAGIC_NUMBER_MINOR which can be found in ap_mmn.h. Breaking binary compatibility requires
a change of the MODULE_MAGIC_NUMBER_MAJOR (a major bump) whereas things that do not break
binary
compatibility but require third party sources that want to use an added feature of the API
(like the new struct members) to require a certain minimum version of the product require
only
a change of the MODULE_MAGIC_NUMBER_MINOR (a minor bump). In the case of a major bump the
MODULE_MAGIC_NUMBER_MINOR gets reseted.

3.

--- ./modules/proxy/mod_proxy_ajp.c.orig»·······2006-09-18 22:01:53.000000000 +0200
+++ ./modules/proxy/mod_proxy_ajp.c»····2006-09-13 17:52:48.000000000 +0200
@@ -510,7 +510,7 @@
     }
·
     backend->is_ssl = 0;
-    backend->close_on_recycle = 0;
+    backend->close_on_recycle = worker->force_close;

close_on_recycle is deprecated. Please use close instead.

4. Apart for the comments by others on the case insensitive session parameter I do not
   think that the session parameter itself should be set via stickycasesession.
   I think this should be a parameter that turns on/off case sensitive / case insensitive
   compares.

5. I see no need for the additional directive ProxyConnectTimeout.
   connecttimeout seems to be a worker property to me which should have a reasonable default
   if not set for a worker.

6. In the light of 5. this comment might look odd, but to be honest I do not get what
   you can do via connecttimeout. I try to analyse the situation when you have your Tomcat
   backend on a Linux (checked on a 2.4 kernel, may vary with 2.6 kernels) system:

   First I rule out two factors for long connection times:

   1. A bad network.
   2. A backend box that is that busy that the OS cannot process TCP packets in a timely manner.

   So what remains:

   If the backlog (however set, either OS default or via a dedicated parameter in a patched
Tomcat)
   is full we have three possibilities:

   1. The tcp syn backlog (set via /proc/sys/net/ipv4/tcp_max_syn_backlog) still has free
capacities:
      In this case the OS still reponds to the SYN packages from the reverse proxy with SYN-ACK
packages
      but silently ignores the answering ACK package from the reverse proxy. Nevertheless
the reverse
      proxy thinks that it is connected, so the connect call is left. --> No long connection
time

   2. The tcp syn backlog is full. In this case it depends whether the OS has syn cookies
enabled or
      not (/proc/sys/net/ipv4/tcp_syncookies). If yes we have the same situation reagarding
the connection
      time as in one (no long connection time) in the other case the OS drops the SYN packages
from the
      reverse proxy and we have a long connection time. Given the fact that tcp_max_syn_backlog
can be only
      set globally and its default is rather larger (1024) I do not think that we get into
this situation
      frequently.

   3. /proc/sys/net/ipv4/tcp_abort_on_overflow is set. In this case connections that do not
fit into the
      backlog of the socket have a RST package sent immediately after the ACK package from
the client arrives.
      In this case we also have no long connection time.

   As the behaviour above may be not true for other OS'es except Linux connecttimeout might
still make sense.
   Thus I wrote 5. :-).


balancer.patch:

1. As mentioned before the "Retry a worker in error state before using the redirect worker"
feature
   is already implemented.

2. I agree with Jim that using loadfactor 0 is a misuse of the loadfactor. As mentioned before
   in a different mail by me, we should define a state of a worker (whatever name it has,
   I don't want to go back into the disabled / stopped arguing :-)) that lets a worker only
accept
   requests that match its route and nothing else.

So what is the summary of all this above? Please resubmit your patches and we go into the
next round
of review.

Regards

Rüdiger

Mime
View raw message