Subject: svn commit: r650591 - /httpd/httpd/branches/2.2.x/docs/manual/rewrite/rewrite_guide_advanced.html.en
Date: Tue, 22 Apr 2008 17:45:17 -0000
To: cvs@httpd.apache.org
From: noodl@apache.org
Message-Id: <20080422174517.D3C3F1A9832@eris.apache.org>

Author: noodl
Date: Tue Apr 22 10:45:14 2008
New Revision: 650591

URL: http://svn.apache.org/viewvc?rev=650591&view=rev
Log:
Update transformations

Modified:
    httpd/httpd/branches/2.2.x/docs/manual/rewrite/rewrite_guide_advanced.html.en

Modified: httpd/httpd/branches/2.2.x/docs/manual/rewrite/rewrite_guide_advanced.html.en
URL: http://svn.apache.org/viewvc/httpd/httpd/branches/2.2.x/docs/manual/rewrite/rewrite_guide_advanced.html.en?rev=650591&r1=650590&r2=650591&view=diff
==============================================================================
--- httpd/httpd/branches/2.2.x/docs/manual/rewrite/rewrite_guide_advanced.html.en (original)
+++ httpd/httpd/branches/2.2.x/docs/manual/rewrite/rewrite_guide_advanced.html.en Tue Apr 22 10:45:14 2008
@@ -31,7 +31,7 @@
ATTENTION: Depending on your server configuration it may be necessary to adjust the examples for your - situation, e.g., adding the [PT] flag if + situation, e.g., adding the [PT] flag if using mod_alias and mod_userdir, etc. Or rewriting a ruleset to work in .htaccess context instead @@ -43,7 +43,7 @@
top
-

Redirect Failing URLs to Another Webserver

+

Redirect Failing URLs to Another Web Server

@@ -364,7 +364,7 @@ The result is that this will work for all types of URLs and is safe. But it does have a performance impact on the web server, because for every request there is one - more internal subrequest. So, if your webserver runs on a + more internal subrequest. So, if your web server runs on a powerful CPU, use this one. If it is a slow machine, use the first approach or better an ErrorDocument CGI script.
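A minimal sketch of the subrequest-based variant described above (the target host is a placeholder, not the guide's literal ruleset):

     # Forward anything that does not resolve locally to another web server.
     # The -U test fires an internal subrequest per request, which is the
     # performance cost mentioned above; use [R] instead of [P] to redirect
     # the client rather than proxy the content.
     RewriteEngine on
     RewriteCond   %{REQUEST_URI}  !-U
     RewriteRule   ^(.+)$          http://webserverB.example.com/$1  [P]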

@@ -382,10 +382,10 @@

Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN? - This does a redirect to one of several FTP servers around - the world which each carry a CPAN mirror and (theoretically) - near the requesting client. Actually this - can be called an FTP access multiplexing service. + CPAN automatically redirects browsers to one of many FTP + servers around the world (generally one near the requesting + client); each server carries a full CPAN mirror. This is + effectively an FTP access multiplexing service. CPAN runs via CGI scripts, but how could a similar approach be implemented via mod_rewrite?
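One hedged sketch of such a multiplexer in mod_rewrite (the /CxAN/ prefix, the map file, and the mirror URLs are illustrative assumptions only):

     # Pick a mirror based on the client's top-level domain.  The map file
     # holds lines such as:   de   ftp://ftp.cpan.example.de/CPAN/
     RewriteEngine on
     RewriteMap    multiplex                txt:/path/to/map.mirrors
     RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1  [C]
     RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp://ftp.default.example.com/CPAN/}$2  [R,L]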

@@ -435,7 +435,7 @@

At least for important top-level pages it is sometimes necessary to provide the optimum of browser dependent - content, i.e., one has to provide one version for + content, i.e., one has to provide one version for current browsers, a different version for the Lynx and text-mode browsers, and another for other browsers.
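A hedged sketch of one way to express this (file names and User-Agent patterns are only examples; per-directory context assumed):

     # Text-mode and very old browsers get the reduced page; everyone else
     # gets the full version.
     RewriteCond %{HTTP_USER_AGENT}  ^Lynx/         [OR]
     RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12]
     RewriteRule ^foo\.html$         foo.lynx.html  [L]
     RewriteRule ^foo\.html$         foo.full.html  [L]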

@@ -477,25 +477,25 @@
Description:
-

Assume there are nice webpages on remote hosts we want +

Assume there are nice web pages on remote hosts we want to bring into our namespace. For FTP servers we would use the mirror program which actually maintains an explicit up-to-date copy of the remote data on the local - machine. For a webserver we could use the program + machine. For a web server we could use the program webcopy which runs via HTTP. But both - techniques have one major drawback: The local copy is - always just as up-to-date as the last time we ran the program. It - would be much better if the mirror is not a static one we + techniques have a major drawback: The local copy is + always only as up-to-date as the last time we ran the program. It + would be much better if the mirror was not a static one we have to establish explicitly. Instead we want a dynamic - mirror with data which gets updated automatically when - there is need (updated on the remote host).

+ mirror with data which gets updated automatically + as needed on the remote host(s).

Solution:
-

To provide this feature we map the remote webpage or even - the complete remote webarea to our namespace by the use +

To provide this feature we map the remote web page or even + the complete remote web area to our namespace by the use of the Proxy Throughput feature (flag [P]):
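The full ruleset is not quoted in this hunk; a minimal sketch of the idea (hostnames and paths are placeholders):

     # Map part of a remote site into the local URL space; the content is
     # fetched on the fly and relayed, so it is always current.
     RewriteEngine on
     RewriteRule   ^/mirror/remote-docs/(.*)$  http://remote.example.com/docs/$1  [P,L]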

@@ -546,22 +546,22 @@

This is a tricky way of virtually running a corporate - (external) Internet webserver + (external) Internet web server (www.quux-corp.dom), while actually keeping - and maintaining its data on a (internal) Intranet webserver + and maintaining its data on an (internal) Intranet web server (www2.quux-corp.dom) which is protected by a - firewall. The trick is that on the external webserver we - retrieve the requested data on-the-fly from the internal + firewall. The trick is that the external web server retrieves + the requested data on-the-fly from the internal one.

Solution:
-

First, we have to make sure that our firewall still - protects the internal webserver and that only the - external webserver is allowed to retrieve data from it. - For a packet-filtering firewall we could for instance +

First, we must make sure that our firewall still + protects the internal web server and only the + external web server is allowed to retrieve data from it. + On a packet-filtering firewall, for instance, we could configure a firewall ruleset like the following:
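The packet-filter rules themselves are not quoted here; on the Apache side of the same setup, the external server's proxy-throughput ruleset could look roughly like this sketch (the internal hostname is taken from the description above, everything else is assumed):

     # The external server fetches every request on the fly from the
     # protected internal server and relays the response.
     RewriteEngine on
     RewriteRule   ^/(.*)$  http://www2.quux-corp.dom/$1  [P,L]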

@@ -601,18 +601,18 @@
         
Solution:
-

There are a lot of possible solutions for this problem. - We will discuss first a commonly known DNS-based variant - and then the special one with mod_rewrite:

+

There are many possible solutions for this problem. + We will first discuss a common DNS-based method, + and then one based on mod_rewrite:

  1. DNS Round-Robin

    The simplest method for load-balancing is to use - the DNS round-robin feature of BIND. + DNS round-robin. Here you just configure www[0-9].foo.com - as usual in your DNS with A(address) records, e.g.,

    + as usual in your DNS with A (address) records, e.g.,

     www0   IN  A       1.2.3.1
    @@ -623,7 +623,7 @@
     www5   IN  A       1.2.3.6
     
    -

    Then you additionally add the following entry:

    +

    Then you additionally add the following entries:

     www   IN  A       1.2.3.1
    @@ -635,17 +635,19 @@
     
                   

    Now when www.foo.com gets resolved, BIND gives out www0-www5 - - but in a slightly permutated/rotated order every time. + - but in a permuted (rotated) order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load - balancing scheme, because DNS resolution information - gets cached by the other nameservers on the net, so + balancing scheme, because DNS resolutions are + cached by clients and other nameservers, so once a client has resolved www.foo.com to a particular wwwN.foo.com, all its - subsequent requests also go to this particular name - wwwN.foo.com. But the final result is - okay, because the requests are collectively - spread over the various webservers.

    + subsequent requests will continue to go to the same + IP (and thus a single server), rather than being + distributed across the other available servers. But the + overall result is + okay because the requests are collectively + spread over the various web servers.

  2. @@ -655,8 +657,8 @@ load-balancing is to use the program lbnamed which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. - It is a Perl 5 program in conjunction with auxilliary - tools which provides a real load-balancing for + It is a Perl 5 program which, in conjunction with auxiliary + tools, provides real load-balancing via DNS.

  3. @@ -674,8 +676,8 @@

    entry in the DNS. Then we convert www0.foo.com to a proxy-only server, - i.e., we configure this machine so all arriving URLs - are just pushed through the internal proxy to one of + i.e., we configure this machine so all arriving URLs + are simply passed through its internal proxy to one of the 5 other servers (www1-www5). To accomplish this we first establish a ruleset which contacts a load balancing script lb.pl @@ -716,19 +718,23 @@ www0.foo.com still is overloaded? The answer is yes, it is overloaded, but with plain proxy throughput requests, only! All SSI, CGI, ePerl, etc. - processing is completely done on the other machines. - This is the essential point.

+ processing is handled on the other machines. + For a complicated site, this may work well. The biggest + risk here is that www0 is now a single point of failure -- + if it crashes, the other servers are inaccessible.
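The lb.pl ruleset referred to above is not quoted in this hunk; a minimal sketch of the approach (the script path is an assumption):

     # Every request reaching www0 is handed to an external program that
     # returns the URL of one of www1-www5; the result is proxied through.
     RewriteEngine on
     RewriteMap    lb       prg:/path/to/lb.pl
     RewriteRule   ^/(.+)$  ${lb:$1}  [P,L]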
  • - Hardware/TCP Round-Robin + Dedicated Load Balancers -

    There is a hardware solution available, too. Cisco - has a beast called LocalDirector which does a load - balancing at the TCP/IP level. Actually this is some - sort of a circuit level gateway in front of a - webcluster. If you have enough money and really need - a solution with high performance, use this one.

    +

    There are more sophisticated solutions, as well. Cisco, + F5, and several other companies sell hardware load + balancers (typically used in pairs for redundancy), which + offer advanced load balancing and auto-failover + features. There are software packages which offer similar + features on commodity hardware, as well. If you have + enough money or a real need, check these out. The lb-l mailing list is a + good place to research these options.

  • @@ -744,8 +750,8 @@
    Description:
    -

    On the net there are a lot of nifty CGI programs. But - their usage is usually boring, so a lot of webmaster +

    On the net there are many nifty CGI programs. But + their usage is usually boring, so a lot of webmasters don't use them. Even Apache's Action handler feature for MIME-types is only appropriate when the CGI programs don't need special URLs (actually PATH_INFO @@ -754,9 +760,9 @@ .scgi (for secure CGI) which will be processed by the popular cgiwrap program. The problem here is that for instance if we use a Homogeneous URL Layout - (see above) a file inside the user homedirs has the URL - /u/user/foo/bar.scgi. But - cgiwrap needs the URL in the form + (see above) a file inside the user homedirs might have a URL + like /u/user/foo/bar.scgi, but + cgiwrap needs URLs in the form /~user/foo/bar.scgi/. The following rule solves the problem:
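The rule itself is not quoted in this hunk; a hedged sketch (the /u/ layout and the cgiwrap location are assumptions):

     # Translate /u/<user>/....scgi URLs into the /~<user>/ form that
     # cgiwrap expects, handing the request to the wrapper.
     RewriteRule ^/u/([^/]+)/(.*)\.scgi(.*)$  /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,PT]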

    @@ -770,9 +776,9 @@ access.log for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). We have to provide the URL area to these - programs so they know on which area they have to act on. - But usually this is ugly, because they are all the times - still requested from that areas, i.e., typically we would + programs so they know which area they are really working with. + But usually this is complicated, because they may still be + requested by the alternate URL form, i.e., typically we would run the swwidx program from within /u/user/foo/ via hyperlink to

    @@ -780,10 +786,10 @@ /internal/cgi/user/swwidx?i=/u/user/foo/
    -

    which is ugly. Because we have to hard-code +

    which is ugly, because we have to hard-code both the location of the area and the location of the CGI inside the - hyperlink. When we have to reorganize the area, we spend a + hyperlink. When we have to reorganize, we spend a lot of time changing the various hyperlinks.
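One way to avoid the hard-coding is to let mod_rewrite expand a short URL convention into the CGI call; a speculative sketch (the trailing "/*" convention is purely an assumption):

     # Appending "/*" to a directory URL triggers the index CGI for that
     # directory, so hyperlinks never need to name the CGI or the area.
     RewriteRule ^/u/([^/]+)(/.*)?/\*$  /internal/cgi/user/swwidx?i=/u/$1$2/  [PT]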

    @@ -829,12 +835,12 @@

    Here comes a really esoteric feature: Dynamically - generated but statically served pages, i.e., pages should be + generated but statically served pages, i.e., pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated - dynamically by the webserver if missing. This way you can - have CGI-generated pages which are statically served unless - one (or a cronjob) removes the static contents. Then the + dynamically by the web server if missing. This way you can + have CGI-generated pages which are statically served unless an + admin (or a cron job) removes the static contents. Then the contents gets refreshed.

    @@ -848,16 +854,16 @@ RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L] -

    Here a request to page.html leads to a +

    Here a request for page.html leads to an internal run of a corresponding page.cgi if - page.html is still missing or has filesize + page.html is missing or has filesize null. The trick here is that page.cgi is a - usual CGI script which (additionally to its STDOUT) + CGI script which (additionally to its STDOUT) writes its output to the file page.html. - Once it was run, the server sends out the data of + Once it has completed, the server sends out page.html. When the webmaster wants to force - a refresh the contents, he just removes - page.html (usually done by a cronjob).

    + a refresh of the contents, he just removes + page.html (typically from cron).

    @@ -871,9 +877,9 @@
    Description:
    -

    Wouldn't it be nice while creating a complex webpage if - the webbrowser would automatically refresh the page every - time we write a new version from within our editor? +

    Wouldn't it be nice, while creating a complex web page, if + the web browser would automatically refresh the page every + time we save a new version from within our editor? Impossible?

    @@ -881,10 +887,10 @@

    No! We just combine the MIME multipart feature, the - webserver NPH feature and the URL manipulation power of + web server NPH feature, and the URL manipulation power of mod_rewrite. First, we establish a new URL feature: Adding just :refresh to any - URL causes this to be refreshed every time it gets + URL causes the 'page' to be refreshed every time it is updated on the filesystem.

    @@ -1024,18 +1030,17 @@
     
             

    The <VirtualHost> feature of Apache is nice - and works great when you just have a few dozens + and works great when you just have a few dozen virtual hosts. But when you are an ISP and have hundreds of - virtual hosts to provide this feature is not the best - choice.

    + virtual hosts, this feature is suboptimal.

    Solution:
    -

    To provide this feature we map the remote webpage or even - the complete remote webarea to our namespace by the use - of the Proxy Throughput feature (flag [P]):

    +

    To provide this feature we map the remote web page or even + the complete remote web area to our namespace using the + Proxy Throughput feature (flag [P]):

     ##
    @@ -1173,7 +1178,7 @@
             

    We first have to make sure mod_rewrite is below(!) mod_proxy in the Configuration - file when compiling the Apache webserver. This way it gets + file when compiling the Apache web server. This way it gets called before mod_proxy. Then we configure the following for a host-dependent deny...
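For reference, a generic sketch of a host-dependent deny (hostname and path are placeholders; the guide's own example targets proxy requests specifically):

     # Refuse one client host access to a URL subtree.
     RewriteCond %{REMOTE_HOST}  ^badguy\.example\.com$  [NC]
     RewriteRule ^/secret/       -                       [F]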

    @@ -1201,11 +1206,11 @@
    Description:
    -

    Sometimes a very special authentication is needed, for - instance a authentication which checks for a set of +

    Sometimes very special authentication is needed, for + instance authentication which checks for a set of explicitly configured users. Only these should receive access and without explicit prompting (which would occur - when using the Basic Auth via mod_auth).

    + when using Basic Auth via mod_auth_basic).

    Solution:
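The ruleset itself is not quoted here; a hedged sketch of the idea (hostnames and path are placeholders):

     # Allow only two explicitly listed client hosts; everyone else gets a
     # 403 without ever seeing a Basic Auth prompt.
     RewriteCond %{REMOTE_HOST}  !^friend1\.example\.com$
     RewriteCond %{REMOTE_HOST}  !^friend2\.example\.com$
     RewriteRule ^/only-for-friends/  -  [F]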