Subject: svn commit: r832094 [3/4] - /httpd/httpd/trunk/docs/manual/rewrite/
Date: Mon, 02 Nov 2009 20:36:32 -0000
To: cvs@httpd.apache.org
From: rbowen@apache.org
Message-Id: <20091102203634.4A3472388998@eris.apache.org>

Modified: httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide_advanced.html.en
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide_advanced.html.en?rev=832094&r1=832093&r2=832094&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide_advanced.html.en (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide_advanced.html.en Mon Nov  2 20:36:30 2009
@@ -41,1251 +41,11 @@ avoids many problems.

    Web Cluster with Consistent URL Space

    Description:

    We want to create a homogeneous and consistent URL - layout across all WWW servers on an Intranet web cluster, i.e., - all URLs (by definition server-local and thus - server-dependent!) become server independent! - What we want is to give the WWW namespace a single consistent - layout: no URL should refer to - any particular target server. The cluster itself - should connect users automatically to a physical target - host as needed, invisibly.

    Solution:

    First, the knowledge of the target servers comes from - (distributed) external maps which contain information on - where our users, groups, and entities reside. They have the - form:

    user1  server_of_user1
    user2  server_of_user2
    :      :

    We put them into files map.xxx-to-host. Second, we need to instruct all servers to redirect URLs of the forms:

    /u/user/anypath
    /g/group/anypath
    /e/entity/anypath

    to

    http://physical-host/u/user/anypath
    http://physical-host/g/group/anypath
    http://physical-host/e/entity/anypath

    so that any given URL path need not be valid on every server. The following ruleset does this for us with the help of the map files (assuming that server0 is a default server which will be used if a user has no entry in the map):

    RewriteEngine on

    RewriteMap      user-to-host   txt:/path/to/map.user-to-host
    RewriteMap     group-to-host   txt:/path/to/map.group-to-host
    RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host

    RewriteRule   ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0}/u/$1/$2
    RewriteRule   ^/g/([^/]+)/?(.*)   http://${group-to-host:$1|server0}/g/$1/$2
    RewriteRule   ^/e/([^/]+)/?(.*)   http://${entity-to-host:$1|server0}/e/$1/$2

    RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
    RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
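    For a large cluster, note that plain txt: maps are re-read and searched linearly; a hashed map avoids that. A minimal sketch, assuming an httpd 2.x build where the bundled httxt2dbm utility and the dbm: map type are available:

    # Convert the text map once (and again after each update):
    #     httxt2dbm -i /path/to/map.user-to-host -o /path/to/map.user-to-host.dbm
    # Then point the RewriteMap at the hashed version:
    RewriteMap  user-to-host  dbm:/path/to/map.user-to-host.dbm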

    Structured Homedirs

    Description:

    Some sites with thousands of users use a - structured homedir layout, i.e. each homedir is in a - subdirectory which begins (for instance) with the first - character of the username. So, /~foo/anypath - is /home/f/foo/.www/anypath - while /~bar/anypath is - /home/b/bar/.www/anypath.

    Solution:

    We use the following ruleset to expand the tilde URLs - into the above layout.

    RewriteEngine on
    RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
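    To make the capture groups explicit, here is the same rule again, annotated with a worked request for a hypothetical user foo:

    # Request: /~foo/anypath
    #   $1 = foo        (the whole username, outer parentheses)
    #   $2 = f          (its first character, inner parentheses)
    #   $3 = /anypath   (the remainder of the path)
    # Result:  /home/f/foo/.www/anypath
    RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3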

    Filesystem Reorganization

    Description:

    This really is a hardcore example: a killer application - which heavily uses per-directory - RewriteRules to get a smooth look and feel - on the Web while its data structure is never touched or - adjusted. Background: net.sw is - my archive of freely available Unix software packages, - which I started to collect in 1992. It is both my hobby - and job to do this, because while I'm studying computer - science I have also worked for many years as a system and - network administrator in my spare time. Every week I need - some sort of software so I created a deep hierarchy of - directories where I stored the packages:

    drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
    drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
    drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
    drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
    drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
    drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
    drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
    drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
    drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
    drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
    drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
    drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
    drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
    drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
    drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
    drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/

    In July 1996 I decided to make this archive public to - the world via a nice Web interface. "Nice" means that I - wanted to offer an interface where you can browse - directly through the archive hierarchy. And "nice" means - that I didn't want to change anything inside this - hierarchy - not even by putting some CGI scripts at the - top of it. Why? Because the above structure should later be - accessible via FTP as well, and I didn't want any - Web or CGI stuff mixed in there.

    Solution:

    The solution has two parts: The first is a set of CGI - scripts which create all the pages at all directory - levels on-the-fly. I put them under - /e/netsw/.www/ as follows:

    -rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
    drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
    -rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
    -rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
    -rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
    -rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
    -rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
    -rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
    drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
    -rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
    -rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
    -rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
    -rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst

    The DATA/ subdirectory holds the above - directory structure, i.e. the real - net.sw stuff, and gets - automatically updated via rdist from time to - time. The second part of the problem remains: how to link - these two structures together into one smooth-looking URL - tree? We want to hide the DATA/ directory - from the user while running the appropriate CGI scripts - for the various URLs. Here is the solution: first I put - the following into the per-directory configuration file - in the DocumentRoot - of the server to rewrite the public URL path - /net.sw/ to the internal path - /e/netsw:

    RewriteRule  ^net.sw$       net.sw/        [R]
    RewriteRule  ^net.sw/(.*)$  e/netsw/$1
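    An illustrative trace of these two rules (per-directory context, so the prefix up to and including the leading slash has already been stripped from the pattern input):

    # Request: /net.sw          -> rule 1 matches "net.sw"
    #                            -> external redirect to /net.sw/  [R]
    # Request: /net.sw/Audio/   -> rule 2 matches, $1 = "Audio/"
    #                            -> internally rewritten to e/netsw/Audio/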

    The first rule is for requests which miss the trailing - slash! The second rule does the real thing. And then - comes the killer configuration which stays in the - per-directory config file - /e/netsw/.www/.wwwacl:

    Options       ExecCGI FollowSymLinks Includes MultiViews

    RewriteEngine on

    #  we are reached via /net.sw/ prefix
    RewriteBase   /net.sw/

    #  first we rewrite the root dir to
    #  the handling cgi script
    RewriteRule   ^$                       netsw-home.cgi     [L]
    RewriteRule   ^index\.html$            netsw-home.cgi     [L]

    #  strip out the subdirs when
    #  the browser requests us from perdir pages
    RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]

    #  and now break the rewriting for local files
    RewriteRule   ^netsw-home\.cgi.*       -                  [L]
    RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
    RewriteRule   ^netsw-search\.cgi.*     -                  [L]
    RewriteRule   ^netsw-tree\.cgi$        -                  [L]
    RewriteRule   ^netsw-about\.html$      -                  [L]
    RewriteRule   ^netsw-img/.*$           -                  [L]

    #  anything else is a subdir which gets handled
    #  by another cgi script
    RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
    RewriteRule   (.*)                     netsw-lsdir.cgi/$1

    Some hints for interpretation:

    1. Notice the L (last) flag and no substitution field ('-') in the fourth part
    2. Notice the ! (not) character and the C (chain) flag at the first rule in the last part
    3. Notice the catch-all pattern in the last rule

    Redirect Failing URLs to Another Web Server

    Description:

    A typical FAQ about URL rewriting is how to redirect - failing requests on webserver A to webserver B. Usually - this is done via ErrorDocument CGI scripts in Perl, but - there is also a mod_rewrite solution. - But note that this performs more poorly than using an - ErrorDocument - CGI script!

    Solution:

    The first solution has the best performance but less - flexibility, and is less safe:

    RewriteEngine on
    RewriteCond   %{DOCUMENT_ROOT}%{REQUEST_URI}    !-f
    RewriteRule   ^(.+)                             http://webserverB.dom/$1

    The problem here is that this will only work for pages - inside the DocumentRoot. While you can add more - Conditions (for instance to also handle homedirs, etc.) - there is a better variant:

    RewriteEngine on
    RewriteCond   %{REQUEST_URI} !-U
    RewriteRule   ^(.+)          http://webserverB.dom/$1
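    For comparison, the ErrorDocument approach mentioned in the description could, in outline, look like the following; /cgi-bin/fallback.cgi is a hypothetical script which reads the REDIRECT_URL environment variable and answers with a Location header pointing at webserverB:

    # Hand failing requests to a CGI which redirects to
    # http://webserverB.dom plus the original (REDIRECT_URL) path:
    ErrorDocument 404 /cgi-bin/fallback.cgi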

    This uses the URL look-ahead feature of mod_rewrite. - The result is that this will work for all types of URLs - and is safe. But it does have a performance impact on - the web server, because for every request there is one - more internal subrequest. So, if your web server runs on a - powerful CPU, use this one. If it is a slow machine, use - the first approach or better an ErrorDocument CGI script.


    Archive Access Multiplexer

    Description:

    Do you know the great CPAN (Comprehensive Perl Archive - Network) under http://www.perl.com/CPAN? - CPAN automatically redirects browsers to one of many FTP - servers around the world (generally one near the requesting - client); each server carries a full CPAN mirror. This is - effectively an FTP access multiplexing service. - CPAN runs via CGI scripts, but how could a similar approach - be implemented via mod_rewrite?

    Solution:

    First we notice that as of version 3.0.0, - mod_rewrite can - also use the "ftp:" scheme on redirects. - And second, the location approximation can be done by a - RewriteMap - over the top-level domain of the client. - With a tricky chained ruleset we can use this top-level - domain as a key to our multiplexing map.

    RewriteEngine on
    RewriteMap    multiplex                txt:/path/to/map.cxan
    RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
    RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]

    ##
    ##  map.cxan -- Multiplexing Map for CxAN
    ##

    de        ftp://ftp.cxan.de/CxAN/
    uk        ftp://ftp.cxan.uk/CxAN/
    com       ftp://ftp.cxan.com/CxAN/
     :
    ##EOF##
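    To make the chained pair easier to follow, here is an illustrative trace, assuming a hypothetical client whose REMOTE_HOST resolves to client.provider.de:

    # Request:  /CxAN/src/perl.tar.gz
    #   rule 1: -> client.provider.de::src/perl.tar.gz            [C]
    #   rule 2: $1 = "de", $2 = "src/perl.tar.gz"
    #           ${multiplex:de} = ftp://ftp.cxan.de/CxAN/
    #           -> ftp://ftp.cxan.de/CxAN/src/perl.tar.gz         [R,L]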

    Browser Dependent Content

    Description:

    At least for important top-level pages it is sometimes necessary to provide browser-dependent content, i.e., one has to provide one version for current browsers, a different version for Lynx and other text-mode browsers, and another version for all remaining browsers.

    Solution:

    We cannot use content negotiation because the browsers do - not provide their type in that form. Instead we have to - act on the HTTP header "User-Agent". The following config - does the following: If the HTTP header "User-Agent" - begins with "Mozilla/3", the page foo.html - is rewritten to foo.NS.html and the - rewriting stops. If the browser is "Lynx" or "Mozilla" of - version 1 or 2, the URL becomes foo.20.html. - All other browsers receive page foo.32.html. - This is done with the following ruleset:

    RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
    RewriteRule ^foo\.html$         foo.NS.html          [L]

    RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*         [OR]
    RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
    RewriteRule ^foo\.html$         foo.20.html          [L]

    RewriteRule ^foo\.html$         foo.32.html          [L]
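    Because the same URL now produces different bodies for different browsers, shared caches should be told so. A minimal sketch, assuming mod_headers is loaded:

    # Mark the response as varying by browser, so a cache never
    # serves the full-featured variant to a Lynx client:
    <Files "foo.html">
        Header append Vary User-Agent
    </Files>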

    Dynamic Mirror

    Description:

    Assume there are nice web pages on remote hosts we want - to bring into our namespace. For FTP servers we would use - the mirror program which actually maintains an - explicit up-to-date copy of the remote data on the local - machine. For a web server we could use the program - webcopy which runs via HTTP. But both - techniques have a major drawback: The local copy is - always only as up-to-date as the last time we ran the program. It - would be much better if the mirror was not a static one we - have to establish explicitly. Instead we want a dynamic - mirror with data which gets updated automatically - as needed on the remote host(s).

    Solution:

    To provide this feature we map the remote web page or even - the complete remote web area to our namespace by the use - of the Proxy Throughput feature - (flag [P]):

    RewriteEngine  on
    RewriteBase    /~quux/
    RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]

    RewriteEngine  on
    RewriteBase    /~quux/
    RewriteRule    ^usa-news\.html$   http://www.quux-corp.com/news/index.html  [P]
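    The [P] flag hands the substituted URL to the proxy module, so mod_proxy must be present. A minimal sketch for an httpd build with shared modules (module paths are illustrative):

    # Without these, a [P] rule fails at request time:
    LoadModule proxy_module       modules/mod_proxy.so
    LoadModule proxy_http_module  modules/mod_proxy_http.so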
    -

    Reverse Dynamic Mirror

    Description:

    ...

    Solution:
    RewriteEngine on
    RewriteCond   /mirror/of/remotesite/$1           -U
    RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1

    Retrieve Missing Data from Intranet

    Description:

    This is a tricky way of virtually running a corporate - (external) Internet web server - (www.quux-corp.dom), while actually keeping - and maintaining its data on an (internal) Intranet web server - (www2.quux-corp.dom) which is protected by a - firewall. The trick is that the external web server retrieves - the requested data on-the-fly from the internal - one.

    Solution:

    First, we must make sure that our firewall still - protects the internal web server and only the - external web server is allowed to retrieve data from it. - On a packet-filtering firewall, for instance, we could - configure a firewall ruleset like the following:

    ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80
    DENY  Host *                 Port *     --> Host www2.quux-corp.dom Port 80

    Just adjust it to your actual configuration syntax. - Now we can establish the mod_rewrite - rules which request the missing data in the background - through the proxy throughput feature:

    RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2 [C]
    # REQUEST_FILENAME usage below is correct in this per-server context example
    # because the rule that references REQUEST_FILENAME is chained to a rule that
    # sets REQUEST_FILENAME.
    RewriteCond %{REQUEST_FILENAME}       !-f
    RewriteCond %{REQUEST_FILENAME}       !-d
    RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]

    Load Balancing

    Description:

    Suppose we want to load balance the traffic to - www.example.com over www[0-5].example.com - (a total of 6 servers). How can this be done?

    Solution:

    There are many possible solutions for this problem. - We will first discuss a common DNS-based method, - and then one based on mod_rewrite:

    1. DNS Round-Robin

      The simplest method for load-balancing is to use DNS round-robin. Here you just configure www0.example.com through www5.example.com as usual in your DNS with A (address) records, e.g.,

      www0   IN  A       1.2.3.1
      www1   IN  A       1.2.3.2
      www2   IN  A       1.2.3.3
      www3   IN  A       1.2.3.4
      www4   IN  A       1.2.3.5
      www5   IN  A       1.2.3.6

      Then you additionally add the following entries:

      www   IN  A       1.2.3.1
      www   IN  A       1.2.3.2
      www   IN  A       1.2.3.3
      www   IN  A       1.2.3.4
      www   IN  A       1.2.3.5
      www   IN  A       1.2.3.6

      Now when www.example.com gets resolved, BIND gives out www0 through www5, but in a permuted (rotated) order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load balancing scheme, because DNS resolutions are cached by clients and other nameservers, so once a client has resolved www.example.com to a particular wwwN.example.com, all its subsequent requests will continue to go to the same IP (and thus a single server), rather than being distributed across the other available servers. But the overall result is acceptable, because the requests are collectively spread over the various web servers.

    2. DNS Load-Balancing

      A sophisticated DNS-based method for - load-balancing is to use the program - lbnamed which can be found at - http://www.stanford.edu/~riepel/lbnamed/. - It is a Perl 5 program which, in conjunction with auxiliary - tools, provides real load-balancing via - DNS.

    3. Proxy Throughput Round-Robin

      In this variant we use mod_rewrite - and its proxy throughput feature. First we dedicate - www0.example.com to be actually - www.example.com by using a single

      www    IN  CNAME   www0.example.com.

      entry in the DNS. Then we convert - www0.example.com to a proxy-only server, - i.e., we configure this machine so all arriving URLs - are simply passed through its internal proxy to one of - the 5 other servers (www1-www5). To - accomplish this we first establish a ruleset which - contacts a load balancing script lb.pl - for all URLs.

      RewriteEngine on
      RewriteMap    lb      prg:/path/to/lb.pl
      RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
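      A prg: map is a single long-lived process shared by all server children, so on httpd releases that provide the directive (1.3 through 2.2; httpd 2.4 uses the Mutex directive instead), access to it should be serialized. A minimal sketch (lock file path is illustrative):

      # Serialize access to the external lb.pl map process:
      RewriteLock /var/lock/rewrite-lb.lock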

      Then we write lb.pl:

      #!/path/to/perl
      ##
      ##  lb.pl -- load balancing script
      ##

      $| = 1;

      $name   = "www";         # the hostname base
      $first  = 1;             # the first server (not 0 here, because 0 is myself)
      $last   = 5;             # the last server in the round-robin
      $domain = "example.com"; # the domainname

      $cnt = 0;
      while (<STDIN>) {
          $cnt = (($cnt+1) % ($last+1-$first));
          $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
          print "http://$server/$_";
      }

      ##EOF##
      A final note: why is this useful? Doesn't www0.example.com remain overloaded? Yes, it does, but with plain proxy throughput requests only! All SSI, CGI, ePerl, etc. processing is handled on the other machines. For a complicated site, this may work well. The biggest risk here is that www0 is now a single point of failure -- if it crashes, the other servers are inaccessible.

    4. Dedicated Load Balancers

      There are more sophisticated solutions, as well. Cisco, - F5, and several other companies sell hardware load - balancers (typically used in pairs for redundancy), which - offer sophisticated load balancing and auto-failover - features. There are software packages which offer similar - features on commodity hardware, as well. If you have - enough money or need, check these out. The lb-l mailing list is a - good place to research.


    New MIME-type, New Service

    Description:

    On the net there are many nifty CGI programs. But their usage is usually boring, so a lot of webmasters don't use them. Even Apache's Action handler feature for MIME-types is only appropriate when the CGI programs don't need special URLs (actually PATH_INFO and QUERY_STRING) as their input. First, let us configure a new file type with extension .scgi (for secure CGI) which will be processed by the popular cgiwrap program. The problem here is that, for instance, if we use a Homogeneous URL Layout (see above) a file inside the user homedirs might have a URL like /u/user/foo/bar.scgi, but cgiwrap needs URLs in the form /~user/foo/bar.scgi/. The following rule solves the problem:

    RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
    ... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-httpd-cgi]

    Or assume we have some more nifty programs: wwwlog (which displays the access.log for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). We have to provide the URL area to these programs so they know which area they are really working with. But usually this is complicated, because they may still be requested by the alternate URL form, i.e., typically we would run the wwwidx program from within /u/user/foo/ via hyperlink to

    /internal/cgi/user/wwwidx?i=/u/user/foo/

    which is ugly, because we have to hard-code - both the location of the area - and the location of the CGI inside the - hyperlink. When we have to reorganize, we spend a - lot of time changing the various hyperlinks.

    Solution:

    The solution here is to provide a special new URL format - which automatically leads to the proper CGI invocation. - We configure the following:

    RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
    RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3

    Now the hyperlink to search at - /u/user/foo/ reads only

    HREF="*"

    which internally gets automatically transformed to

    /internal/cgi/user/wwwidx?i=/u/user/foo/

    The same approach leads to an invocation for the - access log CGI program when the hyperlink - :log gets used.


    On-the-fly Content-Regeneration

    Description:

    Here comes a really esoteric feature: dynamically generated but statically served pages, i.e., pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the web server if missing. This way you can have CGI-generated pages which are statically served unless an admin (or a cron job) removes the static contents. Then the content gets refreshed.

    Solution:

    This is done via the following ruleset:
    # This example is valid in per-directory context only
    RewriteCond %{REQUEST_FILENAME}   !-s
    RewriteRule ^page\.html$          page.cgi   [T=application/x-httpd-cgi,L]

    Here a request for page.html leads to an internal run of a corresponding page.cgi if page.html is missing or is empty (size zero, hence the -s test). The trick here is that page.cgi is a CGI script which (in addition to its STDOUT) writes its output to the file page.html. Once it has completed, the server sends out page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (typically from cron).


    Document With Autorefresh

    Description:

    Wouldn't it be nice, while creating a complex web page, if - the web browser would automatically refresh the page every - time we save a new version from within our editor? - Impossible?

    Solution:

    No! We just combine the MIME multipart feature, the - web server NPH feature, and the URL manipulation power of - mod_rewrite. First, we establish a new - URL feature: Adding just :refresh to any - URL causes the 'page' to be refreshed every time it is - updated on the filesystem.

    RewriteRule   ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1

    Now when we reference the URL

    /u/foo/bar/page.html:refresh

    this leads to the internal invocation of the URL

    /internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html

    The only missing part is the NPH-CGI script. Although - one would usually say "left as an exercise to the reader" - ;-) I will provide this, too.

    #!/sw/bin/perl
    ##
    ##  nph-refresh -- NPH/CGI script for auto refreshing pages
    ##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
    ##
    $| = 1;

    #   split the QUERY_STRING variable
    @pairs = split(/&/, $ENV{'QUERY_STRING'});
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $name =~ tr/A-Z/a-z/;
        $name = 'QS_' . $name;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        eval "\$$name = \"$value\"";
    }
    $QS_s = 1 if ($QS_s eq '');
    $QS_n = 3600 if ($QS_n eq '');
    if ($QS_f eq '') {
        print "HTTP/1.0 200 OK\n";
        print "Content-type: text/html\n\n";
        print "<b>ERROR</b>: No file given\n";
        exit(0);
    }
    if (! -f $QS_f) {
        print "HTTP/1.0 200 OK\n";
        print "Content-type: text/html\n\n";
        print "<b>ERROR</b>: File $QS_f not found\n";
        exit(0);
    }

    sub print_http_headers_multipart_begin {
        print "HTTP/1.0 200 OK\n";
        $bound = "ThisRandomString12345";
        print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
        &print_http_headers_multipart_next;
    }

    sub print_http_headers_multipart_next {
        print "\n--$bound\n";
    }

    sub print_http_headers_multipart_end {
        print "\n--$bound--\n";
    }

    sub displayhtml {
        local($buffer) = @_;
        $len = length($buffer);
        print "Content-type: text/html\n";
        print "Content-length: $len\n\n";
        print $buffer;
    }

    sub readfile {
        local($file) = @_;
        local(*FP, $size, $buffer, $bytes);
        ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
        $size = sprintf("%d", $size);
        open(FP, "<$file");
        $bytes = sysread(FP, $buffer, $size);
        close(FP);
        return $buffer;
    }

    $buffer = &readfile($QS_f);
    &print_http_headers_multipart_begin;
    &displayhtml($buffer);

    sub mystat {
        local($file) = $_[0];
        local($time);

        ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
        return $mtime;
    }

    $mtimeL = &mystat($QS_f);
    $mtime = $mtimeL;
    for ($n = 0; $n < $QS_n; $n++) {
        while (1) {
            $mtime = &mystat($QS_f);
            if ($mtime ne $mtimeL) {
                $mtimeL = $mtime;
                sleep(2);
                $buffer = &readfile($QS_f);
                &print_http_headers_multipart_next;
                &displayhtml($buffer);
                sleep(5);
                $mtimeL = &mystat($QS_f);
                last;
            }
            sleep($QS_s);
        }
    }

    &print_http_headers_multipart_end;

    exit(0);

    ##EOF##

    Mass Virtual Hosting

    Description:

    The <VirtualHost> feature of Apache is nice - and works great when you just have a few dozen - virtual hosts. But when you are an ISP and have hundreds of - virtual hosts, this feature is suboptimal.

    Solution:

    To provide this feature, we keep a single map file which pairs each virtual hostname with its document root, and let mod_rewrite map every incoming request onto the right DocumentRoot:

    ##
    ##  vhost.map
    ##
    www.vhost1.dom:80  /path/to/docroot/vhost1
    www.vhost2.dom:80  /path/to/docroot/vhost2
         :
    www.vhostN.dom:80  /path/to/docroot/vhostN

    ##
    ##  httpd.conf
    ##
        :
    #   use the canonical hostname on redirects, etc.
    UseCanonicalName on

        :
    #   add the virtual host in front of the CLF-format
    CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
        :

    #   enable the rewriting engine in the main server
    RewriteEngine on

    #   define two maps: one for fixing the URL and one which defines
    #   the available virtual hosts with their corresponding
    #   DocumentRoot.
    RewriteMap    lowercase    int:tolower
    RewriteMap    vhost        txt:/path/to/vhost.map

    #   Now do the actual virtual host mapping
    #   via a huge and complicated single rule:
    #
    #   1. make sure we don't map for common locations
    RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
    RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
        :
    RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
    #
    #   2. make sure we have a Host header, because
    #      currently our approach only supports
    #      virtual hosting through this header
    RewriteCond   %{HTTP_HOST}  !^$
    #
    #   3. lowercase the hostname
    RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
    #
    #   4. lookup this hostname in vhost.map and
    #      remember it only when it is a path
    #      (and not "NONE" from above)
    RewriteCond   ${vhost:%1}  ^(/.*)$
    #
    #   5. finally we can map the URL to its docroot location
    #      and remember the virtual host for logging purposes
    RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
        :
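    As an aside, when every virtual host follows the same docroot naming scheme, mod_vhost_alias (where available) reaches a similar result without any rewrite rules. A minimal sketch:

    # %0 expands to the full requested hostname:
    UseCanonicalName    Off
    VirtualDocumentRoot /path/to/docroot/%0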

    Host Deny

    Description:

    How can we forbid a list of externally configured hosts - from using our server?

    Solution:

    For Apache >= 1.3b6:

    RewriteEngine on
    RewriteMap    hosts-deny  txt:/path/to/hosts.deny
    RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
    RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
    RewriteRule   ^/.*  -  [F]

    For Apache <= 1.3b6:

    RewriteEngine on
    RewriteMap    hosts-deny  txt:/path/to/hosts.deny
    RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
    RewriteRule   !^NOT-FOUND/.* - [F]
    RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
    RewriteRule   !^NOT-FOUND/.* - [F]
    RewriteRule   ^NOT-FOUND/(.*)$ /$1

    ##
    ##  hosts.deny
    ##
    ##  ATTENTION! This is a map, not a list, even when we treat it as such.
    ##             mod_rewrite parses it for key/value pairs, so at least a
    ##             dummy value "-" must be present for each entry.
    ##

    193.102.180.41 -
    bsdti1.sdm.de  -
    192.76.162.40  -

    Proxy Deny

    Description:

    How can we forbid a certain host or even a user of a - special host from using the Apache proxy?

    Solution:

    We first have to make sure mod_rewrite - is below(!) mod_proxy in the Configuration - file when compiling the Apache web server. This way it gets - called before mod_proxy. Then we - configure the following for a host-dependent deny...

    RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
    RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]

    ...and this one for a user@host-dependent deny:

    RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
    RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]

    Special Authentication Variant

    Description:

    Sometimes very special authentication is needed, for instance authentication which checks against a set of explicitly configured users. Only these should receive access, and without explicit prompting (which would occur when using Basic Auth via mod_auth_basic).

    Solution:

    We use a list of rewrite conditions to exclude all except - our friends:

    RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1\.quux-corp\.com$
    RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2\.quux-corp\.com$
    RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3\.quux-corp\.com$
    RewriteRule ^/~quux/only-for-friends/      -                                 [F]
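    Note that %{REMOTE_IDENT} is only populated when ident (RFC 1413) lookups are enabled, which is slow and only as trustworthy as the client's identd. A minimal sketch, assuming a release where the IdentityCheck directive (core in 1.3/2.0, mod_ident in later releases) is available:

    # Enable ident lookups so REMOTE_IDENT carries a value:
    IdentityCheck On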

    Referer-based Deflector

    Description:

    How can we program a flexible URL Deflector which acts - on the "Referer" HTTP header and can be configured with as - many referring pages as we like?

    Solution:

    Use the following really tricky ruleset...

    RewriteMap  deflector txt:/path/to/deflector.map

    RewriteCond %{HTTP_REFERER} !=""
    RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
    RewriteRule ^.* %{HTTP_REFERER} [R,L]

    RewriteCond %{HTTP_REFERER} !=""
    RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
    RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]

    ... in conjunction with a corresponding rewrite - map:

    ##
    ##  deflector.map
    ##

    http://www.badguys.com/bad/index.html    -
    http://www.badguys.com/bad/index2.html   -
    http://www.badguys.com/bad/index3.html   http://somewhere.com/

    This automatically redirects the request back to the referring page (when "-" is used as the value in the map) or to a specific URL (when a URL is specified in the map as the second argument).


Available Languages:  en  |  fr