httpd-docs mailing list archives

From Rob Hartill <r...@imdb.com>
Subject howto.html
Date Sat, 07 Dec 1996 15:29:44 GMT

Here's a replacement for manual/misc/howto.html.
Can someone please drop it into place? I'm not subscribed to this
list, so reply directly to me if there's a problem.
Thanks.


<HTML>
<HEAD>
<META NAME="description" CONTENT="Some 'how to' tips for the Apache httpd server">
<META NAME="keywords" CONTENT="apache,redirect,robots,rotate,logfiles">
<TITLE>Apache HOWTO documentation</TITLE>
</HEAD>

<BODY>
<!--#include virtual="header.html" -->
<H1>Apache HOWTO documentation</H1>

How to:
<ul>
<li><A HREF="#redirect">redirect an entire server or directory</A>
<li><A HREF="#logreset">reset your log files</A>
<li><A HREF="#stoprob">stop/restrict robots</A>
</ul>

<HR>
<H2><A name="redirect">How to redirect an entire server or directory</A></H2>

<P>One way to redirect all requests for an entire server is to set up a
<CODE>Redirect</CODE> to a <B>cgi script</B> which outputs a 301 or
302 status and the location of the other server.</P>

<P>By using a <B>cgi script</B> you can intercept various requests and treat
them specially. For example, you might want to intercept <B>POST</B> requests,
so that the client isn't redirected to a script on the other server which
expects POST information (a redirect will lose the POST information).</P>

<P>Here's how to redirect all requests to a script. In the server
configuration file, add:
<blockquote><code>ScriptAlias /
/usr/local/httpd/cgi-bin/redirect_script/</code></blockquote>
<BR>

And here's a simple Perl script to redirect:<BR>

<blockquote><code>
#!/usr/local/bin/perl <br>
<br>
print "Status: 302 Moved Temporarily\r\n"; <br>
print "Location: http://www.some.where.else.com/\r\n\r\n"; <br>
<br>
</code></blockquote></P>

<P>You can of course have a more sophisticated script that checks for
QUERY_STRING and sends that too <STRONG>if that's what you need.</STRONG></P>
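<P>As a rough sketch of such a script, the following shell version (shell
rather than Perl, purely for illustration) passes the query string along and
answers POST requests with a notice instead of a redirect. It assumes the
server hands the rest of the requested path to the script in
<CODE>PATH_INFO</CODE>; the function name <CODE>redirect_response</CODE> and
the wording of the notice are made up for this example.</P>

```shell
#!/bin/sh
# Destination server, taken from the example above.
TARGET="http://www.some.where.else.com"

# redirect_response METHOD PATH QUERY
# (hypothetical helper) prints the CGI response for one request.
redirect_response() {
    if [ "$1" = "POST" ]; then
        # A redirect would lose the POST data, so explain instead.
        printf 'Status: 200 OK\r\n'
        printf 'Content-Type: text/html\r\n\r\n'
        printf '<P>This service has moved to %s/. ' "$TARGET"
        printf 'Please resubmit your form there.</P>\n'
    else
        location="$TARGET$2"
        # Append the query string only when the client sent one.
        if [ -n "$3" ]; then
            location="$location?$3"
        fi
        printf 'Status: 302 Moved Temporarily\r\n'
        printf 'Location: %s\r\n\r\n' "$location"
    fi
}

# Only answer a request when actually run by the server as a CGI script.
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    redirect_response "${REQUEST_METHOD:-GET}" "${PATH_INFO:-/}" "${QUERY_STRING:-}"
fi
```

<P>Keeping the logic in one function makes it easy to try from a shell
prompt, e.g. <CODE>redirect_response GET /docs/ 'page=2'</CODE>, before
pointing the server at it.</P>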
<HR>

<H2><A name="logreset">How to reset your log files</A></H2>

<P>Sooner or later, you'll want to reset your log files (access_log and
error_log) because they are too big, or full of old information you don't
need.</P>

<P><CODE>access_log</CODE> typically grows by 1MB for each 10,000 requests.</P>

<P>Most people's first attempt at replacing the logfile is to just move the
logfile or remove the logfile. This doesn't work.</P>

<P>Apache will continue writing to the logfile at the same offset as before the
logfile moved. This results in a new logfile being created which is just
as big as the old one, but it now contains thousands (or millions) of null
characters.</P>

<P>The correct procedure is to move the logfile, then signal Apache to tell it to reopen
the logfiles.</P>

<P>Apache is signaled using the <B>SIGHUP</B> (-1) signal, e.g.
<blockquote><code>
mv access_log access_log.old<BR>
kill -1 `cat httpd.pid`
</code></blockquote>
</P>

<P>Note: <code>httpd.pid</code> is a file containing the <B>p</B>rocess
<B>id</B> of the Apache httpd daemon. Apache saves this file in the same
directory as the log files.</P>

<P>Many people use this method to replace (and back up) their logfiles on a
nightly or weekly basis.</P>
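<P>A nightly rotation along these lines might look like the sketch below.
It assumes <CODE>httpd.pid</CODE> lives in the log directory, as noted above;
the function name <CODE>rotate_logs</CODE> and the date suffix are choices
made for this example, not anything Apache requires.</P>

```shell
#!/bin/sh
# rotate_logs LOGDIR
# (hypothetical helper) moves each log aside with a date suffix, then
# sends SIGHUP so Apache reopens fresh log files.
rotate_logs() {
    logdir=$1
    stamp=$(date +%Y%m%d)
    pidfile="$logdir/httpd.pid"

    # Without the pid file there is no process to signal, so stop early
    # rather than move logs that Apache would keep writing to.
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 1
    fi

    for log in access_log error_log; do
        if [ -f "$logdir/$log" ]; then
            mv "$logdir/$log" "$logdir/$log.$stamp" || return 1
        fi
    done

    # Same effect as: kill -1 `cat httpd.pid`
    kill -HUP "$(cat "$pidfile")"
}
```

<P>Calling <CODE>rotate_logs</CODE> with your own log directory from a cron
job would give one dated backup per run; running it twice on the same day
would overwrite that day's backup.</P>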
<HR>

<H2><A name="stoprob">How to stop or restrict robots</A></H2>

<P>Ever wondered why so many clients are interested in a file called
<code>robots.txt</code> which you don't have, and never did have?</P>

<P>These clients are called <B>robots</B> (also known as crawlers,
spiders and other cute names): special automated clients which
wander around the web looking for interesting resources.</P>

<P>Most robots are used to generate some kind of <em>web index</em> which
is then used by a <em>search engine</em> to help locate information.</P>

<P><code>robots.txt</code> provides a means to request that robots limit
their
activities at the site, or more often than not, to leave the site alone.</P>

<P>When the first robots were developed, they had a bad reputation for sending hundreds or thousands
of requests to each site, often resulting in the site being overloaded. Things have improved
dramatically since then, thanks to <A HREF="http://info.webcrawler.com/mak/projects/robots/guidelines.html">
Guidelines for Robot Writers</A>, but even so, some robots may <A HREF="http://www.zyzzyva.com/robots/alert/">exhibit
unfriendly behavior</A> which the webmaster isn't willing to tolerate and will want
to stop.</P>

<P>Another reason some webmasters want to block access to robots is to
stop them indexing dynamic information. Many search engines will use the
data collected from your pages for months to come, which is not much use if
you're serving stock quotes, news, weather reports or anything else that will be
stale by the time people find it in a search engine.</P>

<P>If you decide to exclude robots completely, or just limit the areas
in which they can roam, create a <CODE>robots.txt</CODE> file; refer
to the <A HREF="http://info.webcrawler.com/mak/projects/robots/robots.html">robot information
pages</A> provided by Martijn Koster for the syntax.</P>
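<P>As an illustration only, this shell sketch writes a small
<CODE>robots.txt</CODE> into a document root. The two excluded directories
and the function name <CODE>write_robots_txt</CODE> are hypothetical; check
the pages referenced above for the full syntax before relying on it.</P>

```shell
#!/bin/sh
# write_robots_txt DOCROOT
# (hypothetical helper) creates DOCROOT/robots.txt asking all robots to
# stay out of two example areas that serve dynamic content.
write_robots_txt() {
    cat > "$1/robots.txt" <<'EOF'
# Applies to every robot that honours robots.txt.
User-agent: *
Disallow: /cgi-bin/
Disallow: /quotes/
EOF
}
```

<P>To ask robots to leave the whole site alone, a single
<CODE>Disallow: /</CODE> line in place of the two shown would do.</P>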

<!--#include virtual="footer.html" -->
</BODY>
</HTML>
