This document supplements the
As of Apache HTTP server version 2.2
As
To get the most from this document, you should be familiar with the basics of HTTP, and have read the Users' Guides to Mapping URLs to the Filesystem and Content negotiation.
There are two main stages in
This means that any other stages which might ordinarily happen in the
process of serving a request, for example being handled by
If the URL is not found within the cache,
If the URL is found within the cache, but also found to have expired,
the filter is added anyway, but
When caching locally generated content, ensuring that
On can dramatically improve the cache hit ratio. This is because the hostname of the virtual host serving the content forms part of the cache key. With the setting set to On, virtual hosts with multiple server names or aliases will not produce differently cached entities; instead, content will be cached under the canonical hostname.
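The following minimal Python sketch shows why the hostname matters. Here the cache key is built from the hostname and the URL path. The `ALIASES` table, the key format and the `cache_key` function are all hypothetical, and Apache's real cache key differs in detail. The sketch only models how canonicalising the hostname collapses aliases onto one entry.

```python
from typing import Dict

# Hypothetical ServerName/ServerAlias setup: every alias maps to the
# canonical hostname of the virtual host.
ALIASES: Dict[str, str] = {
    "www.example.com": "www.example.com",
    "example.com": "www.example.com",
    "web.example.com": "www.example.com",
}

def cache_key(host: str, path: str, use_canonical_name: bool) -> str:
    """Build an illustrative cache key from the Host and the URL path."""
    host = host.lower()
    if use_canonical_name:
        # Collapse any alias onto the canonical hostname.
        host = ALIASES.get(host, host)
    return f"http://{host}{path}"

requests = [("example.com", "/index.html"),
            ("www.example.com", "/index.html"),
            ("web.example.com", "/index.html")]

keys_without = {cache_key(h, p, False) for h, p in requests}
keys_with = {cache_key(h, p, True) for h, p in requests}
print(len(keys_without), len(keys_with))  # 3 1
```

Without canonicalisation, the same page is stored three times and each copy has to be fetched separately. With it, all three aliases share one cached entity.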
Because caching is performed within the URL-to-filename translation phase, cached documents will only be served in response to URL requests. Ordinarily this is of little consequence, but there is one circumstance in which it matters: if you are using Server Side Includes:
<!-- The following include can be cached -->
<!--#include virtual="/footer.html" -->

<!-- The following include cannot be cached -->
<!--#include file="/path/to/footer.html" -->
If you are using Server Side Includes and want the benefit of fast
serving from the cache, you should use virtual include
types.
The default expiry period for cached entities is one hour; however,
this can easily be overridden by using the
If a response does not include an Expires header but does
include a Last-Modified header,
For local content,
The maximum expiry period may also be controlled by using the
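The following Python sketch models the expiry rules discussed above. It assumes mod_cache's documented default values: CacheDefaultExpire 3600 seconds, CacheLastModifiedFactor 0.1 and CacheMaxExpire 86400 seconds. The function itself is a simplified illustration and not mod_cache's code. In particular, it measures time from "now" rather than from the response's Date header.

```python
from typing import Optional

# Assumed mod_cache defaults (seconds, and a plain multiplier).
DEFAULT_EXPIRE = 3600        # CacheDefaultExpire
LAST_MODIFIED_FACTOR = 0.1   # CacheLastModifiedFactor
MAX_EXPIRE = 86400           # CacheMaxExpire

def freshness_lifetime(now: float,
                       expires: Optional[float] = None,
                       last_modified: Optional[float] = None) -> float:
    """Seconds a response may be served from the cache (simplified model).

    All arguments are POSIX timestamps in seconds.
    """
    if expires is not None:
        # An explicit Expires header wins.
        lifetime = expires - now
    elif last_modified is not None:
        # Heuristic: a fraction of the time since the last modification.
        lifetime = LAST_MODIFIED_FACTOR * (now - last_modified)
    else:
        # No validators at all: fall back to the default expiry period.
        lifetime = DEFAULT_EXPIRE
    # The maximum expiry period caps every case; never go negative.
    return max(0.0, min(lifetime, MAX_EXPIRE))

now = 1_000_000.0
print(freshness_lifetime(now))                               # 3600
print(freshness_lifetime(now, last_modified=now - 36_000))   # about 3600
print(freshness_lifetime(now, last_modified=0.0))            # 86400
```

A file last modified ten hours ago is therefore treated as fresh for about one hour. A file untouched for weeks would earn a long heuristic lifetime, but that is cut back to the one-day maximum.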
When content expires from the cache and is re-requested from the backend or content provider, rather than passing on the original request, Apache will use a conditional request instead.
HTTP offers a number of headers which allow a client or cache to discern between different versions of the same content. For example, if a resource was served with an "ETag:" header, it is possible to make a conditional request with an "If-None-Match:" header. If a resource was served with a "Last-Modified:" header, it is possible to make a conditional request with an "If-Modified-Since:" header, and so on.
When such a conditional request is made, the response differs depending on whether the content matches the conditions. If a request is made with an "If-Modified-Since:" header, and the content has not been modified since the time indicated in the request, then a terse "304 Not Modified" response is issued.
If the content has changed, then it is served as if the request were not conditional to begin with.
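The server side of this exchange can be sketched in Python. This is a simplified model: it treats header names as case-sensitive, compares a single ETag by exact string match, and ignores weak validators, ETag lists and If-Match. The `evaluate_conditional` function is hypothetical.

```python
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Optional, Tuple

def evaluate_conditional(headers: Dict[str, str], etag: str,
                         last_modified: float) -> Tuple[int, Optional[str]]:
    """Decide between 304 and 200 for a conditional GET.

    Returns (304, None) when the client's copy is still valid, and
    (200, "full body") when the resource must be sent in full.
    """
    inm = headers.get("If-None-Match")
    if inm is not None:
        # When an ETag validator is present it decides on its own.
        return (304, None) if inm == etag else (200, "full body")
    ims = headers.get("If-Modified-Since")
    if ims is not None:
        since = parsedate_to_datetime(ims).timestamp()
        # HTTP dates only have one-second resolution.
        if int(last_modified) <= int(since):
            return 304, None
    return 200, "full body"

mtime = 1_700_000_000.0
stamp = formatdate(mtime, usegmt=True)   # e.g. "Tue, 14 Nov 2023 ... GMT"
print(evaluate_conditional({"If-Modified-Since": stamp}, '"v1"', mtime))  # (304, None)
```

If the file is touched a minute later, the same request gets `(200, "full body")`. This matches the behaviour described above for changed content.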
The benefits of conditional requests in relation to caching are twofold. Firstly, when making such a request to the backend, if the content from the backend matches the content in the store, this can be determined easily and without the overhead of transferring the entire resource.
Secondly, conditional requests are usually less strenuous on the
backend. For static files, typically all that is involved is a call
to stat() or a similar system call, to see if the file has
changed in size or modification time. As such, even when Apache is
caching local content, expired content may still be served faster
from the cache if it has not changed, provided that reading from the
cache store is faster than reading from the backend (e.g. an in-memory
cache compared to reading from disk).
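The cache side of revalidation for a local static file can be sketched as follows. A fresh entry is served directly. An expired entry is checked with a single `os.stat()` call, and if size and modification time are unchanged it is reused, which is the equivalent of a 304. Otherwise the file is read again. The `CacheEntry` class, the `serve` function and the one-hour `ttl` are hypothetical illustrations, not Apache internals.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CacheEntry:
    body: bytes
    size: int
    mtime: float
    expires_at: float

def file_unchanged(path: str, entry: CacheEntry) -> bool:
    """Cheap validity check: one stat() call, no file contents read."""
    st = os.stat(path)
    return st.st_size == entry.size and st.st_mtime == entry.mtime

def serve(path: str, entry: Optional[CacheEntry], now: float,
          ttl: float = 3600.0) -> Tuple[str, CacheEntry]:
    """Return ("fresh" | "revalidated" | "refetched", entry)."""
    if entry is not None and now < entry.expires_at:
        return "fresh", entry
    if entry is not None and file_unchanged(path, entry):
        # Like a 304: keep the stored body and extend its lifetime.
        entry.expires_at = now + ttl
        return "revalidated", entry
    # Like a 200: read the whole file again and store it.
    with open(path, "rb") as f:
        body = f.read()
    st = os.stat(path)
    return "refetched", CacheEntry(body, st.st_size, st.st_mtime, now + ttl)

states = []
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "index.html")
    with open(path, "wb") as f:
        f.write(b"<h1>hello</h1>")
    entry = None
    for now in (0.0, 10.0, 7200.0):
        state, entry = serve(path, entry, now)
        states.append(state)
print(states)  # ['refetched', 'fresh', 'revalidated']
```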
As mentioned already, the two styles of caching in Apache work
differently,
In short, any content which is highly time-sensitive, or which varies depending on the particulars of the request that are not covered by HTTP negotiation, should not be cached.
If you have dynamic content which changes depending on the IP address of the requester, or changes every 5 minutes, it should almost certainly not be cached.
If, on the other hand, the content served differs depending on the values of various HTTP headers, it might be possible to cache it intelligently through the use of a "Vary" header.
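One way to picture how a "Vary" header lets a cache keep several variants of one URL is shown below. The values of the request headers named in Vary become part of the lookup key. This Python sketch is hypothetical and its key format is not mod_cache's own. It does show that headers not listed in Vary, such as User-Agent here, do not create extra variants.

```python
from typing import Dict

def variant_key(url: str, vary: str, request_headers: Dict[str, str]) -> str:
    """Combine the URL with the request-header values listed in Vary."""
    # Header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    parts = [f"{n}={headers.get(n, '')}" for n in names]
    return url + "|" + "&".join(parts)

vary = "Accept-Language, Accept-Encoding"
k_en = variant_key("/index.html", vary,
                   {"Accept-Language": "en", "Accept-Encoding": "gzip"})
k_fr = variant_key("/index.html", vary,
                   {"accept-language": "fr", "Accept-Encoding": "gzip"})
k_en2 = variant_key("/index.html", vary,
                    {"Accept-Encoding": "gzip", "Accept-Language": "en",
                     "User-Agent": "x"})
print(k_en)  # /index.html|accept-encoding=gzip&accept-language=en
```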
If a response with a "Vary" header is received by
If, for example, a response is received with a Vary header such as:
Because requests from end users can be served from the cache, the cache itself can become a target for those wishing to deface or interfere with content. It is important to bear in mind that the cache must at all times be writable by the user Apache is running as. This is in stark contrast to the usually recommended situation of keeping all content unwritable by the Apache user.
If the Apache user is compromised, for example through a flaw in
a CGI process, it is possible that the cache may be targeted. When
using
This presents a somewhat elevated risk in comparison to the other
types of attack it is possible to make as the Apache user. If you are
using
When running Apache as a caching proxy server, there is also the potential for so-called cache poisoning. Cache poisoning is a broad term for attacks in which an attacker causes the proxy server to retrieve incorrect (and usually undesirable) content from the backend.
For example, if the DNS servers used by your system running Apache are vulnerable to DNS cache poisoning, an attacker may be able to control where Apache connects to when requesting content from the origin server. Another example is so-called HTTP request-smuggling attacks.
This document is not the correct place for an in-depth discussion of HTTP request smuggling (instead, try your favourite search engine); however, it is important to be aware that it is possible to make a series of requests that exploit a vulnerability on an origin webserver such that the attacker can entirely control the content retrieved by the proxy.
The act of opening a file can itself be a source of delay, particularly on network filesystems. By maintaining a cache of open file descriptors for commonly served files, Apache can avoid this delay. Currently Apache provides two different implementations of file-handle caching.
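The idea of caching open file descriptors can be sketched in a few lines of Python. Each file is opened once and its descriptor kept, and later requests read from that descriptor with `os.pread()`, so no new `open()` is needed. The `FileHandleCache` class is hypothetical and POSIX-only. It illustrates the technique, not Apache's implementation.

```python
import os
import tempfile
from typing import Dict

class FileHandleCache:
    """Keep file descriptors open so repeated requests skip open()."""

    def __init__(self) -> None:
        self._fds: Dict[str, int] = {}
        self.opens = 0  # number of real open() calls made

    def read(self, path: str) -> bytes:
        fd = self._fds.get(path)
        if fd is None:
            fd = os.open(path, os.O_RDONLY)
            self._fds[path] = fd
            self.opens += 1
        size = os.fstat(fd).st_size
        # pread() reads from offset 0 without moving a shared file position.
        return os.pread(fd, size, 0)

    def close_all(self) -> None:
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
cache = FileHandleCache()
bodies = [cache.read(tmp.name) for _ in range(3)]
cache.close_all()
os.unlink(tmp.name)
print(bodies, cache.opens)  # [b'hello', b'hello', b'hello'] 1
```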
The most basic form of caching present in Apache is the file-handle
caching provided by
The
CacheFile /usr/local/apache2/htdocs/index.html
If you intend to cache a large number of files in this manner, you + must ensure that your operating system's limit for the number of open + files is set appropriately.
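On Unix-like systems, one way to inspect the open-files limit from Python is the standard `resource` module. The `fd_headroom` helper and its `reserve` allowance for sockets, logs and other descriptors are hypothetical. The limit that actually matters is the one in effect for the Apache processes, for example as set with `ulimit -n` in the shell or init script that starts them.

```python
import resource

def fd_headroom(files_to_cache: int, reserve: int = 256) -> bool:
    """True if the soft RLIMIT_NOFILE leaves room for the cached files.

    `reserve` is a hypothetical allowance for sockets, logs, etc.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return files_to_cache + reserve <= soft

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
print("room for 500 cached files:", fd_headroom(500))
```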
Although using
If the file is removed while Apache is running, Apache will continue to maintain an open file descriptor and serve the file as it was when Apache was started. This usually also means that although the file has been deleted and no longer shows up on the filesystem, its disk space will not be freed until Apache is stopped and the file descriptor closed.
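The behaviour described above follows from ordinary POSIX semantics and can be reproduced with a short Python script on Unix-like systems. A file that has been unlinked stays readable through a descriptor opened before the deletion, and its link count drops to zero. The file name used here is only an example.

```python
import os
import tempfile

# Create a small file and open it, as a file-handle cache would at startup.
path = os.path.join(tempfile.mkdtemp(), "index.html")
with open(path, "wb") as f:
    f.write(b"original content")
fd = os.open(path, os.O_RDONLY)

# Delete the file: it disappears from the directory listing...
os.unlink(path)
still_listed = os.path.exists(path)

# ...but the open descriptor still serves the old bytes, and the disk
# blocks are only released once the last descriptor is closed.
served = os.pread(fd, 1024, 0)
links = os.fstat(fd).st_nlink
os.close(fd)
os.rmdir(os.path.dirname(path))

print(still_listed, served, links)  # False b'original content' 0
```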
CacheEnable fd /
As with all of
Serving directly from system memory is universally the fastest method of serving content. Reading files from a disk controller or, even worse, from a remote network is orders of magnitude slower. Disk controllers usually involve physical processes, and network access is limited by your available bandwidth. Memory access, on the other hand, can take mere nanoseconds.
System memory isn't cheap, though. Byte for byte, it's by far the most expensive type of storage, and it's important to ensure that it is used efficiently. By caching files in memory you decrease the amount of memory available on the system. As we'll see, in the case of operating system caching, this is not so much of an issue, but when using Apache's own in-memory caching it is important to make sure that you do not allocate too much memory to a cache. Otherwise the system will be forced to swap out memory, which will likely degrade performance.
Almost all modern operating systems cache file data in memory managed directly by the kernel. This is a powerful feature, and for the most part operating systems get it right. For example, on Linux, let's look at the difference in the time it takes to read a file for the first time and the second time:
colm@coroebus:~$ time cat testfile > /dev/null
real 0m0.065s
user 0m0.000s
sys 0m0.001s
colm@coroebus:~$ time cat testfile > /dev/null
real 0m0.003s
user 0m0.003s
sys 0m0.000s
Even for this small file, there is a huge difference in the amount + of time it takes to read the file. This is because the kernel has cached + the file contents in memory.
By ensuring there is "spare" memory on your system, you allow more file contents to be stored in this cache. This can be a very efficient means of in-memory caching, and involves no extra configuration of Apache at all.
Additionally, because the operating system knows when files are deleted or modified, it can automatically remove file contents from the cache when necessary. This is a big advantage over Apache's in-memory caching, which has no way of knowing when a file has changed.
Despite the performance advantages of automatic operating system caching, there are some circumstances in which in-memory caching may be better performed by Apache.
Firstly, an operating system can only cache files it knows about. If you are running Apache as a proxy server, the files you are caching are not locally stored but remotely served. If you still want the unbeatable speed of in-memory caching, Apache's own memory caching is needed.
MMapStatic /usr/local/apache2/htdocs/index.html
As with the
The
Caching of this type is enabled via:
# Enable memory caching
CacheEnable mem /

# Limit the size of the cache to 1 Megabyte
MCacheSize 1024
Typically, the module will be configured like so:
CacheRoot /var/cache/apache/
CacheEnable disk /
CacheDirLevels 2
CacheDirLength 1
Importantly, as the cached files are locally stored, operating system in-memory caching will typically apply to them as well. So although the files are stored on disk, if they are frequently accessed it is likely the operating system will ensure that they are actually served from memory.
To store items in the cache,
Each character may be any one of 64 different characters, which means
that overall there are 64^22 possible hashes. For example, a URL might
be hashed to xyTGxSMO2b68mBCykqkp1w. This hash is used
as a prefix for the naming of the files specific to that URL within
the cache; however, it is first split up into directories as per
the
/var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w.
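The hashing and directory split can be approximated in Python. This sketch makes two assumptions. First, an MD5 digest encoded with URL-safe Base64 (22 characters once padding is stripped) stands in for Apache's own encoding. Second, the string being hashed is a hypothetical cache key. Hashes produced here will therefore not match files in a real cache. The split function does reproduce the example path above for CacheDirLevels 2 and CacheDirLength 1.

```python
import base64
import hashlib
import posixpath

def url_hash(key: str) -> str:
    """22-character hash of a cache key (approximation, see above)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()   # 16 bytes
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

def cache_path(root: str, h: str, levels: int, length: int) -> str:
    """Turn the first levels*length characters of h into directories."""
    dirs = [h[i * length:(i + 1) * length] for i in range(levels)]
    return posixpath.join(root, *dirs, h[levels * length:])

def dirs_per_level(length: int) -> int:
    """With 64 possible characters, a level holds at most 64**length dirs."""
    return 64 ** length

print(cache_path("/var/cache/apache", "xyTGxSMO2b68mBCykqkp1w", 2, 1))
# /var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w
print(dirs_per_level(1), dirs_per_level(2))   # 64 4096
```

Raising CacheDirLength from 1 to 2 multiplies the possible subdirectories at each level by 64. This is the trade-off the following paragraphs describe.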
The overall aim of this technique is to reduce the number of
subdirectories or files that may be in a particular directory,
as most file systems slow down as this number increases. With a
setting of "1" for
Setting
Each URL uses at least two files in the cache store. Typically there is a ".header" file, which includes meta-information about the URL, such as when it is due to expire, and a ".data" file, which is a verbatim copy of the content to be served.
In the case of content negotiated via the "Vary" header, a ".vary" directory will be created for the URL in question. This directory will have multiple ".data" files corresponding to the differently negotiated content.
Although
Instead, Apache provides the htcacheclean tool which, as the name suggests, allows you to clean the cache periodically. Determining how frequently to run htcacheclean and what target size to use for the cache is somewhat complex, and trial and error may be needed to select optimal values.
htcacheclean has two modes of operation. It can be run as a persistent daemon, or periodically from cron. htcacheclean can take an hour or more to process very large caches (tens of gigabytes), and if you are running it from cron, it is recommended that you determine how long a typical run takes, to avoid running more than one instance at a time.
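When htcacheclean runs from cron, one way to stop runs from overlapping is a wrapper that first takes an exclusive, non-blocking lock. This Python sketch assumes a Unix-like system. The lock file path, the cache path and the 100M limit are hypothetical values. The flags -n (be nice), -t (delete empty directories), -p (cache path) and -l (size limit) are htcacheclean options.

```python
import fcntl
import subprocess
import sys
from typing import List, Optional

LOCK_FILE = "/tmp/htcacheclean.lock"   # hypothetical lock file path

def build_command(cache_root: str, limit: str) -> List[str]:
    """htcacheclean invocation for a single (non-daemon) cleaning run."""
    return ["htcacheclean", "-n", "-t", "-p", cache_root, "-l", limit]

def run_exclusively(lock_path: str, cmd: List[str]) -> Optional[int]:
    """Run cmd unless another run holds the lock.

    Returns the command's exit code, or None if a previous run is still
    in progress and this one was skipped.
    """
    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released automatically when the file is closed.
        return subprocess.call(cmd)

# Example (not executed here): a script started from cron could call
#   run_exclusively(LOCK_FILE, build_command("/var/cache/apache/", "100M"))
print(build_command("/var/cache/apache/", "100M"))
print(run_exclusively("/tmp/htcacheclean-sketch.lock", [sys.executable, "-c", "pass"]))  # 0
```

A skipped run simply waits for the next cron interval. Alternatively, running htcacheclean as a daemon with its -d option keeps a single long-lived process, so overlapping runs do not arise.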
Figure 1: Typical cache growth / clean sequence.
Because