From: rst@ai.mit.edu (Robert S. Thau)
Date: Thu, 30 Mar 95 13:49:04 EST
To: new-httpd@hyperreal.com
Subject: Content negotiation --- preliminary docs.

Here's some stuff I threw together --- it's basically the existing few pages, HTMLized, and with a few i's dotted and t's crossed for readers who aren't protocol weenies.

Randy, could you add something like this to what's on hyperreal? Thanks.

rst

Content Arbitration: MultiViews and *.var files

The HTTP standard allows clients (i.e., browsers like Mosaic or Netscape) to specify what data formats they are prepared to accept. The intention is that when information is available in multiple variants (e.g., in different data formats), servers can use this information to decide which variant to send. This feature has been supported in the CERN server for a while, and while it is not yet supported in the NCSA server, it is likely to assume a new importance in light of the emergence of HTML3-capable browsers.

Apache handles content negotiation in two different ways: special treatment for the pseudo-MIME type application/x-type-map, and the MultiViews per-directory Option (which can be set in srm.conf, or in .htaccess files, as usual). These features are alternate user interfaces to what amounts to the same piece of code (in the new file http_mime_db.c), which implements the content negotiation portion of the HTTP protocol.

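Each of these features allows one of several files to satisfy a request, based on what the client says it's willing to accept; the differences are in the way the files are identified. A type map (i.e., a *.var file) names the files containing the variants explicitly, while a MultiViews search has the server do an implicit filename pattern match and choose from among the results.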

Apache also supports a new pseudo-MIME type, text/x-server-processed-html3, which is treated as text/html;level=3 for purposes of content negotiation, and as server-side-included HTML elsewhere.

Type maps (*.var files)

A type map is a document which is typed by the server (using its normal suffix-based mechanisms) as application/x-type-map. Note that to use this feature, you've got to have an AddType someplace which defines a file suffix as application/x-type-map; the easiest thing may be to stick a

  AddType application/x-type-map var

in srm.conf. See comments in the sample config files for details.

Type map files have an entry for each available variant; these entries consist of contiguous RFC822-format header lines. Entries for different variants are separated by blank lines. Blank lines are illegal within an entry. It is conventional to begin a map file with an entry for the combined entity as a whole, e.g.,


  URI: foo; vary="type,language"

  URI: foo.en.html
  Content-type: text/html; level=2
  Content-language: en

  URI: foo.fr.html
  Content-type: text/html; level=2
  Content-language: fr

If the variants have different qualities, that may be indicated by the "qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art):

  URI: foo; vary="type,language"

  URI: foo.jpeg
  Content-type: image/jpeg; qs=0.8

  URI: foo.gif
  Content-type: image/gif; qs=0.5

  URI: foo.txt
  Content-type: text/plain; qs=0.01
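
To make the role of "qs" concrete, here is a minimal sketch of how a server might pick among the three variants above. It assumes the common rule of multiplying the client's "q" for a media type (taken from its Accept header) by the variant's "qs" and choosing the highest product; the text above does not say that this is exactly what http_mime_db.c does, and the function names are invented for illustration.

```python
def parse_accept(header):
    """Parse 'image/gif; q=0.5, image/jpeg' into {'image/gif': 0.5, 'image/jpeg': 1.0}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # A type listed without an explicit q counts as fully acceptable.
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        prefs[parts[0].lower()] = q
    return prefs


def client_q(prefs, media_type):
    """Look up the client's q for a type, falling back to 'type/*' and then '*/*'."""
    major = media_type.split("/")[0]
    for pattern in (media_type, major + "/*", "*/*"):
        if pattern in prefs:
            return prefs[pattern]
    return 0.0  # Types the client did not mention are treated as unacceptable.


def choose_variant(variants, accept_header):
    """variants: list of (uri, media_type, qs). Returns the best uri, or None."""
    prefs = parse_accept(accept_header)
    best_uri, best_score = None, 0.0
    for uri, media_type, qs in variants:
        score = client_q(prefs, media_type) * qs  # Assumed rule: q times qs.
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri
```

With the picture example, a client sending "image/jpeg; q=0.5, image/gif" would score foo.jpeg at 0.4 and foo.gif at 0.5 under this rule, so it would receive the GIF even though the JPEG has the higher "qs".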

The full list of headers recognized is:

URI:
URI of the file containing the variant (of the given media type, encoded with the given content encoding). These are interpreted as URLs relative to the map file; they must be on the same server (!), and they must refer to files to which the client would be granted access if they were to be requested directly.
Content-type:
The media type of the variant; a level may be specified, along with "qs". These are often referred to as MIME types; typical media types are image/gif, text/plain, or text/html; level=3.
Content-language:
The language of the variant, specified as an Internet standard language code (e.g., en for English, ko for Korean, etc.).
Content-encoding:
If the file is compressed, or otherwise encoded, rather than containing the actual raw data, this says how that was done. For compressed files (the only case where this generally comes up), content encoding should be x-compress, or gzip, as appropriate.
Content-length:
The size of the file. Clients can ask to receive a given media type only if the variant isn't too big; specifying a content length in the map allows the server to compare against these thresholds without checking the actual file.
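
As a rough illustration of the file format described above, the following sketch splits a type map into entries on blank lines and breaks a header value into its main part and its parameters. It is not the parser in http_mime_db.c; the function names are made up here, and RFC822 continuation lines are deliberately not handled.

```python
def parse_type_map(text):
    """Split a type map into entries; each entry maps lowercased header names to values."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


def split_params(value):
    """Split 'text/html; level=2' into ('text/html', {'level': '2'})."""
    parts = [p.strip() for p in value.split(";")]
    params = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return parts[0], params
```

Applied to the foo.en.html / foo.fr.html map shown earlier, parse_type_map yields three entries (the combined entity plus the two variants), and split_params separates "text/html" from its level parameter.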

MultiViews

This is a per-directory option, meaning it can be set with an Options directive within a <Directory> section in access.conf, or (if AllowOverride is properly set) in .htaccess files. Note that Options All does not set MultiViews; you have to ask for it by name. (Fixing this is a one-line change to httpd.h).

The effect of MultiViews is as follows: if the server receives a request for /some/dir/foo, /some/dir has MultiViews enabled, and /some/dir/foo does *not* exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files, assigning them the same media types and content-encodings they would have had if the client had asked for one of them by name. It then chooses the best match to the client's requirements, and forwards that file along.
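
The search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's code: SUFFIX_TYPES is a hypothetical stand-in for the server's configured suffix-to-type mappings, content-encodings are left out for brevity, and the entry layout (a dict with "uri", "content-type", and "qs") is invented here.

```python
import os

# Hypothetical suffix table; a real server would use its configured AddType mappings.
SUFFIX_TYPES = {
    "html": "text/html",
    "txt": "text/plain",
    "gif": "image/gif",
    "jpeg": "image/jpeg",
}


def synthesize_type_map(directory, name):
    """Return synthesized entries for files named name.*, or None if name itself exists."""
    if os.path.exists(os.path.join(directory, name)):
        return None  # The requested file exists, so MultiViews does not apply.
    entries = []
    prefix = name + "."
    for filename in sorted(os.listdir(directory)):
        if not filename.startswith(prefix):
            continue
        suffix = filename.rsplit(".", 1)[1]
        media_type = SUFFIX_TYPES.get(suffix)
        if media_type is None:
            continue  # Skip files whose type this sketch does not know.
        entries.append({"uri": filename, "content-type": media_type, "qs": 1.0})
    return entries
```

The returned entries play the role of the faked-up type map; choosing among them then proceeds just as it would for an explicit *.var file.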

This applies to searches for the file named by the DirectoryIndex directive, if the server is trying to index a directory; if the configuration files specify


  DirectoryIndex index

then the server will arbitrate between index.html and index.html3 if both are present. If neither is present, and index.cgi is there, the server will run it.

If one of the files found by the globbing is a CGI script, it's not obvious what should happen. My code gives that case special treatment --- if the request was a POST, or a GET with QUERY_ARGS or PATH_INFO, the script is given an extremely high quality rating, and generally invoked; otherwise it is given an extremely low quality rating, which generally causes one of the other views (if any) to be retrieved. This is the only jiggering of quality ratings done by the MultiViews code; aside from that, all qualities in the synthesized type maps are 1.0.
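
The CGI rule above amounts to a small decision function, sketched here. The two constants are hypothetical placeholders, since the text only says "extremely high" and "extremely low", and the function and parameter names are invented for illustration.

```python
# Hypothetical placeholder values; the text only says "extremely high" and "extremely low".
CGI_HIGH_QUALITY = 1000000.0
CGI_LOW_QUALITY = 0.000001


def multiviews_quality(is_cgi, method, query_args="", path_info=""):
    """Return the quality assigned to one file found by the MultiViews search."""
    if not is_cgi:
        return 1.0  # Every non-script variant gets the same quality.
    if method == "POST" or (method == "GET" and (query_args or path_info)):
        return CGI_HIGH_QUALITY  # The request looks like it is meant for a script.
    return CGI_LOW_QUALITY  # A plain GET prefers one of the other views, if any.
```

Because the low rating is small but not zero, the script can still be chosen when it is the only view available, which matches the index.cgi behavior described earlier.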

Note that this machinery only comes into play if the file which the user attempted to retrieve does not exist by that name; if it does, it is simply retrieved as usual. (So, someone who actually asks for foo.jpeg, as opposed to foo, never gets foo.gif).