httpd-dev mailing list archives

From "William A. Rowe, Jr." <wr...@rowe-clan.net>
Subject Re: modifying MIME handling
Date Fri, 10 Aug 2001 00:26:43 GMT
From: "Paul Bayley" <bayleyp@mac.com>
Sent: Thursday, August 09, 2001 8:38 AM


> Hello,
> I would like to modify how Apache associates files with MIME types, but
> I'm not exactly clear how to implement it without gutting mod_mime.c
> (which some Apache developers may not like). This document is longer
> than I would have liked, but if I dived into code examples without some
> background I wouldn't be too popular.

Please set your text to wrap at column 72 or some such :)

Second, I'm commenting from Apache 2.0.  I wouldn't expect the Apache group
to even consider projects of this scope for Apache 1.3.  Apache 2.0 offers
so much more versatility, it isn't even worth frustrating yourself over 1.3.

> Generally speaking my goal is to add support for mapping HFS+ file types
> to MIME types in Apache running on Darwin 1.x. Currently Apache only
> supports mapping filename extensions (in a case-insensitive manner) to
> MIME types, and also optionally using 'magic' numbers, which read the
> first few bytes of a file to determine its type. Most OS X users would
> probably find this change useful. I wish to implement this in a manner
> which:
I wish to implement this in a manner which:
>
> a) Is as flexible if not more so than the current filename extension behavior.
>
> b) Doesn't clutter Apache code with Darwin-specific code (and loads of
>    #ifdef __APPLE__ macros; nobody wants that)
>
> c) Doesn't clutter Apache resource files
>
> d) Doesn't change the current mechanism significantly
>
> e) Doesn't give anybody a reason to disallow it being merged with Apache source.

First, mod_mime understands filename extensions (and mapping files).

mod_mime_magic understands file contents.

mod_mime_hfs could grok the additional identification streams on the file :)

> I have studied the source and documentation off and on for a long time
> and have arrived at a few stumbling blocks, mainly because I don't know
> what changes are liable to make others disagreeable. For example, there
> is no effort anywhere to distinguish between filename extensions and
> other forms of 'type declaration' in the source, config files, database,
> or directives. It's just implied that, for example, "AddType" means add a
> filename extension to MIME type mapping entry. I have several thoughtful
> suggestions for how I can add support for another mapping type without
> causing confusion. As soon as somebody tells me which is preferable, I
> will likely seek advice on how to implement it without gutting the
> current source.
>
> Runtime database:
> To implement a new file mapping I can either:
>
> a) Create a new entry, disassociated from mod_mime's current entries.
>    Thus a new entry like forced_types_hfs could be "image/jpeg 'JPEG'".
>
> b) Create a new entry like (a), but also change the filename extension
>    mapping entries to be more specific, like
>    forced_filename_extension_mapping_types or forced_types_ext.
>
> c) Use the current entries, but change the syntax so filename extensions
>    and file types are easily distinguished. Thus a forced_types entry
>    could be "image/jpeg jpeg jpg jpe 'JPEG'", where 'JPEG' means file type.
>
> The difficulty in implementation (explained later) is about the same.

I think mixing apples and oranges will cause headaches.  Have the user order them
mod_mime_hfs first, then mod_mime (for extensions or file maps), then mod_mime_magic
(the last resort).
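To make that ordering concrete, here is a minimal, self-contained C model of a
first-match chain: an HFS step, then an extension step, then a last-resort step.
Everything in it is hypothetical (req_model, by_hfs, by_extension, by_magic,
resolve_type); it is not Apache hook code, each lookup is a hard-wired stand-in
for a real table, and by_magic just returns a default instead of reading the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request data: just what the sketch needs. */
typedef struct {
    const char *filename;     /* e.g. "photo.gif" */
    unsigned long hfs_type;   /* four-char code, 0 when there is none */
} req_model;

/* Each step returns a MIME type, or NULL to decline. */
typedef const char *(*typer_fn)(const req_model *r);

static const char *by_hfs(const req_model *r)
{
    return r->hfs_type == 0x4A504547UL ? "image/jpeg" : NULL;  /* 'JPEG' */
}

static const char *by_extension(const req_model *r)
{
    const char *dot = strrchr(r->filename, '.');
    return (dot != NULL && strcmp(dot, ".gif") == 0) ? "image/gif" : NULL;
}

static const char *by_magic(const req_model *r)
{
    (void)r;  /* stand-in: real magic would inspect the file's first bytes */
    return "application/octet-stream";
}

/* Ask each step in turn; the first answer wins. */
static const char *resolve_type(const req_model *r)
{
    static const typer_fn chain[] = { by_hfs, by_extension, by_magic };
    assert(r != NULL);
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++) {
        const char *t = chain[i](r);
        if (t != NULL)
            return t;
    }
    return NULL;
}
```

The model also shows the weakness Bill raises later in this message: once an
early step answers, nothing after it gets a say on any other aspect.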

> Apache Directive Syntax:
> The current syntax merely implies filename extension mapping. Even the
> period prefix, which can be used to signify that an entry is a filename
> extension, is optional ("AddType image/gif .gif" is the same as
> "AddType image/gif gif"). To implement a new file mapping directive
> syntax I can either:
>
> a) Create a new set of directives, similar to the current ones. For
>    example "AddTypeHFS image/gif 'GIF '"
>
> b) Do the same, and also change the filename extension syntax to be more
>    specific. For example "AddExtensionMimeMapping" or "AddTypeExt"
>
> c) Use the same directive, but change the syntax to allow file type
>    entries like "AddType image/jpeg jpeg jpg jpe 'JPEG'"
>
> One possible problem is that a file type isn't limited to 7-bit ASCII. If
> this is important (please tell me), I can change the syntax to
> "0xFFFFFFFF" or "'0xFFFFFFFF'". I don't anticipate a problem with upper
> ASCII characters since they will be platform specific anyway (and will
> have their own separate config file).

On WinNT all configs are considered utf-8.  On Unix/Win9x the conf file is essentially
treated as the locally defined code page.
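If type codes do end up in the config, one possible parser accepts either a quoted
four-byte code or the hex form Paul mentions, and packs the result into a 32-bit
value that is compared exactly, with no case folding. This is a sketch: fourcc and
parse_type_code are hypothetical names, and packing the bytes big-endian (so 'JPEG'
becomes 0x4A504547) is an assumption about how one would store the code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Pack four bytes big-endian, so 'JPEG' becomes 0x4A504547. */
static uint32_t fourcc(const char *s)
{
    return ((uint32_t)(unsigned char)s[0] << 24) |
           ((uint32_t)(unsigned char)s[1] << 16) |
           ((uint32_t)(unsigned char)s[2] << 8) |
            (uint32_t)(unsigned char)s[3];
}

/*
 * Try to read one directive argument as an HFS type code.
 * Accepts 'XXXX' (exactly four bytes between single quotes) or
 * 0xHHHHHHHH.  Returns 1 and stores the code on success; returns 0 if
 * the token is not a type code, so the caller can treat it as an
 * extension instead.
 */
static int parse_type_code(const char *tok, uint32_t *out)
{
    size_t n;

    assert(tok != NULL && out != NULL);
    n = strlen(tok);
    if (n == 6 && tok[0] == '\'' && tok[5] == '\'') {
        *out = fourcc(tok + 1);            /* exact bytes, no case folding */
        return 1;
    }
    if (n == 10 && tok[0] == '0' && (tok[1] == 'x' || tok[1] == 'X')) {
        for (size_t i = 2; i < n; i++)
            if (!isxdigit((unsigned char)tok[i]))
                return 0;
        *out = (uint32_t)strtoul(tok + 2, NULL, 16);
        return 1;
    }
    return 0;
}
```

Bill's encoding point matters here: a non-ASCII character in a UTF-8 conf file takes
more than one byte, so a quoted code containing one fails the six-byte length check
and would fall through to being treated as an extension. The hex form avoids that.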

> The difficulty in implementation is about the same.
>
> Implementation in source code:
> There is no cost in fetching the file type so long as you do it while
> fetching the filename (where is this done?). If you try to fetch a file
> type when it isn't set, or from a filesystem which doesn't support them,
> it returns null.

Please see apr_dir_read and apr_stat/lstat/getfileinfo.  The concept is this:

  . if the user asks for something, research it (no matter how painful, such as
    the file ownership or permissions on Win32.)

  . if an apr_finfo_t field comes for free, fill it in anyways :)

  . so on Unix, group/user ids are filled in, even if the user stats for APR_FINFO_MIN.
    Not for Win32, where that would quintuple the time required for the stat :(

  . on Win32, nearly the entire apr_finfo_t is filled out for apr_dir_read, even when
    asked for APR_FINFO_DIR.

> Thus there is no reason not to always have this option on when running
> on Darwin. The primary difference between filename extensions and file
> types is that Apache matches filename extensions in a case-insensitive
> manner, while file types are any 32-bit value (or four-char code).

This doesn't map terribly cleanly :(  Dunno what we ought to do about it.  AS400 and Win32
also have alternate streams (so do some Mac filesystems, no?) that may encode things like
the language or charset.  It would be nice to find some apr_ methods to extract this useful
data for Apache.



> It does this by converting filename extensions to lower case before
> entering them in the runtime database, then converting the filename
> extension in question to lower case, then calling strcmp(). Personally I
> don't understand why strcasecmp() wasn't used instead; perhaps because
> it was slightly faster.

Not if it's tested multiple times :(  Once, yes, but when you are comparing against a very
large table, and indexing via a hash, it's pointless.
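Bill's point, sketched: fold case once when the table is built and once per
request, then hash and do a plain strcmp() against one short bucket. The table
layout, the djb2-style hash and the names (normalize, add_type, lookup_type) are
illustrative only, not mod_mime's actual data structures.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 64
#define MAXEXT   16

typedef struct ext_entry {
    char ext[MAXEXT];          /* stored lowercased, without the leading '.' */
    const char *type;
    struct ext_entry *next;
} ext_entry;

static ext_entry *buckets[NBUCKETS];

/* Lowercase s into dst, dropping one optional leading '.'.
   Longer extensions are truncated to MAXEXT - 1 bytes in this sketch. */
static void normalize(char *dst, const char *s)
{
    size_t i;

    if (*s == '.')
        s++;
    for (i = 0; s[i] != '\0' && i < MAXEXT - 1; i++)
        dst[i] = (char)tolower((unsigned char)s[i]);
    dst[i] = '\0';
}

/* djb2-style string hash, reduced to a bucket index. */
static unsigned bucket_of(const char *s)
{
    unsigned h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Config time: fold case once, then file the entry under its hash.
   The caller owns the storage for e. */
static void add_type(ext_entry *e, const char *ext, const char *type)
{
    unsigned b;

    normalize(e->ext, ext);
    e->type = type;
    b = bucket_of(e->ext);
    e->next = buckets[b];
    buckets[b] = e;
}

/* Request time: fold case once, hash, then plain strcmp() on one bucket. */
static const char *lookup_type(const char *ext)
{
    char key[MAXEXT];

    normalize(key, ext);
    for (const ext_entry *e = buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (strcmp(e->ext, key) == 0)
            return e->type;
    return NULL;
}
```

A per-request strcasecmp() would save the lowercasing pass but would not help the
hash, which has to see identical bytes for "GIF" and "gif" to pick the same bucket.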

> If this modification to mod_mime.c were done, I would also want to add a
> directive to change which mapping had priority. Thus one Darwin user may
> prefer filename extension mapping, another may prefer file type mapping,
> and another may want filename extension mapping in one directory only.

Understood, but this can't happen in mod_mime.

> If you want me to develop a separate module, the priority issue will
> have to be addressed, because load order doesn't allow the user to change
> the priority on a directory-specific basis.

So on a Darwin config, set mod_mime_hfs as the highest priority module.  It can have a fast
escape based on the <Directory > config (even dynamically detected) which drops the file on
to mod_mime.  It can partially fill in the info, and leave other issues (language?) for
mod_mime or mod_mime_magic.

The one problem I'd like to see addressed is that the mod_mime_* mechanisms should respect
the decisions made before them, and that this becomes a run-all instead of a
run-until-success.  With five different aspects to detect now (we've got language,
content-type, forced handlers, charset and one I'm forgetting), we have to get off the idea
that one mod_mime_any module will ever answer all the questions :)
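A run-all chain in which each step only fills in what earlier steps left unset
might look like this. It is a sketch under several assumptions: mime_info covers
four of the five aspects Bill lists (he doesn't name the fifth), hfs_enabled is a
hypothetical stand-in for a per-<Directory> switch, and the three typers are
hard-wired stand-ins rather than real modules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four of the aspects Bill lists; a real design would carry more. */
typedef struct {
    const char *content_type;
    const char *language;
    const char *charset;
    const char *handler;
} mime_info;

/* Hypothetical request data for the sketch. */
typedef struct {
    const char *filename;  /* e.g. "index.html.en" */
    uint32_t hfs_type;     /* four-char code, 0 if none */
    int hfs_enabled;       /* hypothetical per-<Directory> switch */
} req_model;

typedef void (*typer_fn)(const req_model *r, mime_info *m);

/* Respect earlier decisions: only fill a field nobody has set yet. */
static void set_if_unset(const char **field, const char *value)
{
    if (*field == NULL)
        *field = value;
}

/* True if p starts with ext and that extension ends there. */
static int ext_is(const char *p, const char *ext)
{
    size_t n = strlen(ext);
    return strncmp(p, ext, n) == 0 && (p[n] == '.' || p[n] == '\0');
}

static void hfs_typer(const req_model *r, mime_info *m)
{
    if (!r->hfs_enabled)            /* fast escape: directory opted out */
        return;
    if (r->hfs_type == 0x54455854u) /* 'TEXT' */
        set_if_unset(&m->content_type, "text/plain");
}

static void ext_typer(const req_model *r, mime_info *m)
{
    /* Walk every extension, so "index.html.en" yields type and language. */
    for (const char *p = strchr(r->filename, '.'); p != NULL; p = strchr(p + 1, '.')) {
        if (ext_is(p, ".html"))
            set_if_unset(&m->content_type, "text/html");
        else if (ext_is(p, ".en"))
            set_if_unset(&m->language, "en");
    }
}

static void magic_typer(const req_model *r, mime_info *m)
{
    (void)r;                        /* stand-in: real magic reads content */
    set_if_unset(&m->content_type, "application/octet-stream");
}

/* Run every typer in order; none of them stops the chain. */
static void run_all_typers(const req_model *r, mime_info *m)
{
    static const typer_fn chain[] = { hfs_typer, ext_typer, magic_typer };
    assert(r != NULL && m != NULL);
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++)
        chain[i](r, m);
}
```

With the HFS step enabled, "index.html.en" typed 'TEXT' gets text/plain from HFS but
still picks up its language from the extension; with the step disabled for that
directory, the extension decides both.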

> If you have any suggestions or questions, please go right ahead. I'm
> partial to using the existing directives with a slightly modified syntax
> and a modified mod_mime. All existing config files would work as-is, and
> the source will probably be smaller. All I really need to know is which
> method stands the greatest chance of being accepted, where Apache reads
> the filename, and how to submit the changes.

Forget 1.3.  Read apr's file_io directory, compare Unix to Win32 and OS/2, and start
implementing the Darwin specifics.

Then we can expand apr_finfo_t just a bit to allow some 'extended' info, that will be very
platform specific.  Add some accessors to ask the non-http kind of questions (what charset
is this file?  what content?)  And then implement a new mime module to get this metadata from
the filesystem, in a non-filesystem specific manner :)

Just my 2c

Bill

