httpd-dev mailing list archives

From "Dr. Peter Poeml" <>
Subject [PATCH] mod_autoindex & character set
Date Wed, 31 Jan 2007 20:45:12 GMT

Users have a problem with directory listings generated by mod_autoindex:
it is not possible to control the character set with which the response
is marked. The server cannot know what the real encoding on disk is, so
it makes a very rough guess based on the OS it is running on:
APR_HAS_UNICODE_FS, which, as far (as little) as I have looked, is 1 on
Windows and 0 on Linux. Depending on it, mod_autoindex decides whether
to add a (fixed) charset to the content type:

    ap_set_content_type(r, "text/html;charset=utf-8");  /* if APR_HAS_UNICODE_FS */
    ap_set_content_type(r, "text/html");                /* otherwise */

The thing is that Linux filesystems have been able to store UTF-8
filenames for ages, and since system-wide UTF-8 locales are becoming
more and more widespread, filenames encoded that way occur much more
frequently. This means that on many servers the content type needs to
be set appropriately so that the browser can display things correctly.

My first thought was to define APR_HAS_UNICODE_FS to 1, but that could
be just as wrong; it only means that the filesystem is Unicode-capable,
not that the actual filenames happen to be encoded like that. Instead,
the right value depends on site-specific needs.

Thus, I think the right way is to make the character set configurable.
I am attaching a patch which adds an "AddDirectoryIndexCharset" directive
to the mod_autoindex configuration.
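
To illustrate the idea (this is not the attached patch, only a minimal
standalone sketch): the content type could be built from a configured
charset, falling back to utf-8 when nothing is configured. The function
name autoindex_content_type and its parameters are hypothetical, and the
sketch uses plain C library calls instead of the httpd/APR API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: writes the Content-Type
 * value for a directory listing into buf. `charset` stands for whatever
 * the configuration supplied; NULL or "" means "not configured", in which
 * case utf-8 is used as the default.
 * Returns 0 on success, -1 if buf is too small. */
static int autoindex_content_type(char *buf, size_t len, const char *charset)
{
    int n;

    assert(buf != NULL && len > 0);

    if (charset == NULL || strlen(charset) == 0) {
        charset = "utf-8";
    }

    /* snprintf returns the length it needed; detect truncation with it. */
    n = snprintf(buf, len, "text/html;charset=%s", charset);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Called with NULL, this fills the buffer with "text/html;charset=utf-8";
called with "ISO-8859-1", it yields "text/html;charset=ISO-8859-1".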

The patch actually removes the dependency on APR_HAS_UNICODE_FS. My
train of thought here is that utf-8 can (and should) be the default,
unless configured otherwise. This fits Windows (it has always been like
that), and it (largely) fits Linux. But I don't know about other
platforms.

So, things remaining to discuss:
 - check the code for correctness (warning: I'm a beginner)
 - how to do it in a backwards compatible way
 - check if it is appropriate on all possible platforms
 - update the documentation

The last item is something I can easily do :) But I need help with the
other issues.

As a totally optional addition, it might be possible to let
mod_autoindex figure out the actual encoding, and automatically set an
appropriate character set. There are some more details in .
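
One possible heuristic for such a guess, sketched here purely as an
assumption (the message does not say how detection would work), is to
check whether a filename is well-formed UTF-8. The function name
looks_like_utf8 is hypothetical. Note that pure ASCII names pass the
check in any case, and a name that passes could in principle still have
been meant in another encoding, so this can only ever be a hint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the NUL-terminated byte string is
 * well-formed UTF-8, else 0. Rejects overlong forms, surrogates
 * (U+D800..U+DFFF) and values above U+10FFFF. */
static int looks_like_utf8(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    assert(name != NULL);

    while (*p != 0) {
        unsigned int cp;   /* code point being decoded */
        unsigned int min;  /* smallest code point allowed for this length */
        int extra;         /* number of continuation bytes expected */

        if (*p < 0x80) {                 /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) { /* 2-byte sequence */
            cp = *p & 0x1F; extra = 1; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) { /* 3-byte sequence */
            cp = *p & 0x0F; extra = 2; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) { /* 4-byte sequence */
            cp = *p & 0x07; extra = 3; min = 0x10000;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }
        p++;

        while (extra-- > 0) {
            /* A NUL terminator fails this test too, so we never read
             * past the end of the string. */
            if ((*p & 0xC0) != 0x80) {
                return 0;
            }
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return 0;
        }
    }
    return 1;
}
```

For example, the two bytes C3 A4 (UTF-8 for "ä") are accepted, while the
single byte E4 (the same letter in ISO-8859-1) is rejected.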

SUSE LINUX Products GmbH               Bug, bogey, bugbear, bugaboo:
Research & Development               A malevolent monster (not true?);
                                          Some mischief microbic;
                                         What makes someone phobic;
                                     The work one does not want to do.
  From: Chris Young (The Omnificent English Dictionary In Limerick Form)
