httpd-dev mailing list archives

From Dirk-Willem van Gulik <>
Subject Re: Unicode and double byte character sets in Apache
Date Thu, 02 Sep 1999 21:39:38 GMT

On Thu, 2 Sep 1999, Bill Stoddard wrote:

> How do Apache users in Japan accommodate single byte Apache running on a
> double byte OS?  Are URIs, which use the 8859-1 charset, translated to
> filenames valid in a DBCS file system? Or are Japanese users forced into
> using single byte charsets when they are using Apache?  

Though I am not a Japanese user, we've been tackling URIs in languages
like Chinese, Japanese, Thai and various Arabic flavours by taking the file
names, representing the glyphs in Unicode (yes, this is wrong, I know),
fully denormalizing them according to the spec's forward tables, UTF-8
encoding the result, and then treating that as an octet string, %XX
encoding where needed.
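
A minimal sketch of that pipeline in Python, for illustration only. It
assumes that "fully denormalizing" can be approximated by Unicode canonical
decomposition (NFD); the function name filename_to_uri_segment is
hypothetical and is not taken from our actual setup.

```python
import unicodedata
from urllib.parse import quote


def filename_to_uri_segment(name: str) -> str:
    """Turn one file name into a %XX-encoded URI path segment."""
    # Assumption: canonical decomposition (NFD) stands in for the
    # "fully denormalizing" step described in the message.
    decomposed = unicodedata.normalize("NFD", name)
    # UTF-8 encode the result to get an octet string.
    octets = decomposed.encode("utf-8")
    # %XX encode every octet outside the unreserved set; safe="" means
    # even "/" is escaped, since this handles a single path segment.
    return quote(octets, safe="")


if __name__ == "__main__":
    print(filename_to_uri_segment("\u65e5\u672c.html"))
    print(filename_to_uri_segment("\u00e9.html"))
```

Decomposing before encoding means a precomposed and a decomposed spelling
of the same name end up as the same exposed URI.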

We've been making use of the fact that Unicode is strictly 'read order'
sequential; i.e. the left-to-right versus right-to-left script issue is
'solved'. This is also the URI we 'expose'. Though we try to be quite
strict in what we put in the pages, we had to be quite lenient in what we
accept from the browsers out there. The content/body of the documents was
dealt with in a much more complex way, effectively using local charsets
and Unicode mixed.
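
One way the "strict in what we emit, lenient in what we accept" side could
look, again as a hedged Python sketch rather than what we actually run. The
fallback to ISO-8859-1 and the NFD normalization step are assumptions, and
lenient_segment_to_name is a hypothetical name.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def lenient_segment_to_name(segment: str, fallback: str = "iso-8859-1") -> str:
    """Decode a %XX-encoded path segment sent by a browser."""
    # Undo the %XX encoding to recover the raw octets.
    octets = unquote_to_bytes(segment)
    try:
        # Preferred case: the browser sent UTF-8 octets.
        text = octets.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as
        # a local single-byte charset (ISO-8859-1 by default).
        text = octets.decode(fallback)
    # Assumption: decompose (NFD) so that different spellings of the
    # same name compare equal.
    return unicodedata.normalize("NFD", text)
```

With this, "%C3%A9.html" (UTF-8), "%E9.html" (ISO-8859-1) and
"e%CC%81.html" (already decomposed) all resolve to the same name.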

