httpd-dev mailing list archives

From Dirk-Willem van Gulik
Subject Re: BUFF, IOL, Chunking, and Unicode in 2.0 (long)
Date Sun, 07 May 2000 10:50:38 GMT

On Sun, 7 May 2000, Jeff Trawick wrote:

> > One thing not mentioned in the API is how this third layer knows enough
> > about the data to do such conversion. At the least, if the input were
> > UTF8 or unicode, it should know the destination charset, language and
> > possibly mode of speech. In reality it might need the input
> > charset+language and the destination charset+language.
> > 
> > Dw.
> I guess you're talking about a combination of charset/language
> negotiation?  I think that is the most interesting question, and I

No, not at all! That is a separate problem. All I am saying is that as
soon as you enter charset mappings you need auxiliary info on both the
language and the charsets to do the actual mapping. Example: 'u\:', i.e.
the 'u' with two dots above it; if you had to go from unicode to ascii
then depending on the source language you'd make it a hard or a soft
sound, and depending on the target you'd express that as a 'u', a 'ue'
or a 'eu'.
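
A minimal sketch of that point in Python, assuming a hypothetical lookup
table keyed on (source language, character). The function name, the
language codes and the two table entries are illustrative only, and the
target-side choice is left out to keep it short:

```python
# Hypothetical table: (source language, character) -> ASCII replacement.
# The entries are illustrative, not a complete or authoritative mapping.
ASCII_FALLBACKS = {
    ("de", "\u00fc"): "ue",  # German convention: u-umlaut is written "ue"
    ("tr", "\u00fc"): "u",   # Turkish text in ASCII often just drops the dots
}


def to_ascii(text, source_lang, default="?"):
    """Map text to ASCII, choosing replacements by source language."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already ASCII: pass through unchanged.
            out.append(ch)
        else:
            # The charset alone cannot decide this; the language is needed.
            out.append(ASCII_FALLBACKS.get((source_lang, ch), default))
    return "".join(out)


print(to_ascii("M\u00fcller", "de"))
print(to_ascii("g\u00fcl", "tr"))
```

The same code point comes out differently for the two languages, which
is why a plain charset-to-charset table is not enough.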

This is something the Russian approach really has not dealt with.

> For EBCDIC-on-OS/390 (and hopefully a slightly wider audience :) ), I
> think that the default encoding in the absence of configuration is the
> character set associated with the current locale.  I would want to set
> up some global variables at initialization; these would be handles to
> translate headers (based on how the code is compiled) and a handle to
> translate content to ASCII (based on the current locale).

One advantage of the Locale is that it gives you, 'out of band', the
charset, language, and possibly encoding, which solves half the problem.
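
As a rough illustration, assuming locale names of the usual
language[_territory][.codeset][@modifier] shape, the pieces can be
pulled apart like this in Python. LocaleInfo and parse_locale are
made-up names for this sketch, not part of any existing API:

```python
from typing import NamedTuple, Optional


class LocaleInfo(NamedTuple):
    language: str
    territory: Optional[str]
    charset: Optional[str]


def parse_locale(name):
    """Split a locale name such as 'de_DE.ISO-8859-1@euro' into parts."""
    # Drop an optional '@modifier' suffix first.
    base = name.split("@", 1)[0]
    # The codeset, if present, follows the first '.'.
    base, _, charset = base.partition(".")
    # The territory, if present, follows the first '_'.
    language, _, territory = base.partition("_")
    return LocaleInfo(language, territory or None, charset or None)


info = parse_locale("de_DE.ISO-8859-1")
print(info.language, info.territory, info.charset)
```

Note that this only yields the language and charset of the server's own
locale; the target side still has to come from somewhere else.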

> If we simply have an AddCharset coded to tell us that a file is stored
> in a certain charset, we still don't know what set of character sets
> we should be willing to translate it into, right?

Right, and even worse, how (and if!) we do that translation depends not
only on the charset it is stored in, but also on the language of that
stored document. Then the actual translation might need to know some
'target locale' information as well.

> If you guys would just drop by the house some afternoon, we could
> pretty quickly figure out how to separate the problem into parts that
> different parties could tackle :)

Right, how about breakfast? :-)

