httpd-dev mailing list archives

From Jeff Trawick <>
Subject Re: BUFF, IOL, Chunking, and Unicode in 2.0 (long)
Date Sun, 07 May 2000 19:20:14 GMT
> > > One thing not mentioned in the API is how this third layer knows enough
> > > about the data to do such conversion. At the least, if the input were
> > > UTF-8 or Unicode, it should know the destination charset, language and
> > > possibly mode of speech. In reality it might need the input
> > > charset+language and the destination charset+language.
> > > 
> > > Dw.
> > 
> > I guess you're talking about a combination of charset/language
> > negotiation?  I think that is the most interesting question, and I

Dw says

> No! Not at all, that is a separate problem. All I am saying is that as
> soon as you enter charset mappings you need auxiliary info on both the
> language and the charsets to do the actual mapping. Example: 'u\:', i.e.
> the 'u' with two dots above it; if you had to go from Unicode to ASCII,
> then depending on the source language you'd make it a hard or a soft
> language and depending on the target you'd express that as a 'u', a 'ue'
> or a 'eu'.
> This is something the Russian approach really has not dealt with.

Very cool...  Thanks for the explanation.
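To make Dw's 'u\:' example concrete, here is a minimal C sketch of a language hint steering a single transliteration from UTF-8 to ASCII. The names `to_ascii()` and `umlaut_u_for()` are hypothetical, only U+00FC is handled, and the mapping ("de" gives "ue", anything else gives "u") is an illustrative assumption, not a complete table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: pick the ASCII spelling of U+00FC from a language hint.
 * The mapping is illustrative only. */
static const char *umlaut_u_for(const char *lang)
{
    if (lang != NULL && strncmp(lang, "de", 2) == 0)
        return "ue";          /* German convention when umlauts are unavailable */
    return "u";               /* otherwise just drop the diaeresis */
}

/* Transliterate UTF-8 `in` to 7-bit ASCII in `out` (capacity outlen).
 * ASCII bytes are copied, U+00FC is spelled per the language hint, and any
 * other non-ASCII sequence becomes '?'. Output is truncated to fit and
 * always NUL-terminated. Returns the number of bytes written. */
size_t to_ascii(const char *in, char *out, size_t outlen, const char *lang)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (outlen == 0)
        return 0;
    while (*p != '\0') {
        char one[2] = { 0, 0 };
        const char *rep;

        if (*p < 0x80) {                              /* ASCII: copy as-is */
            one[0] = (char)*p++;
            rep = one;
        } else if (p[0] == 0xC3 && p[1] == 0xBC) {    /* UTF-8 for U+00FC */
            rep = umlaut_u_for(lang);
            p += 2;
        } else {                                      /* unmappable: substitute */
            rep = "?";
            p++;
            while ((*p & 0xC0) == 0x80)               /* skip continuation bytes */
                p++;
        }
        while (*rep != '\0' && n + 1 < outlen)
            out[n++] = *rep++;
        if (*rep != '\0')                             /* ran out of room */
            break;
    }
    out[n] = '\0';
    return n;
}
```

With lang "de", "München" comes out as "Muenchen"; with no language hint it comes out as "Munchen". The same input bytes give different output depending only on the hint, which is the point Dw is making.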

I think we need to add some sort of hints argument to
ap_xlate_open().  The language is one very useful hint to the
translation mechanism (if it can use it; we must ignore it when the
mechanism can't handle it).  Other possible hints:

. what to do if an input character is not in the character set we said
  it was (e.g., skip bytes in the input until we can translate again) 
. what to do if the input character can't be represented in the target
  character set (e.g., substitute the appropriate blank)
. whether or not to signal a permanent error if we get too many
  translation problems (and what counts as "too many")
. whether or not the app can handle non-SBCS translation
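One way the first three hints above could be represented and honored is sketched below for a UTF-8 to ISO-8859-1 conversion. Everything here (`xlate_opts`, the policy enums, the status codes, `xlate_utf8_to_latin1()`) is hypothetical and not part of the ap_xlate interface. The non-SBCS hint is not modeled because ISO-8859-1 output is always one byte per character, and the UTF-8 decoder does only minimal validation.

```c
#include <stddef.h>

/* Hypothetical hint values mirroring the list above. */
typedef enum { BAD_INPUT_FAIL, BAD_INPUT_SKIP } bad_input_policy;
typedef enum { UNMAPPABLE_FAIL, UNMAPPABLE_SUBST } unmappable_policy;

typedef struct {
    bad_input_policy bad_input;   /* input byte not valid in the source charset */
    unmappable_policy unmappable; /* character has no target equivalent */
    char subst;                   /* replacement byte for UNMAPPABLE_SUBST */
    int max_errors;               /* more problems than this is permanent; -1 = no limit */
} xlate_opts;

enum { XLATE_OK = 0, XLATE_EBADINPUT = -1, XLATE_EUNMAPPABLE = -2, XLATE_ETOOMANY = -3 };

/* Decode one UTF-8 sequence from s (len > 0 bytes available). Returns the
 * number of bytes used and sets *cp, or 0 if the sequence is malformed.
 * Minimal validation only: overlong forms and surrogates are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t need, i;
    unsigned long c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { need = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; }
    else return 0;
    if (need > len) return 0;
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return need;
}

/* Convert UTF-8 to ISO-8859-1 under the given hints. *outlen is the output
 * capacity on entry and the number of bytes written on success; conversion
 * stops early if the output buffer fills up. */
int xlate_utf8_to_latin1(const char *in, size_t inlen, char *out,
                         size_t *outlen, const xlate_opts *h)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;
    int errors = 0;

    while (i < inlen && o < *outlen) {
        unsigned long cp = 0;
        size_t used = utf8_decode(s + i, inlen - i, &cp);

        if (used != 0 && cp <= 0xFF) {        /* representable: copy */
            out[o++] = (char)cp;
            i += used;
            continue;
        }
        if (used == 0) {                      /* malformed source byte */
            if (h->bad_input == BAD_INPUT_FAIL)
                return XLATE_EBADINPUT;
            i++;                              /* skip it and try to resync */
        } else {                              /* valid but not in Latin-1 */
            if (h->unmappable == UNMAPPABLE_FAIL)
                return XLATE_EUNMAPPABLE;
            out[o++] = h->subst;
            i += used;
        }
        if (h->max_errors >= 0 && ++errors > h->max_errors)
            return XLATE_ETOOMANY;
    }
    *outlen = o;
    return XLATE_OK;
}
```

With the lenient settings a Euro sign becomes '?' and a stray 0xFF byte is skipped; with the strict settings the same inputs fail immediately; and with max_errors set to 0 the first problem of either kind becomes a permanent error.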

Whether or not any hints are implemented initially, we need to "fix"
the argument list so that they can be implemented in the future with
minimal impact to current users of the interface.

At the very least:

typedef struct ap_xlate_hints_t ap_xlate_hints_t;

ap_status_t ap_xlate_open(ap_xlate_t **convset, const char *topage,
                          const char *frompage, ap_xlate_hints_t *hints,
                          ap_pool_t *pool);
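A sketch of how an opaque hints type keeps that argument list fixed: callers only ever see the incomplete typedef plus a few setter functions, and a NULL pointer means "no hints", so new hints can be added later without touching existing callers. All names below are hypothetical, and plain calloc() stands in for allocation from an ap_pool_t.

```c
#include <stdlib.h>
#include <string.h>

/* A header would expose only the incomplete type:
 *     typedef struct xlate_hints_t xlate_hints_t;
 * The definition stays private, so fields can be added later without
 * changing any caller. All names here are hypothetical. */
typedef struct xlate_hints_t {
    char language[16];   /* "" means no language hint */
    int sbcs_only;       /* nonzero: caller cannot handle multibyte output */
} xlate_hints_t;

/* All hints start out "off"; calloc() stands in for pool allocation. */
xlate_hints_t *xlate_hints_create(void)
{
    return calloc(1, sizeof(xlate_hints_t));
}

int xlate_hints_set_language(xlate_hints_t *h, const char *lang)
{
    if (strlen(lang) >= sizeof h->language)
        return -1;                     /* too long to store */
    strcpy(h->language, lang);
    return 0;
}

void xlate_hints_set_sbcs_only(xlate_hints_t *h, int on)
{
    h->sbcs_only = (on != 0);
}

/* Accessors used by the translation code; a NULL hints pointer means
 * "no hints", so existing callers can simply pass NULL. */
const char *xlate_hints_language(const xlate_hints_t *h)
{
    return (h != NULL && h->language[0] != '\0') ? h->language : NULL;
}

int xlate_hints_sbcs_only(const xlate_hints_t *h)
{
    return h != NULL && h->sbcs_only;
}
```

The design choice is that callers never touch struct fields directly, so adding, say, an error-threshold hint later changes only the private definition and adds one setter.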

I think it would be immediately useful as well as simple to initially
support a hint which states that the client can't handle non-SBCS.
There is too much of that sort of code lying around and such an app
shouldn't have to check the results of every translation call to find
out that in some cases (perhaps very infrequent) their SBCS
assumptions don't hold.
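A sketch of the SBCS-only hint checked once at open time, next to the kind of caller it protects: an in-place, table-driven translation that quietly breaks if one input byte can ever become more than one output byte. The function names and the short list of single-byte charsets are hypothetical; a real implementation would ask the conversion mechanism rather than consult a hard-coded list.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical: a tiny list of charsets known to be single-byte. */
static const char *const known_sbcs[] = { "US-ASCII", "ISO-8859-1", "IBM-1047", NULL };

static int is_sbcs(const char *charset)
{
    size_t i;

    for (i = 0; known_sbcs[i] != NULL; i++)
        if (strcasecmp(charset, known_sbcs[i]) == 0)
            return 1;
    return 0;
}

/* Open-time check: with the SBCS-only hint, refuse up front instead of
 * letting the caller discover a length change in the middle of a stream.
 * Returns 0 if the conversion is acceptable, -1 (hypothetical status) if not. */
int xlate_open_check(const char *topage, const char *frompage, int sbcs_only)
{
    if (sbcs_only && !(is_sbcs(topage) && is_sbcs(frompage)))
        return -1;
    return 0;
}

/* The kind of caller the hint protects: translates in place, assuming
 * exactly one output byte per input byte. */
void xlate_inplace_sbcs(unsigned char *buf, size_t len, const unsigned char table[256])
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

With the hint set, asking for UTF-8 output fails at open time, so code like xlate_inplace_sbcs() never has to check each translation result for a length change.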

> > For EBCDIC-on-OS/390 (and hopefully a slightly wider audience :) ), I
> > think that the default encoding in the absence of configuration is the
> > character set associated with the current locale.  I would want to set
> > up some global variables at initialization; these would be handles to
> > translate headers (based on how the code is compiled) and a handle to
> > translate content to ASCII (based on the current locale).
> One advantage of the locale is that it gives you, out of band, the
> charset, language, and possibly encoding; which solves half the problem.

Regarding locale... Right now there is a magic value to pass to
ap_xlate_open() for one of the character set names which means
"whatever character set our source code is in."  I expect to add support
real-soon-now for a magic value which means "whatever character set is
associated with the current locale (run-time)."  When that is done, a
language hint could be picked up automatically, but that doesn't really
solve the general problem of needing to specify the language with the
names of the character sets.
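For the run-time locale case, standard C and POSIX already expose both pieces: nl_langinfo(CODESET) gives the character set of the current locale once setlocale(LC_ALL, "") has been called, and the locale name (e.g. "de_DE.UTF-8") carries a language that could be passed along as a hint. Only setlocale() and nl_langinfo() are real APIs here; the helper names are hypothetical, and this is a minimal sketch rather than what ap_xlate_open() does.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Character set of the current locale, e.g. "UTF-8" or "ISO-8859-1".
 * Until setlocale(LC_ALL, "") has been called, this reflects the "C" locale. */
const char *locale_charset(void)
{
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: extract a language hint from a locale name such as
 * "de_DE.UTF-8" (gives "de"). Returns -1 for names with no language part,
 * such as "C", "C.UTF-8" or "POSIX", or if lang is too small. */
int locale_language(const char *locname, char *lang, size_t langlen)
{
    size_t n;

    if (locname == NULL)
        return -1;
    n = strcspn(locname, "_.@");
    if (n == 0 || n >= langlen)
        return -1;
    memcpy(lang, locname, n);
    lang[n] = '\0';
    if (strcmp(lang, "C") == 0 || strcmp(lang, "POSIX") == 0)
        return -1;
    return 0;
}

/* Language hint for whatever LC_CTYPE locale is currently in effect. */
int current_locale_language(char *lang, size_t langlen)
{
    return locale_language(setlocale(LC_CTYPE, NULL), lang, langlen);
}
```

At startup, a server could call setlocale(LC_ALL, ""), use locale_charset() where the "current locale" magic value would go, and feed current_locale_language() into the language hint. As noted above, this still doesn't cover content whose language differs from the server's locale.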

> > If we simply have an AddCharset coded to tell us that a file is stored
> > in a certain charset, we still don't know what set of character sets
> > we should be willing to translate it into, right?
> Right... and even worse, how (and if!) we do that translation depends not
> only on the charset it is stored in, but also the language of that stored
> document. Then the actual translation might need to know some 'target
> locale' information as well.
> > If you guys would just drop by the house some afternoon, we could
> > pretty quickly figure out how to separate the problem into parts that
> > different parties could tackle :)
> Right, how about breakfast ? :-)
> Dw

Any time is o.k., but breakfast time doesn't provide a sufficient
excuse (to me, at least) for consuming a brewski :)

Jeff Trawick | | PGP public key at web site:
          Born in Roswell... married an alien...
