httpd-dev mailing list archives

From Dirk-Willem van Gulik <di...@covalent.net>
Subject Re: BUFF, IOL, Chunking, and Unicode in 2.0 (long)
Date Sat, 06 May 2000 15:26:55 GMT


On Fri, 5 May 2000, Jeff Trawick wrote:

> >    Using (loadable?) translation tables based on unicode definitions
> >    is a very similar approach to what libiconv offers you (see
> >    http://clisp.cons.org/~haible/packages-libiconv.html -- though my
> >    inspiration came from the russian apache, and I only heard about
> >    libiconv recently). Every character set can be defined as a list
> >    of <hex code> <unicode equiv> pairs, and translations between
> >    several SBCS's can be collapsed into a single 256 char table.
> >    Efficiently building them once only, and finding them fast is an
> >    optimization task.

I've actually got a chunk of (Perl) code which generates the C code to do
just that. I am now waiting for the Unicode 3.0 standard to see how up to
date that code is, but will most certainly want to take it further. It also
does UTF-8 conversion and 'closest' approximation.
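
To make the "collapse into a single 256 char table" idea concrete, here is a
minimal sketch in Python rather than the Perl/C mentioned above. It is not the
generator code itself. It assumes Python's bundled codecs as the source of the
<hex code> <unicode equiv> pairs, and the names build_sbcs_table, recode and
KOI8R_TO_CP1251 are made up for illustration. Bytes with no equivalent in the
destination set fall back to "?", which is a much cruder stand-in for a real
'closest' approximation.

```python
def build_sbcs_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Collapse src -> Unicode -> dst into one 256-byte lookup table.

    Both src and dst are assumed to be single-byte character sets.
    """
    table = bytearray(256)
    for code in range(256):
        try:
            ch = bytes([code]).decode(src)   # <hex code> -> <unicode equiv>
            out = ch.encode(dst)             # <unicode equiv> -> <hex code>
        except (UnicodeDecodeError, UnicodeEncodeError):
            out = fallback                   # no mapping: crude substitute
        table[code] = out[0]
    return bytes(table)


# Build the table once only; afterwards each conversion is a plain lookup.
KOI8R_TO_CP1251 = build_sbcs_table("koi8_r", "cp1251")


def recode(data: bytes, table: bytes = KOI8R_TO_CP1251) -> bytes:
    """Translate a byte string through a prebuilt 256-entry table."""
    return data.translate(table)
```

The point of the sketch is that the Unicode step only happens while the table
is built; the per-request work is one table lookup per byte.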

One thing not mentioned in the API is how this third layer knows enough
about the data to do such a conversion. At the least, if the input were
UTF-8 or Unicode, it should know the destination charset, language and
possibly mode of speech. In reality it might need the input
charset+language and the destination charset+language.
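
As a rough illustration of what the layer would have to be told, the sketch
below bundles the input and destination charset+language into one record that
travels with the data. This is only an assumption about the shape such
information could take, not a proposal for the actual BUFF/IOL API. The names
ConversionContext and convert are hypothetical, and the language fields are
carried along but not used by this toy conversion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConversionContext:
    """Hypothetical record of what the conversion layer needs to know."""
    src_charset: str
    dst_charset: str
    src_language: Optional[str] = None   # carried along, unused in this sketch
    dst_language: Optional[str] = None   # carried along, unused in this sketch


def convert(data: bytes, ctx: ConversionContext) -> bytes:
    """Recode data from ctx.src_charset to ctx.dst_charset.

    Characters missing from the destination charset become '?'.
    """
    text = data.decode(ctx.src_charset)
    return text.encode(ctx.dst_charset, errors="replace")


ctx = ConversionContext("utf-8", "latin-1", src_language="fr", dst_language="fr")
body = convert("café".encode("utf-8"), ctx)
```

Without the source charset the layer cannot even decode the input, which is
why the destination charset alone is not enough in the general case.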

Dw.

