Date: Fri, 9 Nov 2001 13:13:35 -0500 (EST)
From: Dirk-Willem van Gulik <dirkx@covalent.net>
To: dev@apr.apache.org
Subject: UTF-8 support (fwd)
Message-ID: 
 <Pine.OSX.4.40.0111091257110.389-100000@titatovenaar.sfo.covalent.net>


We've got some of this in the iconv/apr_xlate code, but it is far
from complete.

I've got some old code floating around (google for "C3 API" for a rough
idea) which does

->	utf 6|7|8 <-> unicode <-> specific_charset(language)

based on approximation code tables from the Unicode standard. E.g. latin-1
'\xff' -> latin-3 'y' | latin-3 'ij' (depending on language); '&' <-> 'et';
'\xdc' <-> 'U'/'UE'; '\xc6' <-> 'AE'. I.e. you can go from any charset,
or from Unicode, to any other charset, and if chars are not available we
approximate them (occasionally based on language).

I'd be quite happy to donate it and work it in.

However, my feeling is that if we want to offer more than we do today, it
*will* require the Unicode tables to be linked in or shipped.

I.e. it would add half a megabyte to 2 megabytes to the footprint
(depending on the charset tables) for a version which covers about the
same range of charsets as Mac/Windows do.

I am not convinced that that is good.

Dw