Date: Fri, 9 Nov 2001 13:13:35 -0500 (EST)
From: Dirk-Willem van Gulik
To: dev@apr.apache.org
Subject: UTF-8 support (fwd)

We've got some of this in the iconv/apr_xlate code, but it is far from
complete.

I've got some old code floating around (google for "C3 API" for a rough
idea) which does

    -> utf 6|7|8 <-> unicode <-> specific_charset(language)

based on approximation code tables from the Unicode standard. E.g.

    latin-1 '\xff' -> latin-3 'y' | latin-3 'ij' (depending on language)
    '&' <-> 'et'; '\xdc' <-> 'u'/'eu'; '\xc6' <-> 'AE'.

I.e. you can go from any charset, or from Unicode, to any other charset,
and if chars are not available in the target we approximate them
(occasionally based on language).

I'd be quite happy to donate it and work it in. However, my feeling is
that if we want to offer more than we do today it *will* require the
Unicode tables to be linked in or shipped. I.e. it adds half a megabyte
to 2 megabytes to the footprint (depending on charset tables) for a
version which covers about the same range of charsets as mac/windows
does.

I am not convinced that that is good.

Dw
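
P.S. To make the approximation idea concrete, here is a minimal sketch in C. It is not the code mentioned above and not part of apr_xlate: the names approx_entry, approx_table, approx_lookup and latin1_to_ascii are hypothetical, the "nl" language tag for the 'ij' case is an assumption, and the target is plain ASCII rather than another Latin charset to keep it short. It only shows the shape of a table lookup that prefers a language-specific replacement and falls back to a generic one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a Latin-1 byte, an optional language tag,
 * and the text used when the target charset lacks that character. */
struct approx_entry {
    unsigned char ch;      /* Latin-1 code point */
    const char *lang;      /* NULL = language-independent default */
    const char *fallback;  /* replacement text */
};

/* A few illustrative rows; a real table would be generated from the
 * Unicode data files, which is where the footprint comes from. */
const struct approx_entry approx_table[] = {
    { 0xC6, NULL, "AE" },  /* AE ligature */
    { 0xDC, NULL, "u"  },  /* U with diaeresis, generic form */
    { 0xFF, "nl", "ij" },  /* y with diaeresis; "nl" tag is an assumption */
    { 0xFF, NULL, "y"  },  /* y with diaeresis, generic form */
};

/* Return the replacement for ch, preferring an entry that matches lang.
 * Returns NULL when the table has no approximation at all. */
const char *approx_lookup(unsigned char ch, const char *lang)
{
    const char *generic = NULL;
    size_t i;

    for (i = 0; i < sizeof(approx_table) / sizeof(approx_table[0]); i++) {
        const struct approx_entry *e = &approx_table[i];
        if (e->ch != ch)
            continue;
        if (e->lang == NULL) {
            if (generic == NULL)
                generic = e->fallback;
        } else if (lang != NULL && strcmp(e->lang, lang) == 0) {
            return e->fallback;  /* language-specific match wins */
        }
    }
    return generic;
}

/* Convert NUL-terminated Latin-1 text to ASCII using the table.
 * Returns 0 on success, -1 if out is too small or a character
 * has no known approximation. */
int latin1_to_ascii(const char *in, const char *lang,
                    char *out, size_t outlen)
{
    size_t o = 0;

    if (outlen == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char single[2];
        const char *rep;
        size_t n;

        if (c < 0x80) {          /* already ASCII: copy through */
            single[0] = (char)c;
            single[1] = '\0';
            rep = single;
        } else {
            rep = approx_lookup(c, lang);
            if (rep == NULL)
                return -1;
        }
        n = strlen(rep);
        if (o + n >= outlen)     /* keep room for the terminator */
            return -1;
        memcpy(out + o, rep, n);
        o += n;
    }
    out[o] = '\0';
    return 0;
}
```

With this sketch, latin1_to_ascii("\xFFs", "nl", buf, sizeof buf) would write "ijs", and the same call with a NULL language would write "ys". The per-language rows are what make the tables grow, which is the footprint concern raised above.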