Date: Mon, 26 Feb 2001 18:22:43 -0800 (PST)
From: Sander van Zoest
To: new-httpd@apache.org
Subject: Re: unicode file APIs (was: Re: canonical stuff)

On Sun, 25 Feb 2001, dean gaudet wrote:

> > The answer is to have apr_file_open_u() for opening with Unicode filenames,
> > not changing the encoding of the existing apr_file_open. You completely
> > break all possibility of writing portable apps when you do that. And APR is
> > *about* writing portable apps.
>
> i'm a bit of an I18N novice, but doesn't it all just magically work if you
> use UTF-8 encoding everywhere?
>
> UTF-8 deliberately avoids using \0 and / in the encodings. plain ascii
> works unmodified. unix filesystems generally support UTF-8 directly
> (because of the \0 and / avoidance).
>
> this allows you to have a single API which understands unicode on all
> platforms -- you don't need to have _u versions which take unicode
> strings.
>
> give this page a perusal: http://www.cl.cam.ac.uk/~mgk25/unicode.html

i18n can be kind of a pain when you need to convert data whose charset
you do not know, or data you do not control. Going to a fully ISO-10646
(UTF-8) system would kill all of those issues, but the problem is making
that migration and converting everything. That is where there isn't much
code out there that does all the mappings.

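The quoted point about UTF-8 avoiding \0 and / can be checked
mechanically. Below is a minimal sketch in C, not APR code: the names
utf8_encode() and count_unsafe_code_points() are made up for
illustration, and it assumes the 4-byte form of UTF-8 (code points up to
U+10FFFF, surrogates excluded). It encodes every non-ASCII code point and
counts the ones whose encoding contains a 0x00 or 0x2F byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one code point as UTF-8 into out[0..3].
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80UL) {                      /* plain ASCII, unmodified */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) { /* surrogates are not valid */
        return 0;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical check: count non-ASCII code points whose UTF-8 encoding
 * contains a NUL (0x00) or slash (0x2F) byte. */
unsigned long count_unsafe_code_points(void)
{
    unsigned long cp, unsafe = 0;
    unsigned char buf[4];

    for (cp = 0x80UL; cp <= 0x10FFFFUL; cp++) {
        size_t i, n = utf8_encode(cp, buf);
        for (i = 0; i < n; i++) {
            if (buf[i] == 0x00 || buf[i] == 0x2F) {
                unsafe++;
                break;
            }
        }
    }
    return unsafe;
}
```

count_unsafe_code_points() returns 0, because every byte of a multi-byte
sequence has its high bit set. That is why a byte-oriented path parser
that only looks for \0 and / can pass UTF-8 filenames through untouched.
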
I do think, as wrowe points out, that this should probably be handled
inside APR, so that apache can handle as much as possible in ISO-10646,
especially if everything it interacts with supports it. The problem
comes in when you have a 10646-based server and have to deal with
non-10646 data outside of the ASCII and latin1 charsets. You need to
convert somehow, and if we convert to UTF-8 via iconv then I do not see
an issue.

--
Sander van Zoest                                  [sander@covalent.net]
Covalent Technologies, Inc.                   http://www.covalent.net/
(415) 536-5218                        http://www.vanzoest.com/sander/