apr-dev mailing list archives

From Wilfredo Sánchez Vega <wsanc...@wsanchez.net>
Subject Re: apr_filepath_encoding on Darwin
Date Tue, 07 Aug 2007 00:39:40 GMT
On Aug 6, 2007, at 5:11 PM, Roy T. Fielding wrote:

> I agree.  But is it the case that non-native mounted filesystems
> are name-translated by the kernel?  I mean, if OS X did this consistently
> for all mount points, then I would see it as being reasonable for the
> OS X applications to reject anything else.

   According to the tech note on this, if the encoding for the  
underlying volume format is known, it should be translated to UTF-8 at  
the VFS layer by the file system implementation:


> Actually, it also crashes on valid utf-8 in normal form, because OS X
> doesn't follow the standard on normalization.  See "man -s 5 utf8":
>     If more than a single representation of a value exists (for example,
>     0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always
>     used.  Longer ones are detected as an error as they pose a potential
>     security risk, and destroy the 1:1 character:octet sequence mapping.
> but OS X requires the longer composition characters over shorter ones.
> My guess is that choice was driven by the way the UI allows such
> characters to be composed (like "alt-u u" for u-umlaut).

   Above the VFS layer, we always use decomposed UTF-8.
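
   To illustrate the distinction, here is a minimal sketch in Python (not anything from APR or the proposed patch). It assumes Unicode NFD is a close enough stand-in for the decomposed form used above the VFS layer, and `is_valid_utf8` is a hypothetical helper. It suggests that a decomposed name is a different code point sequence, each code point still encoded in its shortest form, whereas the overlong byte sequences the man page warns about are rejected by a strict decoder.

```python
import unicodedata

# Precomposed u-umlaut (U+00FC) and its decomposed form ("u" + U+0308).
composed = "\u00fc"
decomposed = unicodedata.normalize("NFD", composed)

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

composed_bytes = composed.encode("utf-8")      # two bytes: C3 BC
decomposed_bytes = decomposed.encode("utf-8")  # three bytes: 75 CC 88
overlong_nul = b"\xc0\x80"                     # overlong encoding of U+0000

print(composed_bytes, is_valid_utf8(composed_bytes))
print(decomposed_bytes, is_valid_utf8(decomposed_bytes))
print(overlong_nul, is_valid_utf8(overlong_nul))
```

   Both the composed and decomposed byte strings decode cleanly; only the overlong sequence is refused.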

> Of course, even with these issues, the Mac still kicks ass.

   Well, that's a given.

>>  Again, same as with volume formats, if the zip file format defines  
>> the encoding in zip files, then this should be easy (insofar as  
>> encodings are easy) for the software to deal with.
> Sadly, it doesn't (filenames are just null-terminated strings).  There
> are options for conversion from EBCDIC, but nothing to transcode the
> filenames in general as they are unzipped.  Maybe the zip command
> maintainer will take that as an enhancement request.

   Right, same with all archive formats.  You need to either define  
the name encoding as X or add some metadata to let you specify what  
encoding is in use (and, ideally, require that this be provided).
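
   As a rough sketch of what that metadata would buy you (Python, purely illustrative): if an archive entry carried a declared encoding next to the raw name bytes, an extractor could transcode the name before touching the file system. The function name `to_decomposed_utf8` and its parameters are hypothetical, and NFD is again assumed as an approximation of the decomposed form.

```python
import unicodedata

def to_decomposed_utf8(raw_name: bytes, declared_encoding: str) -> bytes:
    """Transcode an archived file name to decomposed UTF-8.

    raw_name is the name exactly as stored in the archive;
    declared_encoding is the (hypothetical) metadata naming its encoding.
    Raises UnicodeDecodeError if the bytes do not match the declaration.
    """
    text = raw_name.decode(declared_encoding)
    return unicodedata.normalize("NFD", text).encode("utf-8")

# A Latin-1 name with a precomposed u-umlaut (byte 0xFC).
print(to_decomposed_utf8(b"\xfc.txt", "latin-1"))
```

   Without the declared encoding, the same bytes cannot be interpreted reliably: 0xFC alone is not valid UTF-8, so guessing UTF-8 here would fail.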

>>  You still have to hope that the inbound encoding is correct (that  
>> is, that svn somehow knows it).  On OS X, that's easy; it's UTF-8.   
>> Once other operating systems come into the mix, it'll work as well  
>> as the encodings are defined (and known to svn) on those systems.
> What I do currently is define
>   setenv  MM_CHARSET "utf-8"
>   setenv  LANG       "en_US.utf-8"
> in my shell init file.

   On Mac OS (at least), that isn't relevant with respect to  
filenames, which is what the patch that Erik proposed fixes.

   It is, however, relevant to how a CLI application encodes data sent  
to the terminal.  That is, the above means that Terminal.app expects  
to see UTF-8 English text.  (I think; again, I don't really know much  
about BSD locale settings.)
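
   For what it's worth, a value like the one quoted above follows the usual POSIX shape `language_TERRITORY.codeset@modifier`, so the language and the character encoding are separate pieces. A small sketch in Python (the helper `split_locale` is hypothetical and does no validation):

```python
def split_locale(value: str):
    """Split a POSIX-style locale string into (language_territory, codeset, modifier).

    Missing parts are returned as None. This is a plain string split, not a
    lookup against the system's locale database.
    """
    base, _, modifier = value.partition("@")
    lang_territory, _, codeset = base.partition(".")
    return lang_territory, (codeset or None), (modifier or None)

print(split_locale("en_US.utf-8"))
```

   This only describes what the setting says about text sent to the terminal; as noted above, it has no bearing on how file names are encoded.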


Wilfredo Sánchez - wsanchez@wsanchez.net
