apr-dev mailing list archives

From Wilfredo Sánchez Vega <wsanc...@wsanchez.net>
Subject Re: apr_filepath_encoding on Darwin
Date Mon, 06 Aug 2007 23:10:00 GMT
   (Sorry for the lame reply latency.)

On Jul 18, 2007, at 5:24 PM, Roy T. Fielding wrote:

> A system less concerned with backwards compatibility is better off
> with a requirement of utf-8, though OS X should have made the filename
> encoding a mount option.

   I disagree.  Having one encoding is far superior to every
application having to first find out what encoding the filesystem in
question is using and then using that.

   I see no value in having different mount points use different  
encodings.

> I assume that the ISO9660-Joliet (CD-ROM) driver does
> some form of filename translation automatically from UCS-2.

   The underlying volume format can use whatever it wants.  Ideally
the format defines what that is.  Unfortunately, that's not always the
case, but for those that do, yes, converting to UTF-8 is the
responsibility of the file system at the VFS layer.

   I suppose that a mount option to tell the filesystem that "this UFS  
volume uses encoding X" would be useful, but I maintain that above the  
kernel, you really want one encoding, not N.  Helping the kernel know  
what's underneath is certainly useful.

> In any case, even with the convention, it is left to the application
> to determine how it will treat encoded filenames.  The OS X decision
> to treat them all as utf-8 is at least consistent.  OTOH, this
> is just a display convention -- OS X apps should have been designed
> to treat the filename internally as an opaque nul-terminated array,
> rather than barfing on non-utf8 encodings.

   This is difficult in practice.  When the open panel sees a file  
that is not in UTF-8, there is no reliable way to display anything  
sane to the user.  I suppose a Linux nerd might say "show me some hex"  
or something, but most of our users are not Linux nerds.  I agree that  
crashing is worse than hex, though.
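
   As a rough illustration of the "hex instead of crashing" fallback,
here is a minimal Python sketch.  The function name display_name is
made up for this example, and this is not how the OS X open panel or
APR actually behaves.  It treats the name as opaque bytes, shows it
unchanged when it is valid UTF-8, and otherwise escapes only the bytes
that fail to decode.

```python
def display_name(raw: bytes) -> str:
    """Return text that is safe to show a user for a raw file name.

    Hypothetical helper for illustration only. Valid UTF-8 is returned
    unchanged; any byte that cannot be decoded is rendered as a \\xNN
    escape instead of raising an exception.
    """
    return raw.decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    # A name encoded as UTF-8 displays normally.
    print(display_name("Communiqué.txt".encode("utf-8")))
    # The same name encoded as Latin-1 is not valid UTF-8, so the
    # offending byte is shown in hex rather than causing a failure.
    print(display_name(b"Communiqu\xe9.txt"))
```

   The on-disk bytes are never modified here; only the string shown
to the user changes, so the file can still be opened by its original
name.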

   Basically, on Mac OS X, you can, in fact, use whatever characters
you like on UFS, and BSD-level software tends to cope with that.  But if
you aren't using UTF-8, then you aren't writing file names that are
meant for user consumption.  That is, it may be OK for a database (e.g.
fsfs), though I think that even in that case you can reasonably stick
to ASCII in many cases.

> One thing I miss in OS X is an automated way for file archivers
> (like unzip) to recognize and convert non-utf-8 filenames
> when they are unarchived.  I frequently have to do that by hand
> after unzipping something from China or Switzerland.

   Again, same as with volume formats, if the zip file format defines  
the encoding in zip files, then this should be easy (insofar as  
encodings are easy) for the software to deal with.

> Subversion
> breaks on OS X whenever someone commits a filename with an e-grave,
> which is a problem when your main product name is Communiqué.
> I wonder if this change in APR would fix that error?

   You still have to hope that the inbound encoding is correct (that
is, that svn somehow knows it).  On OS X, that's easy; it's UTF-8.
Once other operating systems come into the mix, it'll work as well as
the encodings are defined (and known to svn) on those systems.
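
   To make the dependence on the inbound encoding concrete, here is a
small Python sketch.  The helper to_utf8_name and the encodings chosen
are assumptions for illustration; this is not svn's or APR's actual
code.  Conversion to UTF-8 is only correct when the source encoding is
known, and a wrong guess can succeed silently with the wrong
characters.

```python
def to_utf8_name(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a file name from a declared encoding to UTF-8.

    Hypothetical helper for illustration only. Raises
    UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    latin1_name = "Communiqué".encode("latin-1")

    # Correct declaration: the result is the UTF-8 form of the name.
    good = to_utf8_name(latin1_name, "latin-1")
    print(good.decode("utf-8"))

    # Wrong declaration: no error is raised, but the accented
    # character comes out as a different letter.
    bad = to_utf8_name(latin1_name, "cp1251")
    print(bad.decode("utf-8"))
```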

	-wsv


—
Wilfredo Sánchez - wsanchez@wsanchez.net

