incubator-couchdb-dev mailing list archives

From "Paul Davis" <>
Subject Re: slash escaping (was 0.9.0 Release)
Date Sun, 14 Dec 2008 06:18:46 GMT
I have to say I always kind of assumed that most filesystems only
allowed Latin-based characters in the name. I got interested, so I
asked the guys in the IRC channel about non-Latin characters in
filenames, and someone actually just created a file on ext3 with
Japanese characters and everything worked fine.

Someone pasted this link:

Reading the table, it appears that the biggest concern about filenames
is including a NUL byte.

Perhaps we're overthinking this whole thing? Maybe we can just write
filenames with weird characters, and the sysadmins have to muck around
with whatever happens when they have a design doc with weird characters?
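
To make the "just write the name" idea concrete, here is a rough Python
sketch of the check it would imply. The function name and the 255-byte
limit are my own assumptions (255 bytes is the per-name limit on ext3;
other filesystems differ), and rejecting "/" is an extra POSIX rule on
top of the NUL byte mentioned above. Treat it as an illustration, not as
anything CouchDB does.

```python
def is_safe_posix_component(name: str, max_bytes: int = 255) -> bool:
    """Return True if `name` could plausibly be used as a single path
    component on a typical POSIX filesystem (hypothetical helper)."""
    # Empty names and the special entries "." and ".." are not usable.
    if name in ("", ".", ".."):
        return False
    raw = name.encode("utf-8")
    # NUL terminates C strings and "/" separates path components,
    # so neither byte can appear inside a file name.
    if b"\x00" in raw or b"/" in raw:
        return False
    # The limit is counted in bytes, not characters (assumed: 255).
    return len(raw) <= max_bytes
```

Under these assumptions a name made of Japanese characters passes, while
a design doc id containing a slash (e.g. "_design/foo") still does not,
so the slash would need some handling either way.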


On Sun, Dec 14, 2008 at 12:07 AM, Antony Blakey <> wrote:
> On 14/12/2008, at 2:47 PM, Chris Anderson wrote:
>> Perhaps your filename scheme could be appended to a slug (based on the
>> safe-chars) so that sysadmins could still use meaningful file globs to
>> eg batch rsync .couch files and view directories.
> The filename encoder can use any scheme, so yes that is trivial. It would
> only be (theoretically) a prefix of the readable chars because of length
> constraints. Note that there is no guarantee that slugs would be unique. I
> considered punycode, but given that it needs to deal with case-insensitive
> FS, slashes, limited length, it was simplest to cut to the chase and just
> use the digest.
> Regarding your request however, a better way to determine safe-chars
> according to the underlying filesystem is required IMO to avoid the overt
> Roman-script-only design. If you think it's essential that *you* can read
> the filenames in a terminal, then surely it's essential that a
> Chinese/Russian/Greek/Swedish/Thai etc. developer has the same facility.
> Otherwise it's not a *design requirement* per se, but rather a preference.
> I'm a pure English speaker myself, but I am about to deploy a couch system
> to an Asian (government) environment with many millions of users (with, BTW,
> a link to CouchDB on every page). In the future I will have to sell this
> technology and do technology transfer to local developers - and that is made
> very much more difficult with the current vigorously asserted English-only
> design decisions because it's a significant political liability.
>> Readability / globbableness is also nice when you're trying to figure
>> out which views use the most space on the filesystem, a common task.
> That's why the actual name is in the 'name' file.
> Antony Blakey
> -------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
> There are two ways of constructing a software design: One way is to make it
> so simple that there are obviously no deficiencies, and the other way is to
> make it so complicated that there are no obvious deficiencies.
>  -- C. A. R. Hoare
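
For reference, the slug-plus-digest scheme discussed in the quoted
message might look roughly like the Python sketch below. The function
name, the safe-character set, the 16-character slug length and the
choice of SHA-1 are all assumptions of mine; the thread does not specify
any of them.

```python
import hashlib
import re


def encode_filename(name: str, slug_len: int = 16) -> str:
    """Map an arbitrary name to a file name made of a short readable
    slug followed by a digest (hypothetical scheme)."""
    # Lowercase first so the slug behaves the same on case-insensitive
    # filesystems, then keep only an assumed set of safe characters.
    # Slashes and non-ASCII characters are simply dropped.
    slug = re.sub(r"[^a-z0-9_-]", "", name.lower())[:slug_len]
    # The digest is taken over the original name, so two names that
    # share a slug still map to different files.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{slug}-{digest}" if slug else digest
```

The slug is only there for globbing; as noted in the quoted message it
is not guaranteed to be unique, so uniqueness comes from the digest. A
name with no safe characters at all (for example one written entirely in
Japanese) ends up as the bare digest, which is why the real name would
still need to be stored somewhere such as the 'name' file.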
