couchdb-dev mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: slash escaping (was 0.9.0 Release)
Date Sun, 14 Dec 2008 07:07:02 GMT

On 14/12/2008, at 4:48 PM, Paul Davis wrote:

> I have to say I always kind of assumed that most filesystems only
> allowed Latin based characters in the name. I got interested so I
> asked the guys in the IRC channel about non-latin characters in
> filenames and someone actually just created a file on ext3 with
> Japanese characters and everything worked fine.
>
> Someone pasted this link:
> http://en.wikipedia.org/wiki/Comparison_of_file_systems
>
> Reading the table it appears that the biggest concern about filenames
> is including a NULL byte.
>
> Perhaps we're overthinking this whole thing? Maybe we can just write
> filenames with weird characters and the sysadmins have to muck around
> with what happens when they have a design doc with weird characters?

Filesystems may allow UTF-8, but they still assign meaning to some  
characters and/or sequences e.g. '/', '\', '..', sometimes ':', so you  
have to worry about it. Using a folding solution introduces collision  
possibilities, so you then have to deal with that - case insensitivity  
being the most egregious example of folding (and a personal hatred).  
Some characters are universally annoying to deal with at the command  
line, leading '-' and spaces being obvious examples.

There is no solution that doesn't involve some decision and/or code.  
I'd support any solution that isn't ascii/english/roman-centric.  
However - not allowing '/' removes the principal hierarchy indicator,  
which is annoying. And placing a constraint on the document id of a  
design document seems wrong, because the '_design/' prefix is a  
filter, rather than a constraint, and IMO the rules covering document  
ids should be uniform so that one can treat all documents identically  
under transformation.

I like one directory per db, although an argument could be made that  
in the current scheme, one directory is canonical data, and the other  
is derived (effectively a cache).

Anyway, I could use my current solution and use a slug in which  
{ non-printing, space, /, \, ., leading -, : } are removed or folded to  
e.g. _, suffixed with the hex MD5. How would this be? I could eliminate  
the MD5 if the slug is the same as the name under case folding, which  
would result in many filenames being identical to the name. It would  
still have a directory structure as per my previous email, and in  
particular the 'name' file would remain, because it's needed to  
implement all_databases with transformed names, and it allows  
completely general scripting, albeit not as simple as filename globbing.
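
To make the slug proposal concrete, here is a rough sketch in Python (not the Erlang that CouchDB itself uses). The function names slug_for and filename_for are hypothetical, as is the '_' separator before the hash. It assumes every listed character is folded to '_' rather than removed, and it reads "the same as the name under case folding" as "the slug equals the case-folded name", so a name containing uppercase always keeps its MD5 suffix. Other readings are possible.

```python
import hashlib

# Characters folded to '_' (from the list above); a leading '-' is handled separately.
_FOLDED = set(" /\\.:")


def slug_for(name: str) -> str:
    """Fold non-printing and filesystem-sensitive characters to '_'."""
    chars = ["_" if (not ch.isprintable() or ch in _FOLDED) else ch for ch in name]
    if chars and chars[0] == "-":
        chars[0] = "_"  # a leading '-' is awkward at the command line
    return "".join(chars)


def filename_for(name: str) -> str:
    """Return the slug, with a hex MD5 suffix unless the name is already 'safe'."""
    slug = slug_for(name)
    # Assumption: the hash is dropped only when folding changed nothing and the
    # name has no case variants that could collide on a case-insensitive filesystem.
    if slug == name.casefold():
        return slug
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{slug}_{digest}"
```

Under these assumptions a plain lowercase name such as "mydb" maps to itself, while "_design/foo" becomes "_design_foo_" followed by 32 hex digits, so two names that fold to the same slug still get distinct filenames.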

Given I've done the work to allow a full solution, and adding the slug  
isn't hard ... ?

Alternatively, a workable lexical constraint would be: printable  
unicode - { unicode uppercase, /, \, : } and !empty and !'..', but  
obviously I'm not keen on that.
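
For comparison, the lexical constraint could be checked with something like the following Python sketch. The name is_valid_name is hypothetical, "printable" is approximated with Python's str.isprintable, and !'..' is read as "the whole name is not '..'", which is only one possible interpretation.

```python
def is_valid_name(name: str) -> bool:
    """Check: printable unicode minus { uppercase, /, \\, : }, not empty, not '..'."""
    if name == "" or name == "..":
        return False
    return all(
        ch.isprintable() and not ch.isupper() and ch not in "/\\:"
        for ch in name
    )
```

This accepts non-Latin names such as "日本語" but rejects "MyDb" and "_design/foo", which shows how much the constraint gives up.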

BTW: the technical problem remaining with the solution I published is  
that Futon's JavaScript collapses the design document id with the  
view name within that document and treats it as a compound entity, so  
encoding/splitting etc. is more complicated. I've fixed some of  
that, but work remains.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a  
faithful servant. We have created a society that honours the servant  
and has forgotten the gift.
   -- Albert Einstein


