couchdb-dev mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: slash escaping (was 0.9.0 Release)
Date Sat, 13 Dec 2008 04:44:35 GMT

On 13/12/2008, at 7:40 AM, Damien Katz wrote:

> The decision to limit db and design doc names is a pragmatic  
> one, it simplifies things greatly. CouchDB is full of things that  
> could be better. Patches welcome.

OK, this code now works for me in a client:

   require 'rubygems'
   require 'json'
   require 'couchrest'
   require 'cgi'

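   # Two database names that differ only in the case of the first letter.
   # CGI.escape percent-encodes the slash and the non-ASCII characters.
   db_name = CGI.escape("Ser vices/new - ∞शछغش갥걸ペボ")
   db = CouchRest.database!("http://localhost:5984/" + db_name)

   db_name = CGI.escape("ser vices/new - ∞शछغش갥걸ペボ")
   db = CouchRest.database!("http://localhost:5984/" + db_name)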

And this URL in Safari:

   http://127.0.0.1:5984/Ser+vices%2Fnew+-+∞शछغش갥걸ペボ

returns this:

   {"db_name":"Ser vices/new - \u221e\u0936\u091b\u063a\u0634\uac25\uac78\u30da\u30dc","doc_count":1,"doc_del_count":0,"update_seq":1,"purge_seq":0,"compact_running":false,"disk_size":14365}
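
As a sanity check on that round trip, here is a minimal sketch (not part
of the patch) showing that the escaped URL path and the \u-escaped JSON
decode to the same name. The name is shortened to its first few
characters for brevity, and the variable names are only illustrative.

```ruby
require 'cgi'
require 'json'

# Shortened versions of the URL path and the JSON body shown above.
escaped_path = "Ser+vices%2Fnew+-+%E2%88%9E"
# Single quotes keep the \u221e escape literal, so the JSON parser decodes it.
json_body = '{"db_name":"Ser vices/new - \u221e"}'

name_from_url  = CGI.unescape(escaped_path)
name_from_json = JSON.parse(json_body)["db_name"]

puts name_from_url == name_from_json
```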

The filesystem looks like this:

   Ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9C-lhxj+E81IP9xm+0ssUSsQ==.couch
   ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9CN2JWdnNzkyqvutQ1OZeKUw==.couch

The Base64 (filename variant) of the MD5 is appended so that names
differing only in case still map to distinct files. I haven't
investigated using platform-specific code, which would allow filenames
to include Unicode characters where the OS supports that, and therefore
be much shorter, presuming that files aren't intended to be portable
between systems. Note that filenames for ASCII names don't look nearly
as ugly, not that I consider that to be a problem.
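
For illustration, the naming scheme can be sketched in Ruby roughly as
follows. This is not the patch itself: db_filename is a hypothetical
helper, and the sketch assumes that the MD5 is taken over the UTF-8
bytes of the name and that the "filename variant" of Base64 replaces
"/" with "-". It is not guaranteed to reproduce the exact hashes above.

```ruby
require 'cgi'
require 'digest/md5'

# Hypothetical helper, not CouchDB code: build an on-disk filename from a
# database name by escaping it and appending a Base64-encoded MD5.
def db_filename(name)
  escaped = CGI.escape(name)
  # pack('m0') is core Ruby's Base64 encoding without a trailing newline.
  # Replacing "/" with "-" is an assumption about the filename variant.
  digest = [Digest::MD5.digest(name)].pack('m0').tr('/', '-')
  "#{escaped}#{digest}.couch"
end

# Shortened forms of the two names used above.
upper = db_filename("Ser vices/new - \u221e")
lower = db_filename("ser vices/new - \u221e")
puts upper
puts lower
```

The two results differ in their hash suffix as well as in the case of
the first letter, which is what keeps them apart on a filesystem that
ignores case.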

Dealing with view filenames can be done similarly.

However, I think a better solution is something like this:

   N2JWdnNzkyqvutQ1OZeKUw==.couchdb/
     name     # a UTF-8 file containing the name of the database
     data     # what was previously in the .couch file
     temp     # what was in the .*_temp file
     lhxj+E81IP9xm+0ssUSsQ==.viewgroup/
       name     # a UTF-8 file containing the name of the view
       data     # what was previously in the .view file

I suggest using the MD5 because it can be computed from the names.  
Alternatively they could be simple integers, which IMO would be a  
slightly better solution, but a more pervasive change because most of  
the functions currently take names. Using integers would avoid even  
the vanishingly small chance of collision.

I know the database name is in the data file, but all of the code  
requires the name before reading the file, and changing that would be  
a major patch. Furthermore, having the name accessible ensures that  
sysadmin tasks are still easy (and scriptable). I think this is a  
better system than the current one because filesystem containment is  
used rather than filename composition e.g. a database is entirely  
contained in a directory.
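
A rough Ruby sketch of how the 'name' files keep sysadmin scripting
simple under the directory layout. The helpers db_dir_name,
create_db_dir and list_databases are hypothetical, and the Base64
variant (replacing "/" with "-") is an assumption, as is hashing the
UTF-8 bytes of the name.

```ruby
require 'digest/md5'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: directory name derived from the MD5 of the name.
def db_dir_name(name)
  [Digest::MD5.digest(name)].pack('m0').tr('/', '-') + '.couchdb'
end

# Hypothetical helper: create the directory and write its UTF-8 'name' file.
def create_db_dir(root, name)
  dir = File.join(root, db_dir_name(name))
  FileUtils.mkdir_p(dir)
  File.open(File.join(dir, 'name'), 'w:UTF-8') { |f| f.write(name) }
  dir
end

# Sysadmin-style listing: map each database name to its directory by
# reading the 'name' file, without opening any data file.
def list_databases(root)
  Dir.glob(File.join(root, '*.couchdb')).each_with_object({}) do |dir, names|
    names[File.read(File.join(dir, 'name'), encoding: 'UTF-8')] = dir
  end
end

Dir.mktmpdir do |root|
  create_db_dir(root, "Ser vices/new")
  create_db_dir(root, "ser vices/new")
  list_databases(root).each { |name, dir| puts "#{name} -> #{File.basename(dir)}" }
end
```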

Apart from the 'name' files, this is a largely mechanical change.

Opinions?

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a  
faithful servant. We have created a society that honours the servant  
and has forgotten the gift.
   -- Albert Einstein


