From: Antony Blakey
To: dev@couchdb.apache.org
Subject: Re: slash escaping (was 0.9.0 Release)
Date: Fri, 12 Dec 2008 06:58:25 +1030

On 12/12/2008, at 6:22 AM, Damien Katz wrote:

>
> On Dec 11, 2008, at 2:39 PM, Chris Anderson wrote:
>
>> On Wed, Dec 3, 2008 at 3:59 PM, Antony Blakey wrote:
>>>
>>> On 04/12/2008, at 9:55 AM, Chris Anderson wrote:
>>>
>>>> On Wed, Dec 3, 2008 at 6:09 AM, Adam Kocoloski wrote:
>>>>>
>>>>> 2) The "/" in the _design doc ID is confusing.
>>>>
>>>> Oh someone, please make it easy! (and correct)
>>>
>>> Someone please make it absolutely, 100%, correct.
>>>
>>
>> The more I program against Couch, especially in a browser, the more I
>> run into issues where different parts of the toolchain tend toward
>> auto-unescaping %2F. It's hard to be certain that I've got something
>> absolutely, 100% correct, but we'll never get there if we don't start.
>>
>> Here are some examples which assume that docid's slashes will be
>> urlencoded (unless the docid starts with '_'). This is the current
>> rule (roughly). Each example has 2 urls with attachments that have no
>> slashes in the name, followed by a url with an attachment with
>> multiple slashes. I think it is feasible to allow this sort of thing
>> to happen, by putting a little bit of special-case logic in the
>> routing code. I don't think doing so breaks anything fundamental about
>> CouchDB.
>>
>> regular docs:
>>
>> /db/docid
>> /db/docid/afile
>> /db/docid/afile/with/nested/slashes
>>
>> design docs:
>>
>> /db/_design/name
>> /db/_design/name/afile
>> /db/_design/name/afile/with/nested/slashes
>>
>>
>> If your docid does not start with '_' (e.g. not a local or design doc)
>> then any slashes in the docid would have to be escaped. This is so we
>> can know when attachment addressing begins. Also, design docs with
>> slashes after the initial one (slashes in the name) would have to
>> escape them.
>>
>> regular doc with slashes in id:
>>
>> /db/docid%2Fwith%2Fslashes
>> /db/docid%2Fwith%2Fslashes/afile
>> /db/docid%2Fwith%2Fslashes/afile/with/nested/slashes
>>
>> design doc with slashes in name:
>>
>> /db/_design/name%2Fwith%2Fslashes
>> /db/_design/name%2Fwith%2Fslashes/afile
>> /db/_design/name%2Fwith%2Fslashes/afile/with/nested/slashes
>>
>>> Special names, special paths, sometimes encoding, sometimes not.
>>> Such magic is evil because it always comes back to bite your arse.
>>>
>>
>> I think I may have this correct, i.e. non arse biting. But I'm posting
>> to the dev list because y'all might see what I don't.
>>
>> I plan to put this into trunk before 1.0 (I think it will be backwards
>> compatible). Comments?
>>
>> Chris
>>
>> --
>> Chris Anderson
>> http://jchris.mfdz.com
>
>
> I agree with everything but slashes in design doc names.
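
To make the routing rule concrete, here is a minimal Python sketch of how a router could split a request path under Chris's proposal. It is not CouchDB's actual code: `parse_doc_path` and `SPECIAL_PREFIXES` are hypothetical names, and treating exactly `_design` and `_local` as the prefixes whose first slash stays literal is an assumption based on the "local or design doc" wording. The essential step is splitting on literal `/` before decoding any `%2F`, which is exactly what auto-unescaping tools in the chain undermine.

```python
from urllib.parse import unquote

# Assumption: only these prefixes keep a literal slash inside the docid.
SPECIAL_PREFIXES = ("_design", "_local")


def parse_doc_path(raw_path):
    """Split a raw, still percent-encoded path into (db, docid, attachment).

    The path is split on literal '/' BEFORE anything is unescaped; if %2F
    were decoded first, a slash inside a docid would look like a separator.
    """
    segments = raw_path.lstrip("/").split("/")
    if len(segments) < 2 or not segments[1]:
        raise ValueError("path must name a database and a document")
    db = unquote(segments[0])
    if segments[1] in SPECIAL_PREFIXES:
        if len(segments) < 3 or not segments[2]:
            raise ValueError("special prefix without a name")
        # The literal slash after _design/_local belongs to the docid;
        # any slash inside the name itself must arrive as %2F.
        docid = segments[1] + "/" + unquote(segments[2])
        rest = segments[3:]
    else:
        docid = unquote(segments[1])
        rest = segments[2:]
    # Everything after the docid is the attachment name, literal slashes kept.
    attachment = "/".join(unquote(s) for s in rest) if rest else None
    return db, docid, attachment
```

Only the docid segment ever needs escaping: the first unescaped slash after it marks where attachment addressing begins, so nested attachment names can be written with plain slashes.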

So the guidance is that users must not use document names starting
with '_' if they want to avoid astonishment?

The other alternative is to always require the component after the db
to be 'special', i.e. document URLs could be

/db/_/docid%2Fwith%2Fslashes/afile/with/nested/slashes

No special rules required. IMO this example makes clear the cause of
the issue.

> I think we probably shouldn't support design docs with slashes, and
> maybe all other weird characters.

I think document names should be allowed to be any Unicode string.

> For one thing, we use the design doc name as the file name for the
> view index file for the views. This is an issue that can prove
> problematic on certain platforms and not others.

The file name can be escaped. There are also limitations on the length
of the filename depending on the platform. I suggest using an escaped
form of some initial segment of the name, concatenated with an escaped
form of some final segment of the name, concatenated with a hash of the
full name. If the name is less than a certain length, then just escape
the full name.

Also, provide a handler that returns a JSON document associating
filenames with the original names. This exposes the mapping
implementation in a way that can be used by developers. Maybe also a
handler to map from an arbitrary string to a filename, using Couch's
mapping function. That would be useful for plugin/_external authors
who want to use local files.

IMO, limiting the names of things because of filesystem limitations is
a bad example of abstraction leakage.

> If the design doc has weird characters that aren't supported in the
> file system, we can't make the index file. If we hash the filename,
> then it's impossible for an admin to figure out which files are
> which from the command line. So maybe we should url escape the name
> for the file system too. Or just not support weird characters at all.
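
A minimal Python sketch of the file-naming scheme proposed above. The function names, the SHA-1 hash and the length limits are all assumptions; the message specifies only "escaped initial segment + escaped final segment + hash of the full name, or just the escaped name when it is short enough". Encoding '.' as well is an extra safeguard not in the message, added so that a name such as ".." cannot map to a directory reference and so '.' can act as an unambiguous separator.

```python
import hashlib
from urllib.parse import quote

# Hypothetical limits: the thread gives no numbers. Many filesystems cap a
# single path component at 255 bytes, so results here stay well below that.
MAX_PLAIN_LEN = 120  # escaped names up to this length are used unchanged
EDGE_LEN = 40        # max escaped characters kept from each end of a long name


def escape_component(s):
    # Percent-encode everything except letters, digits and "_-~"; '.' is
    # encoded too, so no result can be "." or ".." or contain a bare dot.
    return quote(s, safe="").replace(".", "%2E")


def _escaped_edge(chars, limit):
    # Escape one character at a time so a %XX sequence is never cut in half.
    parts, used = [], 0
    for ch in chars:
        e = escape_component(ch)
        if used + len(e) > limit:
            break
        parts.append(e)
        used += len(e)
    return parts


def index_filename(name):
    """Map a design doc name to a filesystem-safe, length-bounded file name."""
    escaped = escape_component(name)
    if len(escaped) <= MAX_PLAIN_LEN:
        return escaped
    head = "".join(_escaped_edge(name, EDGE_LEN))
    # Walk the name backwards, then restore the original character order.
    tail = "".join(reversed(_escaped_edge(reversed(name), EDGE_LEN)))
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # '.' never occurs in escaped text, so it safely separates the parts.
    return f"{head}.{tail}.{digest}"
```

Short names stay readable and reversible (`urllib.parse.unquote` recovers them), which speaks to Damien's concern about admins identifying files from the command line. Long names trade reversibility for a bounded length, which is where the filename-to-name handler suggested above would come in.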

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being
a damn fool about it.
  -- W.C. Fields