Date: Tue, 17 Nov 2009 13:49:20 +0000
From: Brian Candler
To: Andreas Pavlogiannis
Cc: user@couchdb.apache.org
Subject: Re: Storing Hierarchical Data
Message-ID: <20091117134920.GA12569@uk.tiscali.com>
In-Reply-To: <4B00C38B.4040903@gmail.com>

On Mon, Nov 16, 2009 at 05:14:19AM +0200, Andreas Pavlogiannis wrote:
> * Each file is represented by a single document that has a "path"
> attribute that indicates the directory that is being stored to. This
> gives the advantage of avoiding conventional pathname translation and
> retrieving the correct document immediately.

This is the form I'd suggest. It has a number of retrieval benefits:

* If you have a view which splits the path into an array of path
  components and emits that as the key in a view, then the key ordering
  is such that a parent is always immediately followed by its children.
  It is very cheap to retrieve a specific directory and all its
  descendants in one query using startkey and endkey.
  You can naturally serialise the data into a stream (e.g. as XML).

* In another view, if you emit all path components except the last,
  then you have an index by immediate parent. This makes it very cheap
  to get all the children (first-level descendants) of a directory in a
  single query.

Also, you may be able to work without explicit 'directory' objects at
all, just the files themselves.

> However, operations such as
> renaming a folder require updating many documents

True.

> and should be avoided.

Well, it's a more complex/expensive operation, but unless you expect
this to be happening frequently it's probably OK.

> I am aware of the bulk update technique with the "all or nothing"
> attribute, but it is to my understanding that it should be avoided,
> especially when dealing with clustering and replication.

On the contrary. The "all or nothing" updates give you more or less the
*same* semantics as you'd get with a multi-master clustered scenario. If
you use this then your application will be simpler (as it never has to
deal with HTTP 409 conflicts), and it will work the same with a single
master or in a multi-master cluster.

Unfortunately, the "all or nothing"-ness isn't carried across through
replication. For example, if you want to reparent a directory, you might
rename 100 documents using a single all-or-nothing bulk update. On the
node where this is first applied, you are guaranteed atomicity. But
after replication, you may find that some documents are renamed and some
are not.

Of course, eventual consistency means that barring some administrative
block, the updates should eventually get through, but there could be a
period of time where the files aren't all together.

Regards,

Brian.
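
The two views described in this message could look roughly like the sketch below. It is only an illustration, not code from the thread: the sample documents, the slash-separated format of "path", and the names mapByPath, mapByParent, compareKeys and queryView are all assumptions made for the example. The small harness (emit, compareKeys, queryView) merely imitates how CouchDB orders array keys so that the map functions can be run outside CouchDB; real CouchDB collation uses ICU string comparison and is more involved.

```javascript
// Hypothetical documents: one per file, each with a "path" attribute.
const docs = [
  { _id: "a", path: "/docs/readme.txt" },
  { _id: "b", path: "/docs/img/logo.png" },
  { _id: "c", path: "/docs/img/icons/ok.png" },
  { _id: "d", path: "/src/main.c" },
];

// Local stand-in for emit(); inside CouchDB this function is provided.
let rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// View 1: the key is the full path split into its components.
function mapByPath(doc) {
  if (doc.path) {
    emit(doc.path.split("/").filter(Boolean), null);
  }
}

// View 2: the key is every component except the last (the immediate parent).
function mapByParent(doc) {
  if (doc.path) {
    var parts = doc.path.split("/").filter(Boolean);
    emit(parts.slice(0, -1), parts[parts.length - 1]);
  }
}

// Simplified imitation of view collation (an assumption for this sketch):
// arrays compare element by element, a shorter array sorts before its
// extensions, and an object ({}) sorts after any string.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i];
    const y = b[i];
    const xObj = typeof x === "object";
    const yObj = typeof y === "object";
    if (xObj !== yObj) return xObj ? 1 : -1;
    if (!xObj && x !== y) return x < y ? -1 : 1;
  }
  return a.length - b.length;
}

// Run a map function over the documents and return the rows with
// startkey <= key <= endkey, in key order.
function queryView(mapFn, allDocs, startkey, endkey) {
  rows = [];
  for (const doc of allDocs) {
    currentId = doc._id;
    mapFn(doc);
  }
  return rows
    .filter((r) => compareKeys(r.key, startkey) >= 0 && compareKeys(r.key, endkey) <= 0)
    .sort((r, s) => compareKeys(r.key, s.key));
}

// A directory and all its descendants, in one range query.
const descendants = queryView(mapByPath, docs, ["docs", "img"], ["docs", "img", {}]);
// Immediate children only: exact match on the parent key.
const children = queryView(mapByParent, docs, ["docs", "img"], ["docs", "img"]);

console.log(descendants.map((r) => r.id), children.map((r) => r.id));
```

Against a real view, the first query would correspond to something like startkey=["docs","img"]&endkey=["docs","img",{}], and the second to key=["docs","img"]. The {} in the endkey works because objects collate after strings, so it acts as an upper bound for everything below that directory.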