incubator-couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: Storing Hierarchical Data
Date Tue, 17 Nov 2009 13:49:20 GMT
On Mon, Nov 16, 2009 at 05:14:19AM +0200, Andreas Pavlogiannis wrote:
>     *   Each file is represented by a single document that has a "path"  
> attribute that indicates the directory that is being stored to. This  
> gives the advantage of avoiding conventional pathname translation and  
> retrieving the correct document immediately.

This is the form I'd suggest. It has a number of retrieval benefits:

* If you have a view which splits the path into an array of path components
and emits that array as the key, then the key ordering is such that a
directory is always immediately followed by all of its descendants. It is
very cheap to retrieve a specific directory and all its descendants in one
query using startkey and endkey. You can naturally serialise the data into a
stream (e.g. as XML).

* In another view, if you emit all path components except the last, then you
have an index by immediate parent. This makes it very cheap to get all the
children (first-level descendants) of a directory in a single query.
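
To make the two key layouts concrete, here is a rough Python sketch that imitates them in memory. It is not CouchDB code: a real view would be a map function emitting these arrays as keys. The sample documents, ids and helper names are made up for illustration, and Python's list ordering only approximates CouchDB's collation (strings here sort by code point).

```python
from bisect import bisect_left

# Hypothetical file documents; only the "path" attribute comes from the thread.
docs = [
    {"_id": "1", "path": "/home/bob/notes.txt"},
    {"_id": "2", "path": "/home/alice/todo.txt"},
    {"_id": "3", "path": "/home/alice/src/main.c"},
    {"_id": "4", "path": "/etc/hosts"},
]

def components(path):
    """Split "/a/b/c" into ["a", "b", "c"]."""
    return [part for part in path.split("/") if part]

# First layout: the key is the full array of path components.
by_path = sorted((components(d["path"]), d["_id"]) for d in docs)

# Second layout: the key is every component except the last (the immediate parent).
by_parent = sorted((components(d["path"])[:-1], d["_id"]) for d in docs)

def subtree(prefix):
    """Ids at or below `prefix`; they form one contiguous run in key order."""
    keys = [key for key, _ in by_path]
    i = bisect_left(keys, prefix)  # plays the role of startkey
    found = []
    while i < len(by_path) and by_path[i][0][:len(prefix)] == prefix:
        found.append(by_path[i][1])
        i += 1
    return found

def children(parent):
    """Ids of documents whose immediate parent is exactly `parent`."""
    return [doc_id for key, doc_id in by_parent if key == parent]

print(subtree(["home", "alice"]))   # files anywhere under /home/alice
print(children(["home", "alice"]))  # files directly inside /home/alice
```

Since only files are stored in this sketch, children() reports the files directly inside a directory but not its subdirectories; those would have to be derived from the longer keys or stored as documents of their own.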

Also, you may be able to work without explicit 'directory' objects at all,
just the files themselves.

> However, operations such as  
> renaming a folder require updating many documents

True.

> and should be avoided.

Well, it's a more complex/expensive operation, but unless you expect this to
be happening frequently it's probably OK.

> I am aware of the bulk update technique with the "all or nothing"  
> attribute, but it is to my understanding that it should be avoided,  
> especially when dealing with clustering and replication.

On the contrary. The "all or nothing" updates give you more or less the
*same* semantics as you'd get in a multi-master clustered scenario: a
conflicting write is stored as a conflicting revision instead of being
rejected. If you use this then your application will be simpler (as it never
has to deal with HTTP 409 conflicts), and it will work the same with a single
master or in a multi-master cluster.

Unfortunately, the "all or nothing"-ness isn't carried across through
replication. For example, if you want to reparent a directory, you might
rename 100 documents using a single all-or-nothing bulk update. On the node
where this is first applied, you are guaranteed atomicity. But after
replication, you may find that some documents are renamed and some are not.
Of course, eventual consistency means that barring some administrative
block, the updates should eventually get through, but there could be a
period of time where the files aren't all together.
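
For what it's worth, the reparenting step might be prepared along these lines. This is only a sketch in Python: it rewrites the "path" of every document under a directory and builds the JSON body that would be POSTed to the database's _bulk_docs resource with the "all_or_nothing" flag, as I understand the bulk API. The helper names, ids and revision strings are invented, and nothing is actually sent to a server.

```python
import json

def reparent(docs, old_prefix, new_prefix):
    """Return updated copies of the docs under old_prefix, moved under new_prefix."""
    old = old_prefix.rstrip("/") + "/"
    new = new_prefix.rstrip("/") + "/"
    moved = []
    for doc in docs:
        if doc["path"].startswith(old):
            updated = dict(doc)  # copy keeps _id and _rev of the stored revision
            updated["path"] = new + doc["path"][len(old):]
            moved.append(updated)
    return moved

def bulk_body(docs):
    """JSON body for a bulk update that should be applied as a unit."""
    return json.dumps({"all_or_nothing": True, "docs": docs})

# Hypothetical documents as they might come back from a subtree query.
files = [
    {"_id": "2", "_rev": "1-aaa", "path": "/home/alice/todo.txt"},
    {"_id": "3", "_rev": "1-bbb", "path": "/home/alice/src/main.c"},
    {"_id": "5", "_rev": "1-ccc", "path": "/home/alice2/other.txt"},
]

moved = reparent(files, "/home/alice", "/home/carol")
body = bulk_body(moved)
print(body)
```

The trailing slash added to the prefix is what stops /home/alice2 from being swept up in a rename of /home/alice.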

Regards,

Brian.
