incubator-couchdb-user mailing list archives

From CGS <cgsmcml...@gmail.com>
Subject Re: Best practice for storing large dynamic tree in CouchDB?
Date Wed, 04 Jan 2012 09:23:24 GMT
Hi,

1. Not a good approach for your case, because of how slowly it processes
requests. Nevertheless, it's a solution.

2. It seems a good start, but there is still work to be done. In that
example you have a shallow tree (posts and comments, most of the time a
2-level tree), while in your case you have to treat it as a dynamic,
deep multilevel tree (the worst case). The problems you have to think
about are:
a) deleting a node requires deleting the whole branch below it;
b) renaming a node requires updating all the documents within that
tree branch.
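To make problem a) concrete: if nodes only carry a link to their parent, really deleting a node means first finding every descendant, one level at a time. Below is a minimal sketch in JavaScript (the language of CouchDB views). It assumes plain objects as stand-ins for the documents, and `collectBranch` and the sample ids are hypothetical:

```javascript
// Hypothetical stand-ins for node documents that only store a parent link.
const nodes = [
  { _id: "root", parent: null },
  { _id: "docs", parent: "root" },
  { _id: "docs-a", parent: "docs" },
  { _id: "docs-b", parent: "docs" },
  { _id: "docs-a-1", parent: "docs-a" },
  { _id: "music", parent: "root" },
];

// Collect a node and all of its descendants (breadth-first).
function collectBranch(allNodes, rootId) {
  // parent -> children index; a CouchDB view keyed on `parent` plays this role.
  const children = new Map();
  for (const node of allNodes) {
    if (!children.has(node.parent)) children.set(node.parent, []);
    children.get(node.parent).push(node._id);
  }
  // Every id reached here is one more document to delete (or rewrite).
  const branch = [];
  const queue = [rootId];
  while (queue.length > 0) {
    const id = queue.shift();
    branch.push(id);
    for (const child of children.get(id) || []) queue.push(child);
  }
  return branch;
}

console.log(collectBranch(nodes, "docs")); // [ 'docs', 'docs-a', 'docs-b', 'docs-a-1' ]
```

The cost grows with the size of the branch, which is what the status-flag design below is meant to avoid.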
My suggestion (at least something to start from) to avoid such problems
would be to design your document as follows (with a few more fields than
in that example):

{
    _id: <first given name or an encoded name>,
    _rev: <whatever; not your direct concern>,
    status: <"active", "deleted" or "modified">,
    name: <modified name or name in human readable way>,
    parent: <parent ID>,
    permissions: <OS permissions for this node>,
    others: <other information>
}
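For illustration, here are a few hypothetical documents in that shape, plus a map function for a view keyed on `parent`. That view is what lets you list a folder's children with a single query (`?key="docs"`). Inside CouchDB, `emit` is provided by the view server. The `queryView` helper is only an assumed local stand-in so the sketch runs in Node.js:

```javascript
// Hypothetical node documents (_rev omitted: CouchDB assigns it on save).
const docs = [
  { _id: "docs", status: "active", name: "docs", parent: "root",
    permissions: "rwxr-xr-x", others: {} },
  { _id: "report-2011", status: "active", name: "report-2011.txt",
    parent: "docs", permissions: "rw-r--r--", others: { size: 1024 } },
  { _id: "notes", status: "modified", name: "meeting-notes.txt",
    parent: "docs", permissions: "rw-r--r--", others: {} },
];

// Map function for a view indexing nodes by their parent's _id.
function byParent(doc) {
  if (doc.parent !== undefined) emit(doc.parent, doc.name);
}

// Minimal local stand-in for the view server, only for trying the map function.
function queryView(mapFn, allDocs, key) {
  const rows = [];
  for (const doc of allDocs) {
    globalThis.emit = (k, v) => rows.push({ id: doc._id, key: k, value: v });
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows.filter((row) => row.key === key);
}

console.log(queryView(byParent, docs, "docs")); // rows for "report-2011" and "notes"
```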

Note: I prefer an encoded name because, at retrieval, some characters
allowed by the OS may not be usable (e.g., a "+" in the _id will return
garbage if you use cURL).
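One possible encoding, assuming Node.js, is to hex-encode the UTF-8 bytes of the name. The _id then only ever contains [0-9a-f]. `encodeId` and `decodeId` are hypothetical helpers, not part of any CouchDB API:

```javascript
// Hypothetical helpers (Node.js): a URL-safe _id derived from the file name.
// "+", "/", "?", spaces, etc. never reach the URL; the readable form stays in `name`.
function encodeId(name) {
  return Buffer.from(name, "utf8").toString("hex");
}

function decodeId(id) {
  return Buffer.from(id, "hex").toString("utf8");
}

const id = encodeId("c++ notes.txt");
console.log(id);           // only [0-9a-f] characters
console.log(decodeId(id)); // c++ notes.txt
```

The price is an _id twice as long as the name's byte length. Escaping the raw name with encodeURIComponent at request time is the other common route.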

That means that every time you change a node (by deleting it or
modifying its name), you don't need to change the whole branch, only the
status and the name of that node. E.g., once a node is deleted, whenever
you search for a sub-node you check the status of its ancestors along the
way; if one of them is flagged as deleted, your sub-node is deleted as
well. This also makes it easier to "recover" erased nodes. As for
searching for a node that was renamed, you can easily put
if (doc.name == new_name) emit(doc._id, null); in a view's map function.
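Here is a sketch of both checks. It assumes an in-memory Map as a stand-in for fetching documents by _id (in practice, one request per ancestor). In a CouchDB map function the document id is `doc._id`. Map functions also take no parameters, so this variant emits the name as the key and leaves the name to look for to the query (`?key="audio"`). `isEffectivelyDeleted` and `byName` are hypothetical names:

```javascript
// Hypothetical stand-in for fetching node documents by _id.
const byId = new Map([
  ["root", { _id: "root", status: "active", name: "/", parent: null }],
  ["docs", { _id: "docs", status: "deleted", name: "docs", parent: "root" }],
  ["a", { _id: "a", status: "active", name: "a.txt", parent: "docs" }],
  ["music", { _id: "music", status: "modified", name: "audio", parent: "root" }],
  ["song", { _id: "song", status: "active", name: "song.ogg", parent: "music" }],
]);

// A node counts as deleted if it or any ancestor is flagged "deleted".
// Only the flagged node was ever written, so recovering the branch means
// setting that single status back to "active".
function isEffectivelyDeleted(id) {
  for (let node = byId.get(id); node !== undefined; node = byId.get(node.parent)) {
    if (node.status === "deleted") return true;
  }
  return false;
}

// View map function indexing nodes by their current name (CouchDB supplies emit).
function byName(doc) {
  if (doc.name !== undefined) emit(doc.name, null);
}

// Local stand-in for the view server: collect every row byName emits.
const nameRows = [];
for (const doc of byId.values()) {
  globalThis.emit = (key, value) => nameRows.push({ id: doc._id, key, value });
  byName(doc);
}
delete globalThis.emit;

console.log(isEffectivelyDeleted("a"));    // true: its parent "docs" is flagged
console.log(isEffectivelyDeleted("song")); // false: "music" was only renamed
console.log(nameRows.filter((row) => row.key === "audio")); // the renamed "music" node
```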

This approach will be slower with a high number of levels, as you can
easily see, but pretty fast for common OS operations. A faster search
approach would be to build a dictionary, but that would slow down
insertion/deletion/modification (at least 2 documents to modify instead
of one, though this can be sped up by keeping the dictionary in another
database), and it would also require a smart way to insert into the
dictionary (with thousands of files and directories, you may need to
split your dictionary document into several pieces).
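The dictionary idea could look roughly like the following. This is a hypothetical sketch, not a tested design. It maps each full path to the node's _id and spreads the entries over a fixed number of dictionary documents, using a hash of the path. The piece count, the `dict-N` ids and the helper names are all assumptions:

```javascript
// Hypothetical number of dictionary documents the entries are split across.
const PIECES = 8;

// FNV-1a (32-bit) over the path's UTF-16 code units picks a piece deterministically.
function pieceFor(path) {
  let h = 0x811c9dc5;
  for (let i = 0; i < path.length; i++) {
    h ^= path.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return `dict-${h % PIECES}`;
}

// In-memory stand-ins for the dictionary documents (ideally in a separate database).
const dictionary = new Map();
function dictDoc(id) {
  if (!dictionary.has(id)) dictionary.set(id, { _id: id, entries: {} });
  return dictionary.get(id);
}

// A lookup reads one dictionary piece instead of walking the parent chain.
function lookup(path) {
  return dictDoc(pieceFor(path)).entries[path];
}

// Renaming a file: besides the node document itself (not shown), the piece(s)
// holding the old and the new path must be updated.
function renameEntry(oldPath, newPath) {
  const id = lookup(oldPath);
  delete dictDoc(pieceFor(oldPath)).entries[oldPath];
  dictDoc(pieceFor(newPath)).entries[newPath] = id;
  return id;
}

dictDoc(pieceFor("/docs/a.txt")).entries["/docs/a.txt"] = "a";
renameEntry("/docs/a.txt", "/docs/b.txt");
console.log(lookup("/docs/b.txt")); // a
```

With full paths as keys, renaming a folder would still re-key every entry below it. That is part of why inserting into the dictionary needs the care mentioned above.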

I hope this gives you at least an idea of how to solve your problem.

CGS




On 01/04/2012 08:59 AM, Nicolas Raoul wrote:
> Hello,
>
> I want to store a tree in CouchDB.
> My app is a large filesystem in which folders/files can be moved/added/deleted.
>
> What is the best practice for this use case?
> Below are the approaches I have found on the Internet:
>
> 1) Wiki howto
> http://wiki.apache.org/couchdb/How_to_store_hierarchical_data
> Is this page really a howto? The redundancy is quite astonishing.
> Even worse, the author himself says in paragraph "Moving a node to
> another parent" that moving nodes is unreliable, and that he is "not
> sure of the best approach to avoid such a problem".
>
> 2) Link to parent
> Approach #2 at http://www.cmlenz.net/archives/2007/10/couchdb-joins
> Each node contains a reference to its parent.
> It seems good enough for the author's use case, but I am not sure it
> is scalable to mine.
>
> Both of these articles have been written by people who admittedly
> "have been playing with CouchDB lately".
> Could anybody provide some feedback on those approaches?
>
> Or is there another approach that could be described as a "best
> practice" for storing large dynamic tree in CouchDB?
>
> Thanks a lot!
> Nicolas Raoul
> http://nicolas-raoul.blogspot.com

