incubator-couchdb-user mailing list archives

From Andreas Pavlogiannis <>
Subject Storing Hierarchical Data
Date Mon, 16 Nov 2009 03:14:19 GMT

I recently started exploring the capabilities of CouchDB, and although I 
find it really interesting and flexible, I am experiencing some 
difficulties.

Is there a recommended way to store hierarchical data? Consider, for 
example, a file system with multiple directories. I can think 
of several possible approaches, each with different capabilities and limitations:
    *   Each file and each folder is represented by a single document, 
with each folder document containing a "contents" list that holds the ids 
of the documents directly under that folder (the usual tree 
structure). In this case, deleting a file would require updating more 
than one document (the file being deleted and the parent folder's 
"contents" attribute), which seems dangerous given the absence of 
transactional operations (and what about deleting a whole folder?). 
Moreover, accessing the file "foo/bar/cow" would require conventional 
pathname translation, which adds overhead (split the pathname into components, 
request the "foo" folder, retrieve the ids of its contents, find which 
one corresponds to the "bar" folder, etc.).
    *   Each file and each folder is represented by a single document, 
with each file having a "parent id" attribute that contains the id of 
its parent folder (a reverse tree structure). In this case, deleting a 
file requires only one operation and seems more robust. However, 
pathname translation gets more involved and seems to add a lot of overhead 
(retrieve the id of the folder, find the documents having this "parent id" 
attribute, find the one you want among them...).
    *   Each file is represented by a single document that has a "path" 
attribute indicating the directory it is stored in. This 
avoids conventional pathname translation and 
retrieves the correct document immediately. However, operations such as 
renaming a folder require updating many documents and should be avoided.
    *   Keep the whole file system in a single document. Ouch!
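For comparison, here is a minimal sketch of the second and third approaches in Python. Plain dicts are hypothetical stand-ins for CouchDB documents and views; no CouchDB client or HTTP API is involved. It assumes a view keyed on [parent id, name] for the parent-id scheme, which turns each step of pathname translation into a single key lookup, and a "path" stored as a list of components for the path scheme, where renaming a folder touches every document beneath it. All ids, names, and helper functions are made up for illustration.

```python
from typing import Dict, List, Optional

# --- Approach 2: each document stores the id of its parent folder. ---
# Hypothetical documents; "root" is an assumed top-level folder id.
parent_docs: Dict[str, dict] = {
    "root": {"name": "", "parent_id": None, "type": "folder"},
    "d1": {"name": "foo", "parent_id": "root", "type": "folder"},
    "d2": {"name": "bar", "parent_id": "d1", "type": "folder"},
    "f1": {"name": "cow", "parent_id": "d2", "type": "file"},
}

# Stand-in for a view emitting the key [parent_id, name] for every document.
by_parent_and_name = {
    (doc["parent_id"], doc["name"]): doc_id for doc_id, doc in parent_docs.items()
}


def resolve(path: str) -> Optional[str]:
    """Walk the path one component at a time: one key lookup per component."""
    current: Optional[str] = "root"
    for part in path.strip("/").split("/"):
        current = by_parent_and_name.get((current, part))
        if current is None:
            return None
    return current


# --- Approach 3: each document stores its full path as a list. ---
path_docs: Dict[str, dict] = {
    "f1": {"path": ["foo", "bar", "cow"]},
    "f2": {"path": ["foo", "bar", "pig"]},
    "f3": {"path": ["foo", "baz", "hen"]},
    "f4": {"path": ["other", "dog"]},
}


def find_by_path(path: str) -> List[str]:
    """Linear scan here; a view keyed on the path would make it one lookup."""
    parts = path.strip("/").split("/")
    return [doc_id for doc_id, doc in path_docs.items() if doc["path"] == parts]


def rename_folder(folder: List[str], new_name: str) -> int:
    """Rename the last component of `folder`; return how many docs were rewritten."""
    n = len(folder)
    updated = 0
    for doc in path_docs.values():
        if doc["path"][:n] == folder:
            doc["path"] = folder[:-1] + [new_name] + doc["path"][n:]
            updated += 1
    return updated
```

With the parent-id scheme a lookup costs one query per path component but a rename touches a single document; with the path scheme a lookup is one query, but renaming "foo" rewrites all three documents under it.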

I am aware of the bulk update technique with the "all or nothing" 
option, but my understanding is that it should be avoided, 
especially when dealing with clustering and replication. In addition, 
things seem to get even murkier when considering file sharing 
between the users of the file system.
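A hedged sketch of what an "all or nothing" bulk update for the first approach might look like: deleting a file and removing its id from the parent folder's "contents" list in one request to the _bulk_docs endpoint. The Python below only builds the JSON request body and sends nothing; the document ids, revisions, and the helper name delete_file_body are hypothetical, and whether all_or_nothing is supported (and what it guarantees about conflicts) depends on the CouchDB version, so check its documentation.

```python
import json


def delete_file_body(file_doc: dict, parent_doc: dict) -> dict:
    """Build a _bulk_docs body that deletes a file and unlinks it from its parent."""
    # Copy the parent so the caller's document is left untouched.
    updated_parent = dict(parent_doc)
    updated_parent["contents"] = [
        child for child in parent_doc["contents"] if child != file_doc["_id"]
    ]
    return {
        "all_or_nothing": True,
        "docs": [
            # A deletion is expressed as a stub with the current _rev and _deleted.
            {"_id": file_doc["_id"], "_rev": file_doc["_rev"], "_deleted": True},
            updated_parent,
        ],
    }


# Hypothetical documents with made-up revisions.
file_doc = {"_id": "cow", "_rev": "1-abc", "type": "file"}
parent_doc = {"_id": "bar", "_rev": "3-def", "type": "folder", "contents": ["cow", "pig"]}

body = delete_file_body(file_doc, parent_doc)
print(json.dumps(body, indent=2))
```

This only covers one file and its direct parent; deleting a whole folder would mean collecting every descendant into the same request first.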

I would be glad if you could give me some pointers on how to 
work around the disadvantages of each of the methods above.

In general, do you think that, since dealing with documents is so 
flexible and given the absence of transactional operations, one should 
try to organize one's data to be as decoupled as possible?

Thank you for your time,

