incubator-couchdb-user mailing list archives

From Adam Wolff
Subject Re: Storing Hierarchical Data
Date Mon, 16 Nov 2009 16:10:39 GMT
OK, for some apps it's a no-go. In a highly concurrent server app,
you'll orphan data if you start two rename updates at the same time.
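
To make the failure mode concrete, here is a minimal sketch in Python. It is not CouchDB code: `docs` is a hypothetical in-memory stand-in for the database, `rename_folder` and `interleave` are illustrative names, and each `yield` marks the point where one single-document write has finished and another client can get in. It assumes a writer skips a document that someone else has already moved.

```python
from typing import Dict, Iterator

# Hypothetical in-memory stand-in for a database: doc id -> document.
docs: Dict[str, dict] = {
    "f1": {"path": "foo/bar", "name": "cow"},
    "f2": {"path": "foo/bar", "name": "hen"},
    "f3": {"path": "foo", "name": "pig"},
}


def under(path: str, folder: str) -> bool:
    """True if `path` is `folder` itself or lies below it."""
    return path == folder or path.startswith(folder + "/")


def rename_folder(old: str, new: str) -> Iterator[str]:
    """Rename folder `old` to `new`, one document write per step."""
    # Snapshot the affected ids first, as a client reading an index would.
    affected = [i for i, d in sorted(docs.items()) if under(d["path"], old)]
    for doc_id in affected:
        d = docs[doc_id]
        # Assumption: skip documents another writer has already moved.
        if under(d["path"], old):
            d["path"] = new + d["path"][len(old):]
        yield doc_id  # another client may run between any two writes


def interleave(*writers: Iterator[str]) -> None:
    """Advance each writer one step at a time until all are finished."""
    pending = list(writers)
    while pending:
        for w in list(pending):
            try:
                next(w)
            except StopIteration:
                pending.remove(w)


# Two clients rename the same folder at the same time.
interleave(rename_folder("foo", "a"), rename_folder("foo", "b"))
for doc_id, d in sorted(docs.items()):
    print(doc_id, d["path"])
```

With this interleaving the former contents of "foo" end up split between "a" and "b", and neither client sees the complete folder it thinks it renamed.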


On Monday, November 16, 2009, Jan Lehnardt wrote:
> On 16 Nov 2009, at 05:28, Adam Wolff wrote:
>> There isn't a great way to store hierarchical data in couch. If you want to
>> actually move stuff around, the full pathname is a no-go, since there are no
>> bulk updates. The only other trick here, if you have meaningful roots or
>> branch points, is to store a reference to those in addition to the specific
>> parent node in the graph.
> It is not a no-go, renames just can't be atomic :)
> Cheers
> Jan
> --
>> In any case, it seems better to me to store references from child to parent,
>> rather than the other way around. The child document makes a more natural
>> concurrency boundary.
>> A
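
A rough sketch of the child-to-parent layout with an extra root reference, in Python rather than CouchDB itself. The field names `parent` and `root` and the helper functions are assumptions made for illustration; in a real database the two lookups would come from indexes rather than list scans.

```python
from typing import List, Optional

# Hypothetical documents: each child points at its parent and at its root.
docs: List[dict] = [
    {"_id": "root", "name": "home", "parent": None, "root": "root"},
    {"_id": "d1", "name": "foo", "parent": "root", "root": "root"},
    {"_id": "d2", "name": "bar", "parent": "d1", "root": "root"},
    {"_id": "f1", "name": "cow", "parent": "d2", "root": "root"},
    {"_id": "other", "name": "tmp", "parent": None, "root": "other"},
]


def children_of(documents: List[dict], parent_id: Optional[str]) -> List[str]:
    """Ids of the direct children of `parent_id`."""
    return sorted(d["_id"] for d in documents if d["parent"] == parent_id)


def subtree_of(documents: List[dict], root_id: str) -> List[str]:
    """Ids of every node stored under `root_id`, found without walking the tree."""
    return sorted(d["_id"] for d in documents if d["root"] == root_id)


def move(documents: List[dict], doc_id: str, new_parent_id: str) -> dict:
    """Move one node; only the child document itself is written."""
    for d in documents:
        if d["_id"] == doc_id:
            d["parent"] = new_parent_id
            return d
    raise KeyError(doc_id)


print(subtree_of(docs, "root"))
move(docs, "f1", "d1")
print(children_of(docs, "d1"))
```

Moving "f1" touches a single document, so there is no second write that could fail halfway. The `root` field only stays valid while a node remains under the same root.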
>> On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis wrote:
>>> Greetings,
>>> I recently started exploring the capabilities of couchdb and although I
>>> find it really interesting and flexible, I am experiencing some
>>> difficulties:
>>> Is there any recommended way to store hierarchical data? Consider for
>>> example the case of a file system with multiple directories. I can think of
>>> some possible scenarios each with different capabilities and limitations:
>>>  * Each file and each folder is represented by a single document, with
>>> each folder document containing a "contents" list that has the ids of the
>>> subdocuments under the specific folder (the usual tree structure). In this
>>> case, deleting a file would require updating more than one document (the
>>> file for deletion and the parent folder for the "contents" attribute) which
>>> seems dangerous considering the absence of transactional operations (what
>>> about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow"
>>> would require a conventional pathname translation which adds overhead (cut
>>> the pathname in chunks, request the "foo" folder, retrieve the ids of its
>>> contents, find which one corresponds to the "bar" folder, etc.)
>>>  * Each file and each folder is represented by a single document, with
>>> each file having an attribute "parent id" that contains the id of its parent
>>> folder (reverse tree structure). In this case deleting the file requires only
>>> one operation and seems more robust. However, pathname translation gets
>>> fuzzy and seems to add a lot of overhead (retrieve id of folder, find
>>> documents having this "parent id" attribute, find the one you want among
>>> them...)
>>>  * Each file is represented by a single document that has a "path"
>>> attribute that indicates the directory it is stored in. This gives
>>> the advantage of avoiding conventional pathname translation and retrieving
>>> the correct document immediately. However, operations such as renaming a
>>> folder require updating many documents and should be avoided.
>>>  * Keep the whole file system in a single document. Ouch!
>>> I am aware of the bulk update technique with the "all or nothing"
>>> attribute, but it is my understanding that it should be avoided,
>>> especially when dealing with clustering and replication. In addition, things
>>> seem to get more obscure when considering file sharing possibilities between
>>> the users of the file system.
>>> I would be glad if you could provide me some pointers on how to circumvent
>>> the disadvantages of each of the methods above.
>>> In general, do you think that since dealing with documents is so flexible,
>>> and given the absence of transactional operations, one should try to
>>> keep one's data as decoupled as possible?
>>> Thank you for your time,
>>> Andreas
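
For the "parent id" layout described in the second option above, one way to keep pathname translation cheap is an index keyed on the pair (parent id, name), so each path segment costs one lookup. The sketch below is plain Python over a hypothetical in-memory `docs` dict; `build_index` and `resolve` are illustrative names, not part of any CouchDB API, and the single root document with id "root" is an assumption.

```python
from typing import Dict, Optional, Tuple

# Hypothetical documents: every file and folder stores its parent's id.
docs: Dict[str, dict] = {
    "root": {"name": "", "parent_id": None},
    "d1": {"name": "foo", "parent_id": "root"},
    "d2": {"name": "bar", "parent_id": "d1"},
    "f1": {"name": "cow", "parent_id": "d2"},
}

Index = Dict[Tuple[Optional[str], str], str]


def build_index(documents: Dict[str, dict]) -> Index:
    """Map (parent id, name) to the id of the matching document."""
    return {(d["parent_id"], d["name"]): doc_id
            for doc_id, d in documents.items()}


def resolve(index: Index, path: str, root_id: str = "root") -> Optional[str]:
    """Translate a path such as "foo/bar/cow" into a document id."""
    current = root_id
    for segment in path.split("/"):
        if not segment:  # tolerate leading, trailing, or doubled slashes
            continue
        found = index.get((current, segment))
        if found is None:  # no child of that name under the current folder
            return None
        current = found
    return current


index = build_index(docs)
print(resolve(index, "foo/bar/cow"))
print(resolve(index, "foo/missing"))
```

The cost is one lookup per path segment rather than a scan of each folder's children, and deleting or moving a file still writes only that file's document.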
