incubator-couchdb-user mailing list archives

From Adam Wolff <awo...@gmail.com>
Subject Re: Storing Hierarchical Data
Date Mon, 16 Nov 2009 04:28:00 GMT
There isn't a great way to store hierarchical data in CouchDB. If you want to
actually move nodes around, storing the full pathname in each document is a
no-go: there are no bulk updates, so renaming or moving a folder means
rewriting every document beneath it. The only other trick here, if you have
meaningful roots or branch points, is to store a reference to those in
addition to the specific parent node in the graph.
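
A rough sketch of that document shape, in plain Python with no CouchDB calls.
The field names parent_id and root_id and both helper functions are made up
for illustration; the point is only how many documents each operation touches.

```python
# Hypothetical document shapes: each node stores its direct parent and,
# additionally, the root it belongs to. Field names are assumptions.
docs = {
    "root": {"_id": "root", "name": "", "parent_id": None, "root_id": "root"},
    "foo": {"_id": "foo", "name": "foo", "parent_id": "root", "root_id": "root"},
    "bar": {"_id": "bar", "name": "bar", "parent_id": "foo", "root_id": "root"},
    "cow": {"_id": "cow", "name": "cow", "parent_id": "bar", "root_id": "root"},
}


def docs_under_root(docs, root_id):
    """Return the ids of every document in one tree, found by matching a
    single field instead of walking parent links level by level."""
    return sorted(d["_id"] for d in docs.values() if d["root_id"] == root_id)


def move(docs, doc_id, new_parent_id):
    """Re-parent a node within the same root.

    Only the moved document changes; its descendants still point at it,
    so nothing below it needs rewriting. Returns the ids written.
    """
    docs[doc_id]["parent_id"] = new_parent_id
    return [doc_id]
```

Note that the root reference has the same weakness as a stored path if a
subtree moves to a different root: every descendant's root_id would then need
rewriting. It pays off only when roots are stable.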

In any case, it seems better to me to store references from child to parent,
rather than the other way around. The child document makes a more natural
concurrency boundary.
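
To make the lookup cost concrete, here is a minimal sketch of resolving
"foo/bar/cow" with child-to-parent references. The dictionary stands in for
an index keyed on the pair (parent_id, name), which is roughly what a view
emitting [doc.parent_id, doc.name] as its key would give you. The names
build_index and resolve and the field names are assumptions, not part of any
CouchDB API.

```python
def build_index(docs):
    """Map (parent_id, name) -> child id for every document."""
    return {(d["parent_id"], d["name"]): d["_id"] for d in docs}


def resolve(index, root_id, path):
    """Walk a slash-separated path from the root, one keyed lookup per
    path component. Returns the id of the final node, or None if any
    component is missing."""
    current = root_id
    for part in path.strip("/").split("/"):
        current = index.get((current, part))
        if current is None:
            return None
    return current


# Hypothetical documents: each child names its parent, parents list nothing.
tree = [
    {"_id": "r", "name": "", "parent_id": None},
    {"_id": "d1", "name": "foo", "parent_id": "r"},
    {"_id": "d2", "name": "bar", "parent_id": "d1"},
    {"_id": "f1", "name": "cow", "parent_id": "d2"},
]
index = build_index(tree)
file_id = resolve(index, "r", "foo/bar/cow")
```

Each path component costs one lookup, so the translation overhead grows with
path depth rather than with folder size. Adding or deleting a file writes only
that file's document, so two clients adding files to the same folder never
update the same document.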


A


On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis <
paulogiann.couchdb@gmail.com> wrote:

> Greetings,
>
> I recently started exploring the capabilities of couchdb and although I
> find it really interesting and flexible, I am experiencing some
> difficulties:
>
> Is there any recommended way to store hierarchical data? Consider for
> example the case of a file system with multiple directories. I can think of
> some possible scenarios each with different capabilities and limitations:
>   *   Each file and each folder is represented by a single document, with
> each folder document containing a "contents" list that has the ids of the
> subdocuments under the specific folder (the usual tree structure). In this
> case, deleting a file would require updating more than one document (the
> file for deletion and the parent folder for the "contents" attribute) which
> seems dangerous considering the absence of transactional operations (what
> about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow"
> would require a conventional pathname translation which adds overhead (cut
> the pathname in chunks, request the "foo" folder, retrieve the ids of its
> contents, find which one corresponds to the "bar" folder, etc.).
>   *   Each file and each folder is represented by a single document, with
> each file having an attribute "parent id" that contains the id of its parent
> folder (reverse tree structure). In this case deleting the file requires only
> one operation and seems more robust. However, pathname translation gets
> fuzzy and seems to add a lot of overhead (retrieve id of folder, find
> documents having this "parent id" attribute, find the one you want among
> them...)
>   *   Each file is represented by a single document that has a "path"
> attribute that indicates the directory it is stored in. This gives
> the advantage of avoiding conventional pathname translation and retrieving
> the correct document immediately. However, operations such as renaming a
> folder require updating many documents and should be avoided.
>   * Keep the whole file system in a single document. Ouch!
>
> I am aware of the bulk update technique with the "all or nothing"
> attribute, but it is my understanding that it should be avoided,
> especially when dealing with clustering and replication. In addition, things
> seem to get more obscure when considering file sharing possibilities between
> the users of the file system.
>
> I would be glad if you could provide me some pointers on how to circumvent
> the disadvantages of each of the methods above.
>
> In general, do you think that, since dealing with documents is so flexible
> and given the absence of transactional operations, one should try to
> keep one's data as decoupled as possible?
>
> Thank you for your time,
>
> Andreas
>
