incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Storing Hierarchical Data
Date Mon, 16 Nov 2009 11:00:46 GMT

On 16 Nov 2009, at 05:28, Adam Wolff wrote:

> There isn't a great way to store hierarchical data in couch. If you want to
> actually move stuff around, the full pathname is a no-go, since there are no
> bulk updates. The only other trick here, if you have meaningful roots or
> branch points, is to store a reference to those in addition to the specific
> parent node in the graph.

It is not a no-go, renames just can't be atomic :)
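
A minimal sketch of why the rename is not atomic, assuming a plain in-memory
dict in place of a real database and a hypothetical rename_folder helper
(neither is CouchDB API). Each document is rewritten on its own, so a reader
arriving mid-rename sees some documents under the old path and some under the
new one:

```python
# In-memory stand-in for a database: maps _id to a document dict.
# All names and fields here are hypothetical and for illustration only.
docs = {
    "a": {"_id": "a", "path": "foo/bar/cow"},
    "b": {"_id": "b", "path": "foo/bar/pig"},
    "c": {"_id": "c", "path": "foo/baz/hen"},
}


def rename_folder(db, old_prefix, new_prefix):
    """Rewrite the path of every document under old_prefix, one at a time.

    Yields the id of each document right after it is updated, so the caller
    can inspect the intermediate state a concurrent reader might observe.
    """
    old = old_prefix.rstrip("/") + "/"
    new = new_prefix.rstrip("/") + "/"
    for doc_id in sorted(db):
        doc = db[doc_id]
        if doc["path"].startswith(old):
            doc["path"] = new + doc["path"][len(old):]
            yield doc_id


steps = rename_folder(docs, "foo/bar", "foo/qux")
first = next(steps)  # exactly one document has been moved so far
mid_state = sorted(d["path"] for d in docs.values())
remaining = list(steps)  # finish the rename
final_state = sorted(d["path"] for d in docs.values())
```

Here mid_state mixes "foo/bar/..." and "foo/qux/..." paths, while final_state
is consistent again. The rename works; it just is not all-or-nothing.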

Cheers
Jan
--

> 
> In any case, it seems better to me to store references from child to parent,
> rather than the other way around. The child document makes a more natural
> concurrency boundary.
> 
> 
> A
> 
> 
> On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis <
> paulogiann.couchdb@gmail.com> wrote:
> 
>> Greetings,
>> 
>> I recently started exploring the capabilities of couchdb and although I
>> find it really interesting and flexible, I am experiencing some
>> difficulties:
>> 
>> Is there any recommended way to store hierarchical data? Consider for
>> example the case of a file system with multiple directories. I can think of
>> some possible scenarios each with different capabilities and limitations:
>>  * Each file and each folder is represented by a single document, with
>> each folder document containing a "contents" list that has the ids of the
>> subdocuments under the specific folder (the usual tree structure). In this
>> case, deleting a file would require updating more than one document (the
>> file for deletion and the parent folder for the "contents" attribute) which
>> seems dangerous considering the absence of transactional operations (what
>> about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow"
>> would require a conventional pathname translation which adds overhead (cut
>> the pathname in chunks, request the "foo" folder, retrieve the ids of its
>> contents, find which one corresponds to the "bar" folder, etc.)
>>  * Each file and each folder is represented by a single document, with
>> each file having an attribute "parent id" that contains the id of its parent
>> folder (reverse tree structure). In this case deleting the file requires only
>> one operation and seems more robust. However, pathname translation gets
>> fuzzy and seems to add a lot of overhead (retrieve id of folder, find
>> documents having this "parent id" attribute, find the one you want among
>> them...)
>>  * Each file is represented by a single document that has a "path"
>> attribute that indicates the directory it is stored in. This gives
>> the advantage of avoiding conventional pathname translation and retrieving
>> the correct document immediately. However, operations such as renaming a
>> folder require updating many documents and should be avoided.
>>  * Keep the whole file system in a single document. Ouch!
>> 
>> I am aware of the bulk update technique with the "all or nothing"
>> attribute, but it is my understanding that it should be avoided,
>> especially when dealing with clustering and replication. In addition, things
>> seem to get more obscure when considering file sharing possibilities between
>> the users of the file system.
>> 
>> I would be glad if you could provide me some pointers on how to circumvent
>> the disadvantages of each of the methods above.
>> 
>> In general, do you think that since dealing with documents is so flexible
>> and given the absence of transactional operations one should try to
>> keep one's data as decoupled as possible?
>> 
>> Thank you for your time,
>> 
>> Andreas
>> 
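
For the child-to-parent layout discussed in the thread, here is a minimal
sketch of pathname translation, assuming an in-memory dict in place of a
database. The (parent, name) index stands in for what a view keyed on parent
id and name might provide; that mapping, and every name below, is an
assumption for illustration rather than CouchDB API:

```python
# Each document points at its parent; folders hold no list of children.
docs = {
    "root": {"_id": "root", "name": "", "parent": None},
    "d1": {"_id": "d1", "name": "foo", "parent": "root"},
    "d2": {"_id": "d2", "name": "bar", "parent": "d1"},
    "f1": {"_id": "f1", "name": "cow", "parent": "d2"},
}


def build_index(db):
    """Map (parent id, name) to document id, like a lookup keyed on both."""
    return {(d["parent"], d["name"]): d["_id"] for d in db.values()}


def resolve(db, path, root_id="root"):
    """Walk the path one segment at a time; return the id or None."""
    index = build_index(db)
    current = root_id
    for part in path.split("/"):
        current = index.get((current, part))
        if current is None:
            return None
    return current


before = resolve(docs, "foo/bar/cow")

# Moving the whole "bar" subtree touches exactly one document.
docs["d2"]["parent"] = "root"
after_new = resolve(docs, "bar/cow")
after_old = resolve(docs, "foo/bar/cow")
```

The lookup costs one index hit per path segment, but a move or delete is a
single-document update, which is the concurrency boundary argued for above.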

