Subject: Re: Storing Hierarchical Data
From: Jan Lehnardt <jan@apache.org>
Date: Mon, 16 Nov 2009 12:00:46 +0100
To: user@couchdb.apache.org


On 16 Nov 2009, at 05:28, Adam Wolff wrote:

> There isn't a great way to store hierarchical data in couch. If you
> want to actually move stuff around, the full pathname is a no-go,
> since there are no bulk updates. The only other trick here, if you
> have meaningful roots or branch points, is to store a reference to
> those in addition to the specific parent node in the graph.

It is not a no-go, renames just can't be atomic :)
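
To illustrate the point, here is a minimal sketch of a folder rename
when every document carries its full path. It makes several assumptions:
a plain dict stands in for the database, and the "path" attribute and
the rename_folder() helper are hypothetical names, not part of any
CouchDB API. Each assignment in the loop models one separate
single-document update.

```python
# In-memory stand-in for a database: doc id -> document.
# The "path" attribute and rename_folder() are hypothetical names.
docs = {
    "a": {"_id": "a", "path": "foo/bar/cow"},
    "b": {"_id": "b", "path": "foo/bar/pig"},
    "c": {"_id": "c", "path": "foo/baz/hen"},
}


def rename_folder(docs, old_prefix, new_prefix):
    """Rewrite the path of every document under old_prefix, one at a time.

    Each assignment models a separate single-document update, so a reader
    could observe a mix of old and new paths while the loop is running.
    Returns the ids that were updated.
    """
    updated = []
    old = old_prefix.rstrip("/") + "/"
    new = new_prefix.rstrip("/") + "/"
    for doc_id, doc in docs.items():
        if doc["path"].startswith(old):
            doc["path"] = new + doc["path"][len(old):]
            updated.append(doc_id)
    return updated


changed = rename_folder(docs, "foo/bar", "foo/qux")
```

The rename works, but between the first and the last update the folder
exists under both names, which is what "not atomic" means here.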

Cheers
Jan
--

>
> In any case, it seems better to me to store references from child to
> parent, rather than the other way around. The child document makes a
> more natural concurrency boundary.
>
> A
>
> On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis
> <paulogiann.couchdb@gmail.com> wrote:
>
>> Greetings,
>>
>> I recently started exploring the capabilities of couchdb and although
>> I find it really interesting and flexible, I am experiencing some
>> difficulties:
>>
>> Is there any recommended way to store hierarchical data? Consider for
>> example the case of a file system with multiple directories. I can
>> think of some possible scenarios each with different capabilities and
>> limitations:
>>  * Each file and each folder is represented by a single document,
>>    with each folder document containing a "contents" list that has
>>    the ids of the subdocuments under the specific folder (the usual
>>    tree structure). In this case, deleting a file would require
>>    updating more than one document (the file for deletion and the
>>    parent folder for the "contents" attribute), which seems dangerous
>>    considering the absence of transactional operations (what about
>>    deleting a whole folder?). Moreover, accessing the file
>>    "foo/bar/cow" would require a conventional pathname translation,
>>    which adds overhead (cut the pathname into chunks, request the
>>    "foo" folder, retrieve the ids of its contents, find which one
>>    corresponds to the "bar" folder, etc.).
>>  * Each file and each folder is represented by a single document,
>>    with each file having an attribute "parent id" that contains the
>>    id of its parent folder (reverse tree structure). In this case
>>    deleting the file requires only one operation and seems more
>>    robust. However, pathname translation gets fuzzy and seems to add
>>    a lot of overhead (retrieve the id of the folder, find documents
>>    having this "parent id" attribute, find the one you want among
>>    them...).
>>  * Each file is represented by a single document that has a "path"
>>    attribute that indicates the directory it is stored in. This
>>    gives the advantage of avoiding conventional pathname translation
>>    and retrieving the correct document immediately. However,
>>    operations such as renaming a folder require updating many
>>    documents and should be avoided.
>>  * Keep the whole file system in a single document. Ouch!
>>
>> I am aware of the bulk update technique with the "all or nothing"
>> attribute, but it is my understanding that it should be avoided,
>> especially when dealing with clustering and replication. In addition,
>> things seem to get more obscure when considering file sharing
>> possibilities between the users of the file system.
>>
>> I would be glad if you could provide me some pointers on how to
>> circumvent the disadvantages of each of the methods above.
>>
>> In general, do you think that, since dealing with documents is so
>> flexible and given the absence of transactional operations, one
>> should try to keep one's data as decoupled as possible?
>>
>> Thank you for your time,
>>
>> Andreas
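
For the child-to-parent layout discussed in the quoted text, the
pathname translation can be sketched as below. This is only an
illustration under stated assumptions: the documents, the "name" and
"parent" attributes, and the resolve() helper are hypothetical, and a
dict keyed on (parent id, name) stands in for whatever index the
database would provide (in CouchDB that role would fall to a view with
a compound key). Each loop iteration corresponds to one index lookup.

```python
# Hypothetical documents: each one stores only a reference to its parent.
docs = [
    {"_id": "1", "name": "foo", "parent": None},
    {"_id": "2", "name": "bar", "parent": "1"},
    {"_id": "3", "name": "cow", "parent": "2"},
]

# Index keyed on (parent id, name), built once over all documents.
index = {(d["parent"], d["name"]): d["_id"] for d in docs}


def resolve(path):
    """Walk the path one segment at a time; return the doc id or None."""
    current = None  # None stands for the root
    for segment in path.split("/"):
        current = index.get((current, segment))
        if current is None:
            return None
    return current


cow_id = resolve("foo/bar/cow")
missing = resolve("foo/nope")
```

Resolving a path costs one lookup per segment, while deleting or moving
a file touches only that file's own document.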