From: Jan Lehnardt
To: user@couchdb.apache.org
Subject: Re: Storing Hierarchical Data
Date: Mon, 16 Nov 2009 12:00:46 +0100

On 16 Nov 2009, at 05:28, Adam Wolff wrote:

> There isn't a great way to store hierarchical data in couch.
> If you want to actually move stuff around, the full pathname is a
> no-go, since there are no bulk updates. The only other trick here, if
> you have meaningful roots or branch points, is to store a reference to
> those in addition to the specific parent node in the graph.

It is not a no-go, renames just can't be atomic :)

Cheers
Jan
--

> In any case, it seems better to me to store references from child to
> parent, rather than the other way around. The child document makes a
> more natural concurrency boundary.
>
> A
>
> On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis <
> paulogiann.couchdb@gmail.com> wrote:
>
>> Greetings,
>>
>> I recently started exploring the capabilities of couchdb and although
>> I find it really interesting and flexible, I am experiencing some
>> difficulties:
>>
>> Is there any recommended way to store hierarchical data? Consider for
>> example the case of a file system with multiple directories. I can
>> think of some possible scenarios each with different capabilities and
>> limitations:
>>  * Each file and each folder is represented by a single document,
>>    with each folder document containing a "contents" list that has
>>    the ids of the subdocuments under the specific folder (the usual
>>    tree structure). In this case, deleting a file would require
>>    updating more than one document (the file for deletion and the
>>    parent folder for the "contents" attribute) which seems dangerous
>>    considering the absence of transactional operations (what about
>>    deleting a whole folder?). Moreover, accessing the file
>>    "foo/bar/cow" would require a conventional pathname translation
>>    which adds overhead (cut the pathname in chunks, request the "foo"
>>    folder, retrieve the ids of its contents, find which one
>>    corresponds to the "bar" folder etc..)
>>  * Each file and each folder is represented by a single document,
>>    with each file having an attribute "parent id" that contains the
>>    id of its parent folder (reverse tree structure). In this case
>>    deleting the file requires only one operation and seems more
>>    robust. However pathname translation gets fuzzy and seems to add a
>>    lot of overhead (retrieve id of folder, find documents having this
>>    "parent id" attribute, find the one you want among them...)
>>  * Each file is represented by a single document that has a "path"
>>    attribute that indicates the directory that is being stored to.
>>    This gives the advantage of avoiding conventional pathname
>>    translation and retrieving the correct document immediately.
>>    However, operations such as renaming a folder require updating
>>    many documents and should be avoided.
>>  * Keep the whole file system in a single document. Ouch!
>>
>> I am aware of the bulk update technique with the "all or nothing"
>> attribute, but it is to my understanding that it should be avoided,
>> especially when dealing with clustering and replication. In addition,
>> things seem to get more obscure when considering file sharing
>> possibilities between the users of the file system.
>>
>> I would be glad if you could provide me some pointers on how to
>> circumvent the disadvantages of each of the methods above.
>>
>> In general, do you think that since dealing with documents is so
>> flexible and provided the absence of transactional operations one
>> should try to organize his data as decoupled as possible?
>>
>> Thank you for your time,
>>
>> Andreas
>>
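
To make the "renames just can't be atomic" point concrete, here is a
minimal Python sketch of the "path" attribute scenario from the quoted
message. It is not CouchDB code: the database is a plain in-memory dict,
and make_db, under, rename_folder and the fail_after parameter are names
invented for this illustration. The path is stored as a list of
components, which is an assumption; the original message only speaks of
a "path" attribute.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a database: _id -> document.
Database = Dict[str, dict]


def make_db() -> Database:
    """Build three file documents, each carrying its full path (assumed layout)."""
    docs = [
        {"_id": "a", "path": ["foo", "bar", "cow"]},
        {"_id": "b", "path": ["foo", "bar", "hen"]},
        {"_id": "c", "path": ["foo", "baz"]},
    ]
    return {d["_id"]: d for d in docs}


def under(db: Database, prefix: List[str]) -> List[str]:
    """Return the sorted ids of all documents whose path starts with prefix."""
    n = len(prefix)
    return sorted(i for i, d in db.items() if d["path"][:n] == prefix)


def rename_folder(db: Database, old: List[str], new: List[str],
                  fail_after: Optional[int] = None) -> int:
    """Rewrite the path of every document under old, one write per document.

    fail_after simulates a crash after that many writes. Returns the
    number of documents actually updated.
    """
    written = 0
    for doc_id in under(db, old):
        if fail_after is not None and written >= fail_after:
            break  # simulated crash: remaining documents keep the old path
        doc = db[doc_id]
        db[doc_id] = {**doc, "path": new + doc["path"][len(old):]}
        written += 1
    return written


if __name__ == "__main__":
    db = make_db()
    # Interrupt the rename after one write: readers now see a mixed state.
    rename_folder(db, ["foo", "bar"], ["foo", "qux"], fail_after=1)
    print(under(db, ["foo", "bar"]), under(db, ["foo", "qux"]))
    # Running the rename again picks up the documents that were left behind.
    rename_folder(db, ["foo", "bar"], ["foo", "qux"])
    print(under(db, ["foo", "bar"]), under(db, ["foo", "qux"]))
```

The interrupted run leaves some documents under the old folder and some
under the new one, which is the non-atomic window. Because the rename
only touches documents still carrying the old prefix, simply running it
again finishes the job in this sketch. Whether that window is acceptable
depends on the application.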