From user-return-13980-apmail-couchdb-user-archive=couchdb.apache.org@couchdb.apache.org Tue Nov 30 12:31:16 2010 Return-Path: Delivered-To: apmail-couchdb-user-archive@www.apache.org Received: (qmail 58306 invoked from network); 30 Nov 2010 12:31:16 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 30 Nov 2010 12:31:16 -0000 Received: (qmail 2549 invoked by uid 500); 30 Nov 2010 12:31:14 -0000 Delivered-To: apmail-couchdb-user-archive@couchdb.apache.org Received: (qmail 2382 invoked by uid 500); 30 Nov 2010 12:31:14 -0000 Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@couchdb.apache.org Delivered-To: mailing list user@couchdb.apache.org Received: (qmail 2373 invoked by uid 99); 30 Nov 2010 12:31:13 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Nov 2010 12:31:13 +0000 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests=FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of ziggythehamster@gmail.com designates 209.85.213.52 as permitted sender) Received: from [209.85.213.52] (HELO mail-yw0-f52.google.com) (209.85.213.52) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Nov 2010 12:31:07 +0000 Received: by ywf9 with SMTP id 9so2844135ywf.11 for ; Tue, 30 Nov 2010 04:30:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:from:to :in-reply-to:content-type:content-transfer-encoding:x-mailer :mime-version:subject:date:references; bh=5LGQRDhEutwjnUDNakMacPnMEf+EJC22M9fCQBefVh8=; b=bpifgj+PiZtRT/P67rh5nv8Bq9EEKAIq2RoNDn4TWrNHbVBZ17c5g00J0oOVYQEM5x 1HsCLS38CqVZ3FZW6xBAZvLJm9h+au9S9y0SqoZZWLyw5idjSGjFCKBbZZFpWrlW1i4a fo8hBEoGm5HlSsFlz7K3cCtmUFk/sVDntjaso= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:from:to:in-reply-to:content-type :content-transfer-encoding:x-mailer:mime-version:subject:date :references; b=TbzoKoPnoLiVKkUkuOwR1HLiui5rlkGldm/PcxiP+2wb7hmpw3wSMoJyiXxHl+Xksx 0cF0jsMJPdj0h+ltU2Vxfrm9Vaskcbkg0VevoILRfun0k4MctzPJSVwFp28bg85gWOqH 95DSuIt0TOdn0rZrCa8WLw5Pu6SfdtS+b+Zu0= Received: by 10.151.145.16 with SMTP id x16mr12301761ybn.258.1291120246194; Tue, 30 Nov 2010 04:30:46 -0800 (PST) Received: from [192.168.1.4] (ip68-0-73-133.tu.ok.cox.net [68.0.73.133]) by mx.google.com with ESMTPS id p30sm4004541ybk.8.2010.11.30.04.30.44 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 30 Nov 2010 04:30:45 -0800 (PST) Message-Id: <6E75AA63-1F72-4D10-A2D1-9042D77014E7@gmail.com> From: Keith Gable To: "user@couchdb.apache.org" In-Reply-To: <4CF4E802.8060906@zedeler.dk> Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit X-Mailer: iPhone Mail (7D11) Mime-Version: 1.0 (iPhone Mail 7D11) Subject: Re: A possible approach for constructing trees of documents optimized for fast querying Date: Tue, 30 Nov 2010 06:30:39 -0600 References: <4CF4E802.8060906@zedeler.dk> This is kind of already on the wiki under "How to store hierarchical data". Reparenting massive amounts sucks, child creation doesn't. I use this method. Though I store IDs in my path array instead of integers. This makes it easy to find all items whose parent has ID x. Or the section of the tree with IDs x, y, z. On Nov 30, 2010, at 6:03 AM, Michael Zedeler wrote: > Hi everybody. > > I have come up with a way to create a tree where each node is a > document in CouchDB. The tree is ordered. > > Maybe someone on the list tried something similar and could comment > on the approach. For others, this could inspire a new way of using > CouchDB. > > The performance features are: > > * Fast reads. > * Slow writes (updates, deletions, moving subtrees). > > The basic notion is that every document gets a path attribute that > specifies where in the tree the document is located. > > The root path looks like this: > > [0] > > (JSON notation.) > > A child of the root looks like this: > > [0, 0] > > Adding another child to the root provides this path: > > [0, 32768] > > (32768 equals 2**15 and it is somewhat arbitraryly chosen.) > > The resulting documents could look like this: > > { '_id': 'root', 'path': [0] } > { '_id': 'child_1', 'path': [0, 0] } > { '_id': 'child_2', 'path': [0, 32768] } > > Adding a child to child_2 yields the path [0, 32768, 0]. > > Now querying the whole tree is done using > > startkey: [-2**16, 2**16] > endkey: [-2**16, 2**16, {}] > > Querying the subtree rooted in child_1: > > startkey: [0, 0] > endkey: [0, 0, {}] > > I have based this on the collation rules found here: > > http://wiki.apache.org/couchdb/View_collation#Complex_keys > > The constants -2**16 and 2**16 are just endpoints where you > sequentially subdivide the interval into smaller segments to make > room for more siblings. Thus inserting a node between child_1 and > child_2 will return in the path [0, 32768/2] == [0, 16384]. As far > as I understand it, JavaScript uses 64 bits to store numbers (they > are always floats), so there should be ample room for many siblings > under each node. At some point, it may be needed to re-balance the > tree, which will be an operation requiring an update of most nodes > in the subtree being balanced. > > Any comments? > > Regards, > > Michael. > > P. s. demonstration written in perl available on request. >