Return-Path: Delivered-To: apmail-couchdb-user-archive@www.apache.org Received: (qmail 37623 invoked from network); 30 Nov 2010 12:03:49 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 30 Nov 2010 12:03:49 -0000 Received: (qmail 58039 invoked by uid 500); 30 Nov 2010 12:03:47 -0000 Delivered-To: apmail-couchdb-user-archive@couchdb.apache.org Received: (qmail 57771 invoked by uid 500); 30 Nov 2010 12:03:47 -0000 Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@couchdb.apache.org Delivered-To: mailing list user@couchdb.apache.org Received: (qmail 57763 invoked by uid 99); 30 Nov 2010 12:03:46 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Nov 2010 12:03:46 +0000 X-ASF-Spam-Status: No, hits=2.9 required=10.0 tests=HTML_MESSAGE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [178.209.166.152] (HELO caterpillar.leasingborsen.dk) (178.209.166.152) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Nov 2010 12:03:38 +0000 Received: by caterpillar.leasingborsen.dk (Postfix, from userid 1002) id DB9A229247; Tue, 30 Nov 2010 13:06:07 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on caterpillar.leasingborsen.dk X-Spam-Level: Received: from [10.14.75.50] (unknown [178.209.168.27]) by caterpillar.leasingborsen.dk (Postfix) with ESMTPA id 4FA642923F; Tue, 30 Nov 2010 13:06:07 +0100 (CET) Message-ID: <4CF4E802.8060906@zedeler.dk> Date: Tue, 30 Nov 2010 13:03:14 +0100 From: Michael Zedeler User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12pre) Gecko/20101026 Shredder/3.1.6pre MIME-Version: 1.0 To: user@couchdb.apache.org Subject: A possible approach for constructing trees of documents optimized for fast querying Content-Type: multipart/alternative; boundary="------------090402070600000807080203" X-Old-Spam-Status: No, score=-1.2 required=5.0 tests=ALL_TRUSTED,AWL,HTML_MESSAGE autolearn=ham version=3.2.5 --------------090402070600000807080203 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi everybody. I have come up with a way to create a tree where each node is a document in CouchDB. The tree is ordered. Maybe someone on the list tried something similar and could comment on the approach. For others, this could inspire a new way of using CouchDB. The performance features are: * Fast reads. * Slow writes (updates, deletions, moving subtrees). The basic notion is that every document gets a path attribute that specifies where in the tree the document is located. The root path looks like this: [0] (JSON notation.) A child of the root looks like this: [0, 0] Adding another child to the root provides this path: [0, 32768] (32768 equals 2**15 and it is somewhat arbitraryly chosen.) The resulting documents could look like this: { '_id': 'root', 'path': [0] } { '_id': 'child_1', 'path': [0, 0] } { '_id': 'child_2', 'path': [0, 32768] } Adding a child to child_2 yields the path [0, 32768, 0]. Now querying the whole tree is done using startkey: [-2**16, 2**16] endkey: [-2**16, 2**16, {}] Querying the subtree rooted in child_1: startkey: [0, 0] endkey: [0, 0, {}] I have based this on the collation rules found here: http://wiki.apache.org/couchdb/View_collation#Complex_keys The constants -2**16 and 2**16 are just endpoints where you sequentially subdivide the interval into smaller segments to make room for more siblings. Thus inserting a node between child_1 and child_2 will return in the path [0, 32768/2] == [0, 16384]. As far as I understand it, JavaScript uses 64 bits to store numbers (they are always floats), so there should be ample room for many siblings under each node. At some point, it may be needed to re-balance the tree, which will be an operation requiring an update of most nodes in the subtree being balanced. Any comments? Regards, Michael. P. s. demonstration written in perl available on request. --------------090402070600000807080203--