Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@couchdb.apache.org
Received-SPF: pass (nike.apache.org: domain of kevin.r.coombes@gmail.com
 designates 209.85.160.180 as permitted sender)
Message-ID: <4F023FF1.3070708@gmail.com>
Date: Mon, 02 Jan 2012 17:38:25 -0600
From: "Kevin R. Coombes" <kevin.r.coombes@gmail.com>
Organization: UT M.D. Anderson Cancer Center
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: user@couchdb.apache.org
CC: Jo-Erlend Schinstad <joerlend.schinstad@gmail.com>
Subject: Re: Modeling a tree in couchdb.
References: <4EF45C4A.5080902@gmail.com>
 <A9DC438D6CAFFF4BBB299581444408810120AA27A379@VS-EX-01.intra.vpro.nl>
 <CAP1T3-W4LTsGHAkOd7Mpb7Akr-iQ-6p=MrOAawuiXAQW6HVpNw@mail.gmail.com>
 <4F0164FE.7050207@gmail.com>
 <CAP1T3-V0PZQPZ57Ai8HnWMr9DhqcpxqAo8JvQzL+L38ABQPRmA@mail.gmail.com>
 <4F01E3A1.2080402@gmail.com> <4F021A1D.4010502@gmail.com>
In-Reply-To: <4F021A1D.4010502@gmail.com>
Content-Type: multipart/alternative;
 boundary="------------060006080009070902030407"

--------------060006080009070902030407
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Your last few posts have confused me as to the actual requirements.  
Suppose I have the following tree, where the numbers on the nodes are 
arbitrary IDs that i have assigned.
                      [1]
                       |
                  -----------
                  |         |
                 [2]       [3]
                  |         |
     -----------------   ---------
     |        |      |   |       |
    [4]      [X]    [5] [6]     [7]

Assume for the moment that the node labeled [X] does not yet exist.  
Then this tree would have something like the following JSON/Couch 
representation:
{ '_id': 1,  'ancestors': [] }
   { '_id': 2,  'ancestors': [1] }
   { '_id': 3,  'ancestors': [1] }
   { '_id': 4,  'ancestors': [2, 1] }
   { '_id': 5,  'ancestors': [2, 1] }
   { '_id': 6,  'ancestors': [3, 1] }
   { '_id': 7,  'ancestors': [3, 1] }

Each _id has been assigned manually rather than automatically.  The list 
of 'ancestors' for each item simply gives the ids that you trace through 
to march back up the tree from that node to the root.  If you want to 
add a new node at position X, then you decide which id to give it and 
simply create a new node that looks like
{ '_id': 'X',   'ancestors': [2, 1] }
Nothing else needs to change.  The hard part would only come if you 
wanted to, for example, insert a new node [Y] above node [3] in the 
tree. In that case, you would have to update all of the nodes in that 
branch, producing new items:
   { '_id': 'Y', 'ancestors': [1] }
   { '_id':  3,  'ancestors': ['Y', 1] }
   { '_id':  6,  'ancestors': [3, 'Y',  1] }
   { '_id':  7,  'ancestors': [3, 'Y', 1] }
Assuming fairly balanced branching, inserting a node above a branch at 
level N, you would expect to have to updated 1/2^N of the nodes, which 
should usually be a fairly small fraction.

Your answers suggest that you _also _want to keep track of the number of 
siblings of each node in some sequential order, which is a different 
requirement than keeping track of the hierarchical tree structure.  You 
can achieve this other goal by adding another item to the representation 
of each node. The simplest idea would be to add a 'children' property 
that lists (in implicit order) all of the children of a node.  In that 
case, for example, node [2] would originally be written as
   { '_id': 2,   'ancestors': [1],  'children': [4, 5] }
and would need to be updated to
   { '_id': 2,   'ancestors': [1], 'children;: [4, 'X', 5'] }
when you inserted the node [X].  Nothing else would have to change.

Of course, you might also want a complete list of all nodes in the order 
in which they were added to the tree. You could accomplish that goal by 
adding yet another property that allowed you to keep the items ordered, 
independent of the tree structure.

Other than replacing the root node by something above it, I cannot see a 
reason to ever have to update all of the nodes.  What am I missing?

     Kevin

On 1/2/2012 2:57 PM, Jo-Erlend Schinstad wrote:
> Den 02. jan. 2012 18:04, skrev Kevin R. Coombes:
>> Take this suggestion with a grain of salt, since I haven't actually 
>> tried it and am just making things up....
>>
>> If your structure is really a tree, then the location of every node 
>> is characterized by a unique path back to the root node, You could 
>> save the entire path in each object as a list:
>>     [parent, grandparent, great-grandparent, ..., rootnode]
>> One view would simply return this entire list for each object.
>> A second view would just give back the parent node.
>> Either view can be used (with appropriate logic in the client) to 
>> reconstruct the entire tree. However, you could also easily create 
>> auxiliary views (e.g., grandparent) depending on your needs.
>>
>> This organization makes querying the tree relatively easy.  However, 
>> it will have _terrible _performance if you do a lot of surgery on the 
>> tree, lopping off branches and reattaching them in different places.
>>
>
> That was my original thought. I  wanted to do something like key [0, 
> 5, 4,2] which would mean the sixth child of the first top-level node, 
> and the fifth child of that node and the third of it.
>
> This would work well, and I would get the tree in one go, organized 
> and nice. The problem is that the tree must be reorganizable. In that 
> model, if moved one node, then I would have to update all the 
> following documents. If there's a million rows in the tree, then I 
> would need something like 999.990+ http requests... :)
>
> Further, one of primary goals is replication. It could never work. The 
> internet janitor would hit me in the back of the head with a wrench. 
> However, for something like a family tree or a blog, where something 
> will forever be a response to something else, that would work nicely.
>
> So what I'm doing now, is that I'm just retrieving the data from the 
> database and organising it on the client. It's not a comfortable 
> solution, I think. It's not elegant, but if it's the only possible 
> solution, then it doesn't really matter if it's elegant or not. :)
>
> In other words, I'm still very much looking for a good way to model a 
> flexible tree in a couchdb. Any suggestions are highly welcome.
>
> Jo-Erlend Schinstad

--------------060006080009070902030407--