incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Klo <jim....@sri.com>
Subject Re: Tree like structures in CouchDB
Date Sat, 23 Feb 2013 04:42:05 GMT
Michael, 

It's hard to gauge what's best - as it depends on your use case... Possibly provide more details?

Q: What are your navigation requirements? Drill up/down, jump to other trees, etc.

Q: Are you building or in need of a dynamic tree control? Do you want to lazy load the elements?

Q: Do you need to find specific subtrees? I've built some tree visualizations with hundreds
of thousands of nodes - the reality is you can't actually physically display anything human
readable with even a few thousand tree nodes. 

If you can start defining how you need to use the data - the best model for storage and retrieval
will weed itself out. 

FWIW, I'd like to understand Wendall's solution, as I don't understand how to recreate the
tree from his description. Sorry Wendall! :(. I swear I've seen you describe this on list
before, but couldn't find it in the archives.

I think both potentially are reasonable approaches, and both have pros and cons for usage.


- Jim

On Feb 22, 2013, at 4:29 PM, "Michael Heuberger" <michael.heuberger@binarykitchen.com>
wrote:

> Sorry guys, but now you left me a bit confused.
> 
> Again, what's the recommendation to implement tree structures for messages in CouchDB?
> 
> Cheers
> Michael
> 
> On 23/02/13 11:42, Jim Klo wrote:
>> 
>>> I wanted to give my feedback about what I've learned in this area.
>>> 
>>> First, I don't use the doc _id at all for sorting docs. It solves one single
use-case, but fails if you have others, so instead, I do this:
>>> 
>>> Every doc, whether the parent or child has identifying information. So a child
might contain the parent id, thread id, etc. Parent doesn't need to know about it's children
so it doesn't matter, as those can be pulled in a single view query.
>>> 
>>> Say I want to do something as originally stated, I'd create a view where I emit([parent_id,
next_level_id, next_level_id], null) with default values for the latter nested levels being
0 by default. When I query the view, I get back a result set that would look like the following.
>>> [
>>> {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0, 0],"value":null}, <-
parent
>>> {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1, 0],"value":null}, <-
first level
>>> {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1, 1],"value":null}, <-
first item in first level
>>> {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1, 2],"value":null} <-
second item in first level
>>> ]
>> you would still need to track ancestry in most cases,… the second solution makes
that possible… also your example only works for a single 'giant' tree, unless I'm missing
something… and not a forest. I'm also not seeing  how you would get all the nodes without
having to execute a query for every node on the tree - which is pretty inefficient IMHO
>> 
>> also as others have noted - keeping track of an independent serial, for the sake
of just ordering the tree, with concurrency would be a real challenge; which is why I use
serial ID's.
>> 
>> 
>>> To pull the entire thread based on the parent query is simply startkey=[1,0,0]&endkey=[1,{}]
>> then is your parent_id, really a root_id?  Then I'm really confused how you would
use this with trees at all…  I'm not sure how you model as I'd get duplicates from which
I could never use to reconstruct the tree:
>> 
>> - A                    A root                [A, 0, 0]    
>>    - B                1st child of A            [A, 1, 1]    
>>        - C            1st child of B            [A, 2, 1] ???
>>        - D            2nd child of B        [A, 2, 2] ???
>>    - E                2nd child of A        [A, 1, 2]
>>        - F            1st child of E            [A, 2, 1] ???
>>            -G        1st child of F            [A, 3, 1]
>>        - H            2nd child of E        [A, 2, 2] ???
>>    
>> 
>>> The advantage of this approach is simply that say I want to display a list of
all posts by user for a specific thread, I can create a view where I emit([parent_id, user_id,
comment_id], null)
>> you could do this with either approach, it's not really an advantage.
>> 
>>> This gives the ability to pull a specific comment for a user based on user_id
and thread_id, or an entire list of comments based on user_id. These sorts of indexes are
very cheap and flexible. You never have to mess with creating your own custom id system. Of
course, the tradeoff is that you have to do your own conflict resolution for async operations
with thread ids if you want them to increment. Better solution here is to use both timestamp
and user_id for the actual comment to ensure it is unique and still sorts well.
>> again serial id's solve that (not the default UUID's couchdb issues, AFAIK they are
not incremental, however I could be wrong), is there a reason you want to avoid having a smart
id?
>> 
>> FWIW: this all seems like deja vu http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+
> 
> -- 
> 
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
> 
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  michael@binarykitchen.com
> Website ..............  http://www.binarykitchen.com
> 

Mime
View raw message