couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jan Lehnardt <...@apache.org>
Subject Re: Tree like structures in CouchDB
Date Sat, 23 Feb 2013 19:48:11 GMT

On Feb 23, 2013, at 19:44 , Tim Black <tim@alwaysreformed.com> wrote:

> I'd love to see the various key/parent ID structure/query strategy
> proposals that have been mentioned in this thread summarized on a wiki,
> with a chart of which requirements mentioned below each fulfills.  That
> would be awesome.

Or added to the docs, as per

    https://couchdb.readthedocs.org/en/latest/contributing.html

:)

> 
> Tim
> 
> On 02/22/2013 11:02 PM, Michael Heuberger wrote:
>> Thanks Jim.
>> 
>> Ok, here are more details / requirements for my needs:
>> 
>> * Very important is to get the parents and their parents above, up to
>>   about 200 levels I'd say.
>> * On the other hand, for parents it's not that important to know their
>>   children.
>> * Yes, lazy loading please. There is no need for dynamic tree control.
>> * I don't need to find specific subtrees.
>> 
>> That's all. Any suggestions?
>> 
>> Michael
>> 
>> On 23/02/13 17:42, Jim Klo wrote:
>>> Michael,
>>> 
>>> It's hard to gauge what's best - as it depends on your use case...
>>> Possibly provide more details?
>>> 
>>> Q: What are your navigation requirements? Drill up/down, jump to
>>> other trees, etc.
>>> 
>>> Q: Are you building or in need of a dynamic tree control? Do you want
>>> to lazy load the elements?
>>> 
>>> Q: Do you need to find specific subtrees? I've built some tree
>>> visualizations with hundreds of thousands of nodes - the reality is
>>> you can't actually physically display anything human readable with
>>> even a few thousand tree nodes.
>>> 
>>> If you can start defining how you need to use the data - the best
>>> model for storage and retrieval will weed itself out.
>>> 
>>> FWIW, I'd like to understand Wendall's solution, as I don't
>>> understand how to recreate the tree from his description. Sorry
>>> Wendall! :(. I swear I've seen you describe this on list before, but
>>> couldn't find it in the archives.
>>> 
>>> I think both potentially are reasonable approaches, and both have
>>> pros and cons for usage.
>>> 
>>> - Jim
>>> 
>>> On Feb 22, 2013, at 4:29 PM, "Michael Heuberger"
>>> <michael.heuberger@binarykitchen.com> wrote:
>>> 
>>>> Sorry guys, but now you left me a bit confused.
>>>> 
>>>> Again, what's the recommendation to implement tree structures for
>>>> messages in CouchDB?
>>>> 
>>>> Cheers
>>>> Michael
>>>> 
>>>> On 23/02/13 11:42, Jim Klo wrote:
>>>>>> I wanted to give my feedback about what I've learned in this area.
>>>>>> 
>>>>>> First, I don't use the doc _id at all for sorting docs. It solves
>>>>>> one single use-case, but fails if you have others, so instead, I
>>>>>> do this:
>>>>>> 
>>>>>> Every doc, whether the parent or child has identifying
>>>>>> information. So a child might contain the parent id, thread id,
>>>>>> etc. Parent doesn't need to know about it's children so it doesn't
>>>>>> matter, as those can be pulled in a single view query.
>>>>>> 
>>>>>> Say I want to do something as originally stated, I'd create a view
>>>>>> where I emit([parent_id, next_level_id, next_level_id], null) with
>>>>>> default values for the latter nested levels being 0 by default.
>>>>>> When I query the view, I get back a result set that would look
>>>>>> like the following.
>>>>>> [
>>>>>> {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0,
>>>>>> 0],"value":null}, <- parent
>>>>>> {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1,
>>>>>> 0],"value":null}, <- first level
>>>>>> {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1,
>>>>>> 1],"value":null}, <- first item in first level
>>>>>> {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1,
>>>>>> 2],"value":null} <- second item in first level
>>>>>> ]
>>>>> you would still need to track ancestry in most cases,… the second
>>>>> solution makes that possible… also your example only works for a
>>>>> single 'giant' tree, unless I'm missing something… and not a
>>>>> forest. I'm also not seeing  how you would get all the nodes
>>>>> without having to execute a query for every node on the tree -
>>>>> which is pretty inefficient IMHO
>>>>> 
>>>>> also as others have noted - keeping track of an independent serial,
>>>>> for the sake of just ordering the tree, with concurrency would be a
>>>>> real challenge; which is why I use serial ID's.
>>>>> 
>>>>> 
>>>>>> To pull the entire thread based on the parent query is simply
>>>>>> startkey=[1,0,0]&endkey=[1,{}]
>>>>> then is your parent_id, really a root_id?  Then I'm really confused
>>>>> how you would use this with trees at all…  I'm not sure how you
>>>>> model as I'd get duplicates from which I could never use to
>>>>> reconstruct the tree:
>>>>> 
>>>>> - A                    A root                [A, 0, 0]
>>>>>    - B                1st child of A            [A, 1, 1]
>>>>>        - C            1st child of B            [A, 2, 1] ???
>>>>>        - D            2nd child of B        [A, 2, 2] ???
>>>>>    - E                2nd child of A        [A, 1, 2]
>>>>>        - F            1st child of E            [A, 2, 1] ???
>>>>>            -G        1st child of F            [A, 3, 1]
>>>>>        - H            2nd child of E        [A, 2, 2] ???
>>>>> 
>>>>>> The advantage of this approach is simply that say I want to
>>>>>> display a list of all posts by user for a specific thread, I can
>>>>>> create a view where I emit([parent_id, user_id, comment_id], null)
>>>>> you could do this with either approach, it's not really an advantage.
>>>>> 
>>>>>> This gives the ability to pull a specific comment for a user based
>>>>>> on user_id and thread_id, or an entire list of comments based on
>>>>>> user_id. These sorts of indexes are very cheap and flexible. You
>>>>>> never have to mess with creating your own custom id system. Of
>>>>>> course, the tradeoff is that you have to do your own conflict
>>>>>> resolution for async operations with thread ids if you want them
>>>>>> to increment. Better solution here is to use both timestamp and
>>>>>> user_id for the actual comment to ensure it is unique and still
>>>>>> sorts well.
>>>>> again serial id's solve that (not the default UUID's couchdb
>>>>> issues, AFAIK they are not incremental, however I could be wrong),
>>>>> is there a reason you want to avoid having a smart id?
>>>>> 
>>>>> FWIW: this all seems like deja vu
>>>>> http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+
>>>> -- 
>>>> 
>>>> Binary Kitchen
>>>> Michael Heuberger
>>>> 4c Dunbar Road
>>>> Mt Eden
>>>> Auckland 1024
>>>> (New Zealand)
>>>> 
>>>> Mobile (text only) ...  +64 21 261 89 81
>>>> Email ................  michael@binarykitchen.com
>>>> Website ..............  http://www.binarykitchen.com
>>>> 
>> 
> 


Mime
View raw message