incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Klo <jim....@sri.com>
Subject Re: Tree like structures in CouchDB
Date Fri, 22 Feb 2013 18:04:23 GMT

> interesting discussion!
> 
> jim, yes, in my case nesting won't be very deep so i've been thinking about your suggestion:
> 
> 1 : orig message
> 1-2 : reply to 1
> 1-2-5 : reply to 2
> 1-3 : reply to 1
> 1-3-4 : reply to
> 
> you say "1-3-4" is an ID, right?

Yes

> so tell me, if the message with the ID "1-3-4" wants to find out who his parent is, then
we'd have to parse the ID, remove one digit and query the database for "1-3" to get the parent.
do you think this is optimal for the couchdb?

You could do it this way; I've only outlined a strategy for ID's that sorts in a hierarchal
fashion, not a way to determine ancestry. I'd probably suggest just storing the parent id
in the child - depends upon your actual use case.

Another alternative, but would require the use of a view, is to store the parent ancestry
as an array in each document instead of using a composite id ( I believe others have suggested
this too):

{ id:1, ancestry: []} 
{ id:2, ancestry: [1]} 
{ id:5, ancestry: [1, 2]} 
{ id:3, ancestry: [1]} 
{ id:4, ancestry: [1, 3]} 
{ id:6, ancestry: [1, 2]} 
{ id:7, ancestry: [1, 3]} 

then a map function:

function(doc) {
	emit(doc.ancestry.concat([doc.id]), null);
}

and just use the built-in _count reduce if you need to count the number of nodes in a tree
(or sub tree)

This will create a view that sorts in the same way as the id, but using an array instead of
a string. This view will look something like, noting that the key is what's sorted:

{
	rows: [
		{ id: 1, key: [ 1 ], value= null},
		{ id: 2, key: [ 1, 2 ], value= null},
		{ id: 5, key: [ 1, 2, 5 ], value= null},
		{ id: 6, key: [ 1, 2, 6 ], value= null},
		{ id: 3, key: [ 1, 3 ], value= null},
		{ id: 4, key: [ 1, 3, 4 ], value= null},
		{ id: 7, key: [ 1, 3, 7 ], value= null},
	]
}

you can then query an entire tree with startkey=[1]&endkey=[1,{}] which will give you
all the descendants for the one tree with one query.  fetching a sub tree works similarly:
startkey=[1,2]&endkey=[1,2,{}] 

and you will have equivalent functionality to using a composite id with possibly a simpler
to comprehend mechanism for building queries and  creating ids at the expense of creating
a view...

you'll probably also want a view that just lists root messages too… that would be:

function(doc) {
	if (doc.ancestry.length == 0)
		emit(doc.id, null);
}

- Jim


Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t.	@nsomnac

> 
> On 22/02/13 20:57, Jim Klo wrote:
>> On Feb 21, 2013, at 11:31 PM, "Elisiano Petrini" <elisiano@gmail.com> wrote:
>> 
>>> I understand the concept, but on the same level there could be more than
>>> one child. Using this as an ID would work only if you put something
>>> unique to the node in composing the ID.
>>> 
>> I don't recall multiple nodes being a constraint in the problem, even then it's solvable
using sharded ID's - the reality is there's always finite number of nodes. The only thing
that changes is that tree won't sort naturally in the order they arrive. Remember, the parent
node has to exist everywhere for a collision to occur.
>> 
>>> Generating a sequential ID (per tree level) depending on the order of
>>> arrival might lead to collisions (if messages arrive at the same time).
>>> 
>> Sure, but only if you have a badly designed sequence generators that are not designed
to operate in a sharded manner. A simple way is to pre-allocate a pool of ID's to a process
(say 500 ID's per process). No other process should be able to use another process' allocation.
Once exhausted, the process fetches another pool of IDs. This may not be ideal in a mobile
scenario, but would probably still work, you just might have to devise a smarter allocation
algorithm based upon usage and connectivity frequencies. (And now I've really digressed) This
is a very common solution done in the RDBMS world to combat ID conflicts.
>> 
>> 
>>> Anyhow, I feel we're digressing here :)
>>> 
>> It is worth discussion as not everyone is experienced in massively parallel systems.
:)
>> 
>> If the originator has only a single node - they shouldn't have any conflict issues
with the simplest solution.
>> 
>> 
>> 
>>> On Fri, 2013-02-22 at 16:23 +0900, svilen wrote:
>>>> that's the path to root stored as id
>>>> so answers to 1-2 will be say 1-2-5 1-2-6 1-2-33
>>>> 
>>>>> Hi Kim,
>>>>>    I don't think using this kind of documents' ids is a viable
>>>>> solution: what if there is more that one message in response to
>>>>> another (ie:several people answering to the same message)?
>>>>> 
>>>>> On Fri, 2013-02-22 at 07:00 +0000, Jim Klo wrote:
>>>>>> In theory you mention infinite depth, but is that realistic?
>>>>>> 
>>>>>> A simple way is to make ID's a composite serial and chain them
>>>>>> together.
>>>>>> 
>>>>>> 1 : orig message
>>>>>> 1-2 : reply to 1
>>>>>> 1-2-5 : reply to 2
>>>>>> 1-3 : reply to 1
>>>>>> 1-3-4 : reply to
>>>>>> 
>>>>>> The keys will then sort in order, such that can build the tree DFS
>>>>>> and use range queries to get just segments.
>>>>>> 
>>>>>> You can obviously do some creative things in compression - using
>>>>>> different number bases to handle deep trees.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Feb 21, 2013, at 10:10 PM, "Michael Heuberger"
>>>>>> <michael.heuberger@binarykitchen.com> wrote:
>>>>>> 
>>>>>>> andrey, svilen, many thanks again.
>>>>>>> 
>>>>>>> agreed, storing immediate children in a document is a bad idea.
i
>>>>>>> think i go for storing only an immediate parent id to get an
>>>>>>> infinite tree depth.
>>>>>>> 
>>>>>>> thanks for your inspirations guys
>>>>>>> 
>>>>>>> cheers
>>>>>>> michael
>>>>>>> 
>>>>>>> On 22/02/13 18:49, Andrey Kuprianov wrote:
>>>>>>>> PS - storing only an immediate parent id would be the solution
>>>>>>>> for an infinite tree depth.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Feb 22, 2013 at 1:48 PM, Andrey Kuprianov <
>>>>>>>> andrey.kouprianov@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Storing immediate children in a parent is a bad idea
in a sense
>>>>>>>>> that you will never be able to traverse your tree back
to the
>>>>>>>>> root, given a child node.
>>>>>>>>> 
>>>>>>>>> Storing all in one doc - you will run out of space before
you
>>>>>>>>> know it. Documents have a finite length and the larger
the
>>>>>>>>> document, the slower it will get to serialize and unserialize
>>>>>>>>> it.
>>>>>>>>> 
>>>>>>>>> Storing just an immediate parent in children might result
in a
>>>>>>>>> costly multiple read operations, if you want to get root
of the
>>>>>>>>> tree given an arbitrary child.
>>>>>>>>> 
>>>>>>>>> Therefore the only feasible solution is to store references
to
>>>>>>>>> the parent chain - i.e. ancestors - starting from an
immediate
>>>>>>>>> parent and ending with a tree root.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Here's a simple calculation for ancestors approach:
>>>>>>>>> 
>>>>>>>>> Suppose the maximum document size is 1MB. A uuid is 32
>>>>>>>>> characters in length + taking into account commas and
brackets,
>>>>>>>>> then the depth of your tree (I am not taking content
into
>>>>>>>>> account) could be roughly:
>>>>>>>>> 
>>>>>>>>> ((32 + 1)*n  + 2 - 1) = 1024 * 1024  ==> n = 31775
levels
>>>>>>>>> 
>>>>>>>>> However, max document size is configurable from the settings
>>>>>>>>> (as of CouchDB 1.2) and could easily be up to 4GB, giving
you
>>>>>>>>> over 130mil of levels. Should be enough... :)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, Feb 22, 2013 at 1:14 PM, Michael Heuberger <
>>>>>>>>> michael.heuberger@binarykitchen.com> wrote:
>>>>>>>>> 
>>>>>>>>>> thanks andrey and svil
>>>>>>>>>> 
>>>>>>>>>> depth is infinite. in theory a message can have million
>>>>>>>>>> replies forth and back. similar like a very long
email thread.
>>>>>>>>>> 
>>>>>>>>>> i think storing ancestors ą la andrey would be too
much hassle
>>>>>>>>>> but works well for few levels.
>>>>>>>>>> 
>>>>>>>>>> svil, what did you mean with dicts and with keeping
all
>>>>>>>>>> children in one doc? can you tell me more?
>>>>>>>>>> 
>>>>>>>>>> thanks,
>>>>>>>>>> michael
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 22/02/13 17:27, svilen wrote:
>>>>>>>>>> 
>>>>>>>>>>> my rough guess:
>>>>>>>>>>> 
>>>>>>>>>>> if it's finite depth, then all in one doc:
>>>>>>>>>>> - list of (item or (list of ...))
>>>>>>>>>>> - or same with dicts
>>>>>>>>>>> 
>>>>>>>>>>> else, one doc per message keeping just link-to-parent,
>>>>>>>>>>> or multiple links-to-grand...grand-parents and/or
root.
>>>>>>>>>>> similar to the strategies in SQL - nested etc.
>>>>>>>>>>> keeping all chidlren of same node in one doc
is also
>>>>>>>>>>> possible..
>>>>>>>>>>> 
>>>>>>>>>>> in any case either traversal or storing or both
will be
>>>>>>>>>>> difficult.
>>>>>>>>>>> 
>>>>>>>>>>> ciao
>>>>>>>>>>> svil
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, 22 Feb 2013 17:13:01 +1300
>>>>>>>>>>> Michael Heuberger
>>>>>>>>>>> <michael.heuberger@**binarykitchen.com<michael.heuberger@binarykitchen.com>>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hello guys
>>>>>>>>>>>> I'd like to store messages in a tree like
structure.
>>>>>>>>>>>> Whenever you reply to a message, then the
original message
>>>>>>>>>>>> becomes the parent message.
>>>>>>>>>>>> 
>>>>>>>>>>>> How would you implement something like this
in CouchDB? Just
>>>>>>>>>>>> curious and need a little guidance + advice
here.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Michael
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> 
>>>>>>>>>>>> Binary Kitchen
>>>>>>>>>>>> Michael Heuberger
>>>>>>>>>>>> 4c Dunbar Road
>>>>>>>>>>>> Mt Eden
>>>>>>>>>>>> Auckland 1024
>>>>>>>>>>>> (New Zealand)
>>>>>>>>>>>> 
>>>>>>>>>>>> Mobile (text only) ...  +64 21 261 89 81
>>>>>>>>>>>> Email ................  michael@binarykitchen.com
>>>>>>>>>>>> Website ..............  http://www.binarykitchen.com
>>>>>>>>>> --
>>>>>>>>>> 
>>>>>>>>>> Binary Kitchen
>>>>>>>>>> Michael Heuberger
>>>>>>>>>> 4c Dunbar Road
>>>>>>>>>> Mt Eden
>>>>>>>>>> Auckland 1024
>>>>>>>>>> (New Zealand)
>>>>>>>>>> 
>>>>>>>>>> Mobile (text only) ...  +64 21 261 89 81
>>>>>>>>>> Email ................  michael@binarykitchen.com
>>>>>>>>>> Website ..............  http://www.binarykitchen.com
>>>>>>> -- 
>>>>>>> 
>>>>>>> Binary Kitchen
>>>>>>> Michael Heuberger
>>>>>>> 4c Dunbar Road
>>>>>>> Mt Eden
>>>>>>> Auckland 1024
>>>>>>> (New Zealand)
>>>>>>> 
>>>>>>> Mobile (text only) ...  +64 21 261 89 81
>>>>>>> Email ................  michael@binarykitchen.com
>>>>>>> Website ..............  http://www.binarykitchen.com
>>> 
> 
> -- 
> 
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
> 
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  michael@binarykitchen.com
> Website ..............  http://www.binarykitchen.com
> 


Mime
View raw message