interesting discussion!

jim, yes, in my case nesting won't be very deep so i've been thinking about your suggestion:

1 : orig message
1-2 : reply to 1
1-2-5 : reply to 2
1-3 : reply to 1
1-3-4 : reply to

you say "1-3-4" is an ID, right?

Yes

so tell me, if the message with the ID "1-3-4" wants to find out who his parent is, then we'd have to parse the ID, remove one digit and query the database for "1-3" to get the parent. do you think this is optimal for the couchdb?

You could do it this way; I've only outlined a strategy for ID's that sorts in a hierarchal fashion, not a way to determine ancestry. I'd probably suggest just storing the parent id in the child - depends upon your actual use case.

Another alternative, but would require the use of a view, is to store the parent ancestry as an array in each document instead of using a composite id ( I believe others have suggested this too):

{ id:1, ancestry: []} 
{ id:2, ancestry: [1]} 
{ id:5, ancestry: [1, 2]} 
{ id:3, ancestry: [1]} 
{ id:4, ancestry: [1, 3]} 
{ id:6, ancestry: [1, 2]} 
{ id:7, ancestry: [1, 3]} 

then a map function:

function(doc) {
emit(doc.ancestry.concat([doc.id]), null);
}

and just use the built-in _count reduce if you need to count the number of nodes in a tree (or sub tree)

This will create a view that sorts in the same way as the id, but using an array instead of a string. This view will look something like, noting that the key is what's sorted:

{
rows: [
{ id: 1, key: [ 1 ], value= null},
{ id: 2, key: [ 1, 2 ], value= null},
{ id: 5, key: [ 1, 2, 5 ], value= null},
{ id: 6, key: [ 1, 2, 6 ], value= null},
{ id: 3, key: [ 1, 3 ], value= null},
{ id: 4, key: [ 1, 3, 4 ], value= null},
{ id: 7, key: [ 1, 3, 7 ], value= null},
]
}

you can then query an entire tree with startkey=[1]&endkey=[1,{}] which will give you all the descendants for the one tree with one query.  fetching a sub tree works similarly: startkey=[1,2]&endkey=[1,2,{}] 

and you will have equivalent functionality to using a composite id with possibly a simpler to comprehend mechanism for building queries and  creating ids at the expense of creating a view...

you'll probably also want a view that just lists root messages too… that would be:

function(doc) {
if (doc.ancestry.length == 0)
emit(doc.id, null);
}

- Jim


Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t. @nsomnac


On 22/02/13 20:57, Jim Klo wrote:
On Feb 21, 2013, at 11:31 PM, "Elisiano Petrini" <elisiano@gmail.com> wrote:

I understand the concept, but on the same level there could be more than
one child. Using this as an ID would work only if you put something
unique to the node in composing the ID.

I don't recall multiple nodes being a constraint in the problem, even then it's solvable using sharded ID's - the reality is there's always finite number of nodes. The only thing that changes is that tree won't sort naturally in the order they arrive. Remember, the parent node has to exist everywhere for a collision to occur.

Generating a sequential ID (per tree level) depending on the order of
arrival might lead to collisions (if messages arrive at the same time).

Sure, but only if you have a badly designed sequence generators that are not designed to operate in a sharded manner. A simple way is to pre-allocate a pool of ID's to a process (say 500 ID's per process). No other process should be able to use another process' allocation. Once exhausted, the process fetches another pool of IDs. This may not be ideal in a mobile scenario, but would probably still work, you just might have to devise a smarter allocation algorithm based upon usage and connectivity frequencies. (And now I've really digressed) This is a very common solution done in the RDBMS world to combat ID conflicts.


Anyhow, I feel we're digressing here :)

It is worth discussion as not everyone is experienced in massively parallel systems. :)

If the originator has only a single node - they shouldn't have any conflict issues with the simplest solution.



On Fri, 2013-02-22 at 16:23 +0900, svilen wrote:
that's the path to root stored as id
so answers to 1-2 will be say 1-2-5 1-2-6 1-2-33

Hi Kim,
   I don't think using this kind of documents' ids is a viable
solution: what if there is more that one message in response to
another (ie:several people answering to the same message)?

On Fri, 2013-02-22 at 07:00 +0000, Jim Klo wrote:
In theory you mention infinite depth, but is that realistic?

A simple way is to make ID's a composite serial and chain them
together.

1 : orig message
1-2 : reply to 1
1-2-5 : reply to 2
1-3 : reply to 1
1-3-4 : reply to

The keys will then sort in order, such that can build the tree DFS
and use range queries to get just segments.

You can obviously do some creative things in compression - using
different number bases to handle deep trees.



On Feb 21, 2013, at 10:10 PM, "Michael Heuberger"
<michael.heuberger@binarykitchen.com> wrote:

andrey, svilen, many thanks again.

agreed, storing immediate children in a document is a bad idea. i
think i go for storing only an immediate parent id to get an
infinite tree depth.

thanks for your inspirations guys

cheers
michael

On 22/02/13 18:49, Andrey Kuprianov wrote:
PS - storing only an immediate parent id would be the solution
for an infinite tree depth.


On Fri, Feb 22, 2013 at 1:48 PM, Andrey Kuprianov <
andrey.kouprianov@gmail.com> wrote:

Storing immediate children in a parent is a bad idea in a sense
that you will never be able to traverse your tree back to the
root, given a child node.

Storing all in one doc - you will run out of space before you
know it. Documents have a finite length and the larger the
document, the slower it will get to serialize and unserialize
it.

Storing just an immediate parent in children might result in a
costly multiple read operations, if you want to get root of the
tree given an arbitrary child.

Therefore the only feasible solution is to store references to
the parent chain - i.e. ancestors - starting from an immediate
parent and ending with a tree root.


Here's a simple calculation for ancestors approach:

Suppose the maximum document size is 1MB. A uuid is 32
characters in length + taking into account commas and brackets,
then the depth of your tree (I am not taking content into
account) could be roughly:

((32 + 1)*n  + 2 - 1) = 1024 * 1024  ==> n = 31775 levels

However, max document size is configurable from the settings
(as of CouchDB 1.2) and could easily be up to 4GB, giving you
over 130mil of levels. Should be enough... :)



On Fri, Feb 22, 2013 at 1:14 PM, Michael Heuberger <
michael.heuberger@binarykitchen.com> wrote:

thanks andrey and svil

depth is infinite. in theory a message can have million
replies forth and back. similar like a very long email thread.

i think storing ancestors ¹ la andrey would be too much hassle
but works well for few levels.

svil, what did you mean with dicts and with keeping all
children in one doc? can you tell me more?

thanks,
michael


On 22/02/13 17:27, svilen wrote:

my rough guess:

if it's finite depth, then all in one doc:
- list of (item or (list of ...))
- or same with dicts

else, one doc per message keeping just link-to-parent,
or multiple links-to-grand...grand-parents and/or root.
similar to the strategies in SQL - nested etc.
keeping all chidlren of same node in one doc is also
possible..

in any case either traversal or storing or both will be
difficult.

ciao
svil

On Fri, 22 Feb 2013 17:13:01 +1300
Michael Heuberger
<michael.heuberger@**binarykitchen.com<michael.heuberger@binarykitchen.com>>
wrote:

Hello guys
I'd like to store messages in a tree like structure.
Whenever you reply to a message, then the original message
becomes the parent message.

How would you implement something like this in CouchDB? Just
curious and need a little guidance + advice here.

Thanks,
Michael

--

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  michael@binarykitchen.com
Website ..............  http://www.binarykitchen.com
--

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  michael@binarykitchen.com
Website ..............  http://www.binarykitchen.com
--

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  michael@binarykitchen.com
Website ..............  http://www.binarykitchen.com


--

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  michael@binarykitchen.com
Website ..............  http://www.binarykitchen.com