incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Heuberger <michael.heuber...@binarykitchen.com>
Subject Re: Tree like structures in CouchDB
Date Fri, 22 Feb 2013 12:06:38 GMT
interesting discussion!

jim, yes, in my case nesting won't be very deep so i've been thinking 
about your suggestion:

1 : orig message
1-2 : reply to 1
1-2-5 : reply to 2
1-3 : reply to 1
1-3-4 : reply to

you say "1-3-4" is an ID, right? so tell me, if the message with the ID 
"1-3-4" wants to find out who his parent is, then we'd have to parse the 
ID, remove one digit and query the database for "1-3" to get the parent. 
do you think this is optimal for the couchdb?

On 22/02/13 20:57, Jim Klo wrote:
> On Feb 21, 2013, at 11:31 PM, "Elisiano Petrini" <elisiano@gmail.com> wrote:
>
>> I understand the concept, but on the same level there could be more than
>> one child. Using this as an ID would work only if you put something
>> unique to the node in composing the ID.
>>
> I don't recall multiple nodes being a constraint in the problem, even then it's solvable
using sharded ID's - the reality is there's always finite number of nodes. The only thing
that changes is that tree won't sort naturally in the order they arrive. Remember, the parent
node has to exist everywhere for a collision to occur.
>
>> Generating a sequential ID (per tree level) depending on the order of
>> arrival might lead to collisions (if messages arrive at the same time).
>>
> Sure, but only if you have a badly designed sequence generators that are not designed
to operate in a sharded manner. A simple way is to pre-allocate a pool of ID's to a process
(say 500 ID's per process). No other process should be able to use another process' allocation.
Once exhausted, the process fetches another pool of IDs. This may not be ideal in a mobile
scenario, but would probably still work, you just might have to devise a smarter allocation
algorithm based upon usage and connectivity frequencies. (And now I've really digressed) This
is a very common solution done in the RDBMS world to combat ID conflicts.
>
>
>> Anyhow, I feel we're digressing here :)
>>
> It is worth discussion as not everyone is experienced in massively parallel systems.
:)
>
> If the originator has only a single node - they shouldn't have any conflict issues with
the simplest solution.
>
>
>
>> On Fri, 2013-02-22 at 16:23 +0900, svilen wrote:
>>> that's the path to root stored as id
>>> so answers to 1-2 will be say 1-2-5 1-2-6 1-2-33
>>>
>>>> Hi Kim,
>>>>     I don't think using this kind of documents' ids is a viable
>>>> solution: what if there is more that one message in response to
>>>> another (ie:several people answering to the same message)?
>>>>
>>>> On Fri, 2013-02-22 at 07:00 +0000, Jim Klo wrote:
>>>>> In theory you mention infinite depth, but is that realistic?
>>>>>
>>>>> A simple way is to make ID's a composite serial and chain them
>>>>> together.
>>>>>
>>>>> 1 : orig message
>>>>> 1-2 : reply to 1
>>>>> 1-2-5 : reply to 2
>>>>> 1-3 : reply to 1
>>>>> 1-3-4 : reply to
>>>>>
>>>>> The keys will then sort in order, such that can build the tree DFS
>>>>> and use range queries to get just segments.
>>>>>
>>>>> You can obviously do some creative things in compression - using
>>>>> different number bases to handle deep trees.
>>>>>
>>>>>
>>>>>
>>>>> On Feb 21, 2013, at 10:10 PM, "Michael Heuberger"
>>>>> <michael.heuberger@binarykitchen.com> wrote:
>>>>>
>>>>>> andrey, svilen, many thanks again.
>>>>>>
>>>>>> agreed, storing immediate children in a document is a bad idea. i
>>>>>> think i go for storing only an immediate parent id to get an
>>>>>> infinite tree depth.
>>>>>>
>>>>>> thanks for your inspirations guys
>>>>>>
>>>>>> cheers
>>>>>> michael
>>>>>>
>>>>>> On 22/02/13 18:49, Andrey Kuprianov wrote:
>>>>>>> PS - storing only an immediate parent id would be the solution
>>>>>>> for an infinite tree depth.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 22, 2013 at 1:48 PM, Andrey Kuprianov <
>>>>>>> andrey.kouprianov@gmail.com> wrote:
>>>>>>>
>>>>>>>> Storing immediate children in a parent is a bad idea in a
sense
>>>>>>>> that you will never be able to traverse your tree back to
the
>>>>>>>> root, given a child node.
>>>>>>>>
>>>>>>>> Storing all in one doc - you will run out of space before
you
>>>>>>>> know it. Documents have a finite length and the larger the
>>>>>>>> document, the slower it will get to serialize and unserialize
>>>>>>>> it.
>>>>>>>>
>>>>>>>> Storing just an immediate parent in children might result
in a
>>>>>>>> costly multiple read operations, if you want to get root
of the
>>>>>>>> tree given an arbitrary child.
>>>>>>>>
>>>>>>>> Therefore the only feasible solution is to store references
to
>>>>>>>> the parent chain - i.e. ancestors - starting from an immediate
>>>>>>>> parent and ending with a tree root.
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's a simple calculation for ancestors approach:
>>>>>>>>
>>>>>>>> Suppose the maximum document size is 1MB. A uuid is 32
>>>>>>>> characters in length + taking into account commas and brackets,
>>>>>>>> then the depth of your tree (I am not taking content into
>>>>>>>> account) could be roughly:
>>>>>>>>
>>>>>>>> ((32 + 1)*n  + 2 - 1) = 1024 * 1024  ==> n = 31775 levels
>>>>>>>>
>>>>>>>> However, max document size is configurable from the settings
>>>>>>>> (as of CouchDB 1.2) and could easily be up to 4GB, giving
you
>>>>>>>> over 130mil of levels. Should be enough... :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 22, 2013 at 1:14 PM, Michael Heuberger <
>>>>>>>> michael.heuberger@binarykitchen.com> wrote:
>>>>>>>>
>>>>>>>>> thanks andrey and svil
>>>>>>>>>
>>>>>>>>> depth is infinite. in theory a message can have million
>>>>>>>>> replies forth and back. similar like a very long email
thread.
>>>>>>>>>
>>>>>>>>> i think storing ancestors ą la andrey would be too much
hassle
>>>>>>>>> but works well for few levels.
>>>>>>>>>
>>>>>>>>> svil, what did you mean with dicts and with keeping all
>>>>>>>>> children in one doc? can you tell me more?
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> michael
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22/02/13 17:27, svilen wrote:
>>>>>>>>>
>>>>>>>>>> my rough guess:
>>>>>>>>>>
>>>>>>>>>> if it's finite depth, then all in one doc:
>>>>>>>>>> - list of (item or (list of ...))
>>>>>>>>>> - or same with dicts
>>>>>>>>>>
>>>>>>>>>> else, one doc per message keeping just link-to-parent,
>>>>>>>>>> or multiple links-to-grand...grand-parents and/or
root.
>>>>>>>>>> similar to the strategies in SQL - nested etc.
>>>>>>>>>> keeping all chidlren of same node in one doc is also
>>>>>>>>>> possible..
>>>>>>>>>>
>>>>>>>>>> in any case either traversal or storing or both will
be
>>>>>>>>>> difficult.
>>>>>>>>>>
>>>>>>>>>> ciao
>>>>>>>>>> svil
>>>>>>>>>>
>>>>>>>>>> On Fri, 22 Feb 2013 17:13:01 +1300
>>>>>>>>>> Michael Heuberger
>>>>>>>>>> <michael.heuberger@**binarykitchen.com<michael.heuberger@binarykitchen.com>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello guys
>>>>>>>>>>> I'd like to store messages in a tree like structure.
>>>>>>>>>>> Whenever you reply to a message, then the original
message
>>>>>>>>>>> becomes the parent message.
>>>>>>>>>>>
>>>>>>>>>>> How would you implement something like this in
CouchDB? Just
>>>>>>>>>>> curious and need a little guidance + advice here.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Michael
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Binary Kitchen
>>>>>>>>>>> Michael Heuberger
>>>>>>>>>>> 4c Dunbar Road
>>>>>>>>>>> Mt Eden
>>>>>>>>>>> Auckland 1024
>>>>>>>>>>> (New Zealand)
>>>>>>>>>>>
>>>>>>>>>>> Mobile (text only) ...  +64 21 261 89 81
>>>>>>>>>>> Email ................  michael@binarykitchen.com
>>>>>>>>>>> Website ..............  http://www.binarykitchen.com
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Binary Kitchen
>>>>>>>>> Michael Heuberger
>>>>>>>>> 4c Dunbar Road
>>>>>>>>> Mt Eden
>>>>>>>>> Auckland 1024
>>>>>>>>> (New Zealand)
>>>>>>>>>
>>>>>>>>> Mobile (text only) ...  +64 21 261 89 81
>>>>>>>>> Email ................  michael@binarykitchen.com
>>>>>>>>> Website ..............  http://www.binarykitchen.com
>>>>>> -- 
>>>>>>
>>>>>> Binary Kitchen
>>>>>> Michael Heuberger
>>>>>> 4c Dunbar Road
>>>>>> Mt Eden
>>>>>> Auckland 1024
>>>>>> (New Zealand)
>>>>>>
>>>>>> Mobile (text only) ...  +64 21 261 89 81
>>>>>> Email ................  michael@binarykitchen.com
>>>>>> Website ..............  http://www.binarykitchen.com
>>

-- 

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  michael@binarykitchen.com
Website ..............  http://www.binarykitchen.com


Mime
View raw message