incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Алекс Zatvornitskiy <a.zatvornits...@gmail.com>
Subject Re: Tree like structures in CouchDB
Date Fri, 22 Feb 2013 19:29:55 GMT
@Jim Klo suggested a nice solution. A document centric database implies
embedding what you need inside a document. However it still depends from
your actual environment/scope/specific


On Fri, Feb 22, 2013 at 8:04 PM, Jim Klo <jim.klo@sri.com> wrote:

>
> interesting discussion!
>
> jim, yes, in my case nesting won't be very deep so i've been thinking
> about your suggestion:
>
> 1 : orig message
> 1-2 : reply to 1
> 1-2-5 : reply to 2
> 1-3 : reply to 1
> 1-3-4 : reply to
>
> you say "1-3-4" is an ID, right?
>
>
> Yes
>
> so tell me, if the message with the ID "1-3-4" wants to find out who his
> parent is, then we'd have to parse the ID, remove one digit and query the
> database for "1-3" to get the parent. do you think this is optimal for the
> couchdb?
>
>
> You could do it this way; I've only outlined a strategy for ID's that
> sorts in a hierarchal fashion, not a way to determine ancestry. I'd
> probably suggest just storing the parent id in the child - depends upon
> your actual use case.
>
> Another alternative, but would require the use of a view, is to store the
> parent ancestry as an array in each document instead of using a composite
> id ( I believe others have suggested this too):
>
> { id:1, ancestry: []}
> { id:2, ancestry: [1]}
> { id:5, ancestry: [1, 2]}
> { id:3, ancestry: [1]}
> { id:4, ancestry: [1, 3]}
> { id:6, ancestry: [1, 2]}
> { id:7, ancestry: [1, 3]}
>
> then a map function:
>
> function(doc) {
> emit(doc.ancestry.concat([doc.id]), null);
> }
>
> and just use the built-in _count reduce if you need to count the number of
> nodes in a tree (or sub tree)
>
> This will create a view that sorts in the same way as the id, but using an
> array instead of a string. This view will look something like, noting that
> the key is what's sorted:
>
> {
> rows: [
> { id: 1, key: [ 1 ], value= null},
> { id: 2, key: [ 1, 2 ], value= null},
> { id: 5, key: [ 1, 2, 5 ], value= null},
> { id: 6, key: [ 1, 2, 6 ], value= null},
> { id: 3, key: [ 1, 3 ], value= null},
> { id: 4, key: [ 1, 3, 4 ], value= null},
> { id: 7, key: [ 1, 3, 7 ], value= null},
> ]
> }
>
> you can then query an entire tree with startkey=[1]&endkey=[1,{}] which
> will give you all the descendants for the one tree with one query.
>  fetching a sub tree works similarly: startkey=[1,2]&endkey=[1,2,{}]
>
> and you will have equivalent functionality to using a composite id with
> possibly a simpler to comprehend mechanism for building queries and
>  creating ids at the expense of creating a view...
>
> you'll probably also want a view that just lists root messages too… that
> would be:
>
> function(doc) {
> if (doc.ancestry.length == 0)
> emit(doc.id, null);
> }
>
> - Jim
>
>
> *
> Jim Klo
> Senior Software Engineer
> Center for Software Engineering
> SRI International
> *
> *
> t. @nsomnac
> *
>
>
> On 22/02/13 20:57, Jim Klo wrote:
>
> On Feb 21, 2013, at 11:31 PM, "Elisiano Petrini" <elisiano@gmail.com>
> wrote:
>
> I understand the concept, but on the same level there could be more than
> one child. Using this as an ID would work only if you put something
> unique to the node in composing the ID.
>
> I don't recall multiple nodes being a constraint in the problem, even then
> it's solvable using sharded ID's - the reality is there's always finite
> number of nodes. The only thing that changes is that tree won't sort
> naturally in the order they arrive. Remember, the parent node has to exist
> everywhere for a collision to occur.
>
> Generating a sequential ID (per tree level) depending on the order of
> arrival might lead to collisions (if messages arrive at the same time).
>
> Sure, but only if you have a badly designed sequence generators that are
> not designed to operate in a sharded manner. A simple way is to
> pre-allocate a pool of ID's to a process (say 500 ID's per process). No
> other process should be able to use another process' allocation. Once
> exhausted, the process fetches another pool of IDs. This may not be ideal
> in a mobile scenario, but would probably still work, you just might have to
> devise a smarter allocation algorithm based upon usage and connectivity
> frequencies. (And now I've really digressed) This is a very common solution
> done in the RDBMS world to combat ID conflicts.
>
>
> Anyhow, I feel we're digressing here :)
>
> It is worth discussion as not everyone is experienced in massively
> parallel systems. :)
>
> If the originator has only a single node - they shouldn't have any
> conflict issues with the simplest solution.
>
>
>
> On Fri, 2013-02-22 at 16:23 +0900, svilen wrote:
>
> that's the path to root stored as id
> so answers to 1-2 will be say 1-2-5 1-2-6 1-2-33
>
> Hi Kim,
>    I don't think using this kind of documents' ids is a viable
> solution: what if there is more that one message in response to
> another (ie:several people answering to the same message)?
>
> On Fri, 2013-02-22 at 07:00 +0000, Jim Klo wrote:
>
> In theory you mention infinite depth, but is that realistic?
>
> A simple way is to make ID's a composite serial and chain them
> together.
>
> 1 : orig message
> 1-2 : reply to 1
> 1-2-5 : reply to 2
> 1-3 : reply to 1
> 1-3-4 : reply to
>
> The keys will then sort in order, such that can build the tree DFS
> and use range queries to get just segments.
>
> You can obviously do some creative things in compression - using
> different number bases to handle deep trees.
>
>
>
> On Feb 21, 2013, at 10:10 PM, "Michael Heuberger"
> <michael.heuberger@binarykitchen.com> wrote:
>
> andrey, svilen, many thanks again.
>
> agreed, storing immediate children in a document is a bad idea. i
> think i go for storing only an immediate parent id to get an
> infinite tree depth.
>
> thanks for your inspirations guys
>
> cheers
> michael
>
> On 22/02/13 18:49, Andrey Kuprianov wrote:
>
> PS - storing only an immediate parent id would be the solution
> for an infinite tree depth.
>
>
> On Fri, Feb 22, 2013 at 1:48 PM, Andrey Kuprianov <
> andrey.kouprianov@gmail.com> wrote:
>
> Storing immediate children in a parent is a bad idea in a sense
> that you will never be able to traverse your tree back to the
> root, given a child node.
>
> Storing all in one doc - you will run out of space before you
> know it. Documents have a finite length and the larger the
> document, the slower it will get to serialize and unserialize
> it.
>
> Storing just an immediate parent in children might result in a
> costly multiple read operations, if you want to get root of the
> tree given an arbitrary child.
>
> Therefore the only feasible solution is to store references to
> the parent chain - i.e. ancestors - starting from an immediate
> parent and ending with a tree root.
>
>
> Here's a simple calculation for ancestors approach:
>
> Suppose the maximum document size is 1MB. A uuid is 32
> characters in length + taking into account commas and brackets,
> then the depth of your tree (I am not taking content into
> account) could be roughly:
>
> ((32 + 1)*n  + 2 - 1) = 1024 * 1024  ==> n = 31775 levels
>
> However, max document size is configurable from the settings
> (as of CouchDB 1.2) and could easily be up to 4GB, giving you
> over 130mil of levels. Should be enough... :)
>
>
>
> On Fri, Feb 22, 2013 at 1:14 PM, Michael Heuberger <
> michael.heuberger@binarykitchen.com> wrote:
>
> thanks andrey and svil
>
> depth is infinite. in theory a message can have million
> replies forth and back. similar like a very long email thread.
>
> i think storing ancestors ą la andrey would be too much hassle
> but works well for few levels.
>
> svil, what did you mean with dicts and with keeping all
> children in one doc? can you tell me more?
>
> thanks,
> michael
>
>
> On 22/02/13 17:27, svilen wrote:
>
> my rough guess:
>
> if it's finite depth, then all in one doc:
> - list of (item or (list of ...))
> - or same with dicts
>
> else, one doc per message keeping just link-to-parent,
> or multiple links-to-grand...grand-parents and/or root.
> similar to the strategies in SQL - nested etc.
> keeping all chidlren of same node in one doc is also
> possible..
>
> in any case either traversal or storing or both will be
> difficult.
>
> ciao
> svil
>
> On Fri, 22 Feb 2013 17:13:01 +1300
> Michael Heuberger
> <michael.heuberger@**binarykitchen.com<michael.heuberger@binarykitchen.com
> >>
> wrote:
>
> Hello guys
>
> I'd like to store messages in a tree like structure.
> Whenever you reply to a message, then the original message
> becomes the parent message.
>
> How would you implement something like this in CouchDB? Just
> curious and need a little guidance + advice here.
>
> Thanks,
> Michael
>
> --
>
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
>
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  michael@binarykitchen.com
> Website ..............  http://www.binarykitchen.com
>
> --
>
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
>
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  michael@binarykitchen.com
> Website ..............  http://www.binarykitchen.com
>
> --
>
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
>
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  michael@binarykitchen.com
> Website ..............  http://www.binarykitchen.com
>
>
>
> --
>
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
>
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  michael@binarykitchen.com
> Website ..............  http://www.binarykitchen.com
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message