accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Fuchs <adam.p.fu...@ugov.gov>
Subject Re: recursive rollup
Date Thu, 12 Apr 2012 16:50:33 GMT
This probably won't work, unless all node names are unique at a given
level. For example, given paths /a/c/d/e and /b/c/d/e you would have two
conflicting entries for "4:d/e childOf c/d". You might be able to use a
unique name instead with a similar scheme, but that could possibly
introduce a bottleneck.

Another thought on this is if you actually have path depths in the
thousands then there's probably a compression scheme that could save you a
lot of space and compute time, as the branching factor would necessarily be
1 for the majority of those nodes.

Cheers,
Adam


On Wed, Apr 11, 2012 at 10:43 PM, Keith Turner <keith@deenlo.com> wrote:

> There is one other refinement to this table structure, the depth could
> be sorted in inverse order.  To do this store (MAX_DEPTH - depth).
> This avoids the binary search the in filesystem example to find the
> max depth.  So the tree would look like the following, assuming max
> dept is 999.
>
> 997:B/D childOf A/B
> 997:B/E childOf A/B
> 997:C/F childOf A/C
> 997:C/G childOf A/C
> 998:A/B childOf /A
> 998:A/C childOf /A
>
> Keith
>
> 2012/4/11 Perko, Ralph J <Ralph.Perko@pnnl.gov>:
> > Keith, thanks for the quick reply. I see the hierarchy is maintained in
> > the row id - would you still
> > recommend this approach for deep hierarchies, on the order of hundreds
> and
> > perhaps thousands?
> >
> > Maybe this is an accumulo newbie question - Is this approach better than
> > walking the tree?  My current schema is something like
> >
> > A parentOf:B 1
> > A parentOf:C 1
> > B childOf:A 1
> > C childOf:A 1
> > And so on…
> >
> > Thanks,
> > Ralph
> >
> >
> > On 4/10/12 3:23 PM, "Keith Turner" <keith@deenlo.com> wrote:
> >
> >>oooppsss I was not paying close attention,  when it scans level 001 it
> >>will insert the following
> >>
> >>  000/A count = 6 (not 5)
> >>
> >>On Tue, Apr 10, 2012 at 6:21 PM, Keith Turner <keith@deenlo.com> wrote:
> >>> Take a look at the
> >>> org.apache.accumulo.examples.simple.dirlist.FileCount example,  I
> >>> think it does what you are asking.
> >>>
> >>> The way it works it that assume your tree would be stored in accumulo
> >>> as follows.
> >>>
> >>> 000/A
> >>> 001/A/B
> >>> 001/A/C
> >>> 002/A/B/D
> >>> 002/A/B/E
> >>> 002/A/C/F
> >>> 002/A/C/G
> >>>
> >>> The number before the path is the depth, therefore all data for a
> >>> particular depth of the tree is stored contiguously.  The example
> >>> program scans each depth, starting with the highest depth, and push
> >>> counts up.  This very efficient because there is not a lot of random
> >>> access, it sequentially reads data for each depth and stream updates
> >>> to the lower depth.
> >>>
> >>> For example when reading level 002, it would do the following two
> >>>inserts :
> >>>
> >>>  001/A/B count=2
> >>>  001/A/C count=2
> >>>
> >>> Then it would read level 001 and push the following insert :
> >>>
> >>>  000/A count = 5
> >>>
> >>> Keith
> >>>
> >>> On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
> >>>wrote:
> >>>> Hi, I wish to do a recursive rollup-count and am wondering the best
> >>>>way to
> >>>> do this.
> >>>>
> >>>> What I mean is this ­ in accumulo is a table with data that represents
> >>>>the
> >>>> nodes and leafs in a tree.  Each node/leaf in accumulo knows it's
> >>>>parent
> >>>> and children and wether it is a node or leaf.  I wish to have a
> >>>> rollup-count for any given node to know the combined total of all
> >>>> descendants.
> >>>>
> >>>> For example, given the tree:
> >>>>
> >>>>                          A
> >>>>                          |
> >>>>                       --------
> >>>>                       |      |
> >>>>                       B      C
> >>>>                       |      |
> >>>>                     -----  -----
> >>>>                     |   |  |    |
> >>>>                     D   E  F    G
> >>>>
> >>>> "A" would have 6 descendants.
> >>>>
> >>>> I can use the SummingCombiner iterator to get a child count, e.g "A"
> >>>>has 2
> >>>> children, but I am not sure the best way to recurse down.  The data
is
> >>>> static so I do not necessarily need a dynamic, on-the-fly solution.
> >>>>
> >>>> Thanks for your help,
> >>>> Ralph
> >
>

Mime
View raw message