accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Turner <>
Subject Re: recursive rollup
Date Thu, 12 Apr 2012 02:32:22 GMT
You probably would not want to maintain the full hierarchy in the row
if it was deep.  I think we did this in the filesystem example because
we wanted to do one lookup to retrieve a path and not have to follow
it down.  However I suspect it would still be more optimal if you
could avoid a random access for each node in the tree. Especially if
the tree is wide and deep.

I am thinking you can do the following, but I have still not proved it
to myself completely.  The reason I always keep two levels of the path
is that you want all of the children to sort together.  For example at
level 002, all of the children of B will sort together.  So you can
scan all of the children at of B at level 2 and when you see something
thats not a child of B you can emit a count for node B to level 1.

001:A/B childOf /A
001:A/C childOf /A
002:B/D childOf A/B
002:B/E childOf A/B
002:C/F childOf A/C
002:C/G childOf A/C

Still do the algorithm I described earlier where you scan starting at
the deepest depth.  So when scanning level 002, do the following

001:A/B children 2
001:A/C children 2



2012/4/11 Perko, Ralph J <>:
> Keith, thanks for the quick reply. I see the hierarchy is maintained in
> the row id - would you still
> recommend this approach for deep hierarchies, on the order of hundreds and
> perhaps thousands?
> Maybe this is an accumulo newbie question - Is this approach better than
> walking the tree?  My current schema is something like
> A parentOf:B 1
> A parentOf:C 1
> B childOf:A 1
> C childOf:A 1
> And so on…
> Thanks,
> Ralph
> On 4/10/12 3:23 PM, "Keith Turner" <> wrote:
>>oooppsss I was not paying close attention,  when it scans level 001 it
>>will insert the following
>>  000/A count = 6 (not 5)
>>On Tue, Apr 10, 2012 at 6:21 PM, Keith Turner <> wrote:
>>> Take a look at the
>>> org.apache.accumulo.examples.simple.dirlist.FileCount example,  I
>>> think it does what you are asking.
>>> The way it works it that assume your tree would be stored in accumulo
>>> as follows.
>>> 000/A
>>> 001/A/B
>>> 001/A/C
>>> 002/A/B/D
>>> 002/A/B/E
>>> 002/A/C/F
>>> 002/A/C/G
>>> The number before the path is the depth, therefore all data for a
>>> particular depth of the tree is stored contiguously.  The example
>>> program scans each depth, starting with the highest depth, and push
>>> counts up.  This very efficient because there is not a lot of random
>>> access, it sequentially reads data for each depth and stream updates
>>> to the lower depth.
>>> For example when reading level 002, it would do the following two
>>>inserts :
>>>  001/A/B count=2
>>>  001/A/C count=2
>>> Then it would read level 001 and push the following insert :
>>>  000/A count = 5
>>> Keith
>>> On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <>
>>>> Hi, I wish to do a recursive rollup-count and am wondering the best
>>>>way to
>>>> do this.
>>>> What I mean is this ­ in accumulo is a table with data that represents
>>>> nodes and leafs in a tree.  Each node/leaf in accumulo knows it's
>>>> and children and wether it is a node or leaf.  I wish to have a
>>>> rollup-count for any given node to know the combined total of all
>>>> descendants.
>>>> For example, given the tree:
>>>>                          A
>>>>                          |
>>>>                       --------
>>>>                       |      |
>>>>                       B      C
>>>>                       |      |
>>>>                     -----  -----
>>>>                     |   |  |    |
>>>>                     D   E  F    G
>>>> "A" would have 6 descendants.
>>>> I can use the SummingCombiner iterator to get a child count, e.g "A"
>>>>has 2
>>>> children, but I am not sure the best way to recurse down.  The data is
>>>> static so I do not necessarily need a dynamic, on-the-fly solution.
>>>> Thanks for your help,
>>>> Ralph

View raw message