accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject Re: recursive rollup
Date Wed, 11 Apr 2012 20:55:34 GMT
Keith, thanks for the quick reply. I see the hierarchy is maintained in
the row id - would you still
recommend this approach for deep hierarchies, on the order of hundreds and
perhaps thousands?

Maybe this is an accumulo newbie question - Is this approach better than
walking the tree?  My current schema is something like

A parentOf:B 1
A parentOf:C 1
B childOf:A 1
C childOf:A 1
And so on…

Thanks,
Ralph


On 4/10/12 3:23 PM, "Keith Turner" <keith@deenlo.com> wrote:

>oooppsss I was not paying close attention,  when it scans level 001 it
>will insert the following
>
>  000/A count = 6 (not 5)
>
>On Tue, Apr 10, 2012 at 6:21 PM, Keith Turner <keith@deenlo.com> wrote:
>> Take a look at the
>> org.apache.accumulo.examples.simple.dirlist.FileCount example,  I
>> think it does what you are asking.
>>
>> The way it works it that assume your tree would be stored in accumulo
>> as follows.
>>
>> 000/A
>> 001/A/B
>> 001/A/C
>> 002/A/B/D
>> 002/A/B/E
>> 002/A/C/F
>> 002/A/C/G
>>
>> The number before the path is the depth, therefore all data for a
>> particular depth of the tree is stored contiguously.  The example
>> program scans each depth, starting with the highest depth, and push
>> counts up.  This very efficient because there is not a lot of random
>> access, it sequentially reads data for each depth and stream updates
>> to the lower depth.
>>
>> For example when reading level 002, it would do the following two
>>inserts :
>>
>>  001/A/B count=2
>>  001/A/C count=2
>>
>> Then it would read level 001 and push the following insert :
>>
>>  000/A count = 5
>>
>> Keith
>>
>> On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
>>wrote:
>>> Hi, I wish to do a recursive rollup-count and am wondering the best
>>>way to
>>> do this.
>>>
>>> What I mean is this ­ in accumulo is a table with data that represents
>>>the
>>> nodes and leafs in a tree.  Each node/leaf in accumulo knows it's
>>>parent
>>> and children and wether it is a node or leaf.  I wish to have a
>>> rollup-count for any given node to know the combined total of all
>>> descendants.
>>>
>>> For example, given the tree:
>>>
>>>                          A
>>>                          |
>>>                       --------
>>>                       |      |
>>>                       B      C
>>>                       |      |
>>>                     -----  -----
>>>                     |   |  |    |
>>>                     D   E  F    G
>>>
>>> "A" would have 6 descendants.
>>>
>>> I can use the SummingCombiner iterator to get a child count, e.g "A"
>>>has 2
>>> children, but I am not sure the best way to recurse down.  The data is
>>> static so I do not necessarily need a dynamic, on-the-fly solution.
>>>
>>> Thanks for your help,
>>> Ralph

Mime
View raw message