You probably would not want to maintain the full hierarchy in the row
if it was deep. I think we did this in the filesystem example because
we wanted to do one lookup to retrieve a path and not have to follow
it down. However I suspect it would still be more optimal if you
could avoid a random access for each node in the tree. Especially if
the tree is wide and deep.
I am thinking you can do the following, but I have still not proved it
to myself completely. The reason I always keep two levels of the path
is that you want all of the children to sort together. For example at
level 002, all of the children of B will sort together. So you can
scan all of the children at of B at level 2 and when you see something
thats not a child of B you can emit a count for node B to level 1.
001:A/B childOf /A
001:A/C childOf /A
002:B/D childOf A/B
002:B/E childOf A/B
002:C/F childOf A/C
002:C/G childOf A/C
Still do the algorithm I described earlier where you scan starting at
the deepest depth. So when scanning level 002, do the following
inserts.
001:A/B children 2
001:A/C children 2
Thoughts?
Keith
2012/4/11 Perko, Ralph J <Ralph.Perko@pnnl.gov>:
> Keith, thanks for the quick reply. I see the hierarchy is maintained in
> the row id  would you still
> recommend this approach for deep hierarchies, on the order of hundreds and
> perhaps thousands?
>
> Maybe this is an accumulo newbie question  Is this approach better than
> walking the tree? My current schema is something like
>
> A parentOf:B 1
> A parentOf:C 1
> B childOf:A 1
> C childOf:A 1
> And so on…
>
> Thanks,
> Ralph
>
>
> On 4/10/12 3:23 PM, "Keith Turner" <keith@deenlo.com> wrote:
>
>>oooppsss I was not paying close attention, when it scans level 001 it
>>will insert the following
>>
>> 000/A count = 6 (not 5)
>>
>>On Tue, Apr 10, 2012 at 6:21 PM, Keith Turner <keith@deenlo.com> wrote:
>>> Take a look at the
>>> org.apache.accumulo.examples.simple.dirlist.FileCount example, I
>>> think it does what you are asking.
>>>
>>> The way it works it that assume your tree would be stored in accumulo
>>> as follows.
>>>
>>> 000/A
>>> 001/A/B
>>> 001/A/C
>>> 002/A/B/D
>>> 002/A/B/E
>>> 002/A/C/F
>>> 002/A/C/G
>>>
>>> The number before the path is the depth, therefore all data for a
>>> particular depth of the tree is stored contiguously. The example
>>> program scans each depth, starting with the highest depth, and push
>>> counts up. This very efficient because there is not a lot of random
>>> access, it sequentially reads data for each depth and stream updates
>>> to the lower depth.
>>>
>>> For example when reading level 002, it would do the following two
>>>inserts :
>>>
>>> 001/A/B count=2
>>> 001/A/C count=2
>>>
>>> Then it would read level 001 and push the following insert :
>>>
>>> 000/A count = 5
>>>
>>> Keith
>>>
>>> On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
>>>wrote:
>>>> Hi, I wish to do a recursive rollupcount and am wondering the best
>>>>way to
>>>> do this.
>>>>
>>>> What I mean is this in accumulo is a table with data that represents
>>>>the
>>>> nodes and leafs in a tree. Each node/leaf in accumulo knows it's
>>>>parent
>>>> and children and wether it is a node or leaf. I wish to have a
>>>> rollupcount for any given node to know the combined total of all
>>>> descendants.
>>>>
>>>> For example, given the tree:
>>>>
>>>> A
>>>> 
>>>> 
>>>>  
>>>> B C
>>>>  
>>>>  
>>>>    
>>>> D E F G
>>>>
>>>> "A" would have 6 descendants.
>>>>
>>>> I can use the SummingCombiner iterator to get a child count, e.g "A"
>>>>has 2
>>>> children, but I am not sure the best way to recurse down. The data is
>>>> static so I do not necessarily need a dynamic, onthefly solution.
>>>>
>>>> Thanks for your help,
>>>> Ralph
>
