accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Turner <>
Subject Re: recursive rollup
Date Tue, 10 Apr 2012 22:21:53 GMT
Take a look at the
org.apache.accumulo.examples.simple.dirlist.FileCount example,  I
think it does what you are asking.

The way it works it that assume your tree would be stored in accumulo
as follows.


The number before the path is the depth, therefore all data for a
particular depth of the tree is stored contiguously.  The example
program scans each depth, starting with the highest depth, and push
counts up.  This very efficient because there is not a lot of random
access, it sequentially reads data for each depth and stream updates
to the lower depth.

For example when reading level 002, it would do the following two inserts :

  001/A/B count=2
  001/A/C count=2

Then it would read level 001 and push the following insert :

  000/A count = 5


On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <> wrote:
> Hi, I wish to do a recursive rollup-count and am wondering the best way to
> do this.
> What I mean is this ­ in accumulo is a table with data that represents the
> nodes and leafs in a tree.  Each node/leaf in accumulo knows it's parent
> and children and wether it is a node or leaf.  I wish to have a
> rollup-count for any given node to know the combined total of all
> descendants.
> For example, given the tree:
>                          A
>                          |
>                       --------
>                       |      |
>                       B      C
>                       |      |
>                     -----  -----
>                     |   |  |    |
>                     D   E  F    G
> "A" would have 6 descendants.
> I can use the SummingCombiner iterator to get a child count, e.g "A" has 2
> children, but I am not sure the best way to recurse down.  The data is
> static so I do not necessarily need a dynamic, on-the-fly solution.
> Thanks for your help,
> Ralph

View raw message