accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Fuchs <adam.p.fu...@ugov.gov>
Subject Re: recursive rollup
Date Thu, 12 Apr 2012 16:58:26 GMT
Small correction: the branching factor would not have to be exactly 1, but
it would be small on average (close to 1).

Adam


On Thu, Apr 12, 2012 at 12:50 PM, Adam Fuchs <adam.p.fuchs@ugov.gov> wrote:

> This probably won't work, unless all node names are unique at a given
> level. For example, given paths /a/c/d/e and /b/c/d/e you would have two
> conflicting entries for "4:d/e childOf c/d". You might be able to use a
> unique name instead with a similar scheme, but that could possibly
> introduce a bottleneck.
>
> Another thought on this is if you actually have path depths in the
> thousands then there's probably a compression scheme that could save you a
> lot of space and compute time, as the branching factor would necessarily be
> 1 for the majority of those nodes.
>
> Cheers,
> Adam
>
>
> On Wed, Apr 11, 2012 at 10:43 PM, Keith Turner <keith@deenlo.com> wrote:
>
>> There is one other refinement to this table structure, the depth could
>> be sorted in inverse order.  To do this store (MAX_DEPTH - depth).
>> This avoids the binary search the in filesystem example to find the
>> max depth.  So the tree would look like the following, assuming max
>> dept is 999.
>>
>> 997:B/D childOf A/B
>> 997:B/E childOf A/B
>> 997:C/F childOf A/C
>> 997:C/G childOf A/C
>> 998:A/B childOf /A
>> 998:A/C childOf /A
>>
>> Keith
>>
>> 2012/4/11 Perko, Ralph J <Ralph.Perko@pnnl.gov>:
>> > Keith, thanks for the quick reply. I see the hierarchy is maintained in
>> > the row id - would you still
>> > recommend this approach for deep hierarchies, on the order of hundreds
>> and
>> > perhaps thousands?
>> >
>> > Maybe this is an accumulo newbie question - Is this approach better than
>> > walking the tree?  My current schema is something like
>> >
>> > A parentOf:B 1
>> > A parentOf:C 1
>> > B childOf:A 1
>> > C childOf:A 1
>> > And so on…
>> >
>> > Thanks,
>> > Ralph
>> >
>> >
>> > On 4/10/12 3:23 PM, "Keith Turner" <keith@deenlo.com> wrote:
>> >
>> >>oooppsss I was not paying close attention,  when it scans level 001 it
>> >>will insert the following
>> >>
>> >>  000/A count = 6 (not 5)
>> >>
>> >>On Tue, Apr 10, 2012 at 6:21 PM, Keith Turner <keith@deenlo.com> wrote:
>> >>> Take a look at the
>> >>> org.apache.accumulo.examples.simple.dirlist.FileCount example,  I
>> >>> think it does what you are asking.
>> >>>
>> >>> The way it works it that assume your tree would be stored in accumulo
>> >>> as follows.
>> >>>
>> >>> 000/A
>> >>> 001/A/B
>> >>> 001/A/C
>> >>> 002/A/B/D
>> >>> 002/A/B/E
>> >>> 002/A/C/F
>> >>> 002/A/C/G
>> >>>
>> >>> The number before the path is the depth, therefore all data for a
>> >>> particular depth of the tree is stored contiguously.  The example
>> >>> program scans each depth, starting with the highest depth, and push
>> >>> counts up.  This very efficient because there is not a lot of random
>> >>> access, it sequentially reads data for each depth and stream updates
>> >>> to the lower depth.
>> >>>
>> >>> For example when reading level 002, it would do the following two
>> >>>inserts :
>> >>>
>> >>>  001/A/B count=2
>> >>>  001/A/C count=2
>> >>>
>> >>> Then it would read level 001 and push the following insert :
>> >>>
>> >>>  000/A count = 5
>> >>>
>> >>> Keith
>> >>>
>> >>> On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov
>> >
>> >>>wrote:
>> >>>> Hi, I wish to do a recursive rollup-count and am wondering the best
>> >>>>way to
>> >>>> do this.
>> >>>>
>> >>>> What I mean is this ­ in accumulo is a table with data that
>> represents
>> >>>>the
>> >>>> nodes and leafs in a tree.  Each node/leaf in accumulo knows it's
>> >>>>parent
>> >>>> and children and wether it is a node or leaf.  I wish to have a
>> >>>> rollup-count for any given node to know the combined total of all
>> >>>> descendants.
>> >>>>
>> >>>> For example, given the tree:
>> >>>>
>> >>>>                          A
>> >>>>                          |
>> >>>>                       --------
>> >>>>                       |      |
>> >>>>                       B      C
>> >>>>                       |      |
>> >>>>                     -----  -----
>> >>>>                     |   |  |    |
>> >>>>                     D   E  F    G
>> >>>>
>> >>>> "A" would have 6 descendants.
>> >>>>
>> >>>> I can use the SummingCombiner iterator to get a child count, e.g
"A"
>> >>>>has 2
>> >>>> children, but I am not sure the best way to recurse down.  The data
>> is
>> >>>> static so I do not necessarily need a dynamic, on-the-fly solution.
>> >>>>
>> >>>> Thanks for your help,
>> >>>> Ralph
>> >
>>
>
>

Mime
View raw message