This probably won't work, unless all node names are unique at a given
level. For example, given paths /a/c/d/e and /b/c/d/e you would have two
conflicting entries for "4:d/e childOf c/d". You might be able to use a
unique name instead with a similar scheme, but that could possibly
introduce a bottleneck.
Another thought on this is if you actually have path depths in the
thousands then there's probably a compression scheme that could save you a
lot of space and compute time, as the branching factor would necessarily be
1 for the majority of those nodes.
Cheers,
Adam
On Wed, Apr 11, 2012 at 10:43 PM, Keith Turner <keith@deenlo.com> wrote:
> There is one other refinement to this table structure, the depth could
> be sorted in inverse order. To do this store (MAX_DEPTH  depth).
> This avoids the binary search the in filesystem example to find the
> max depth. So the tree would look like the following, assuming max
> dept is 999.
>
> 997:B/D childOf A/B
> 997:B/E childOf A/B
> 997:C/F childOf A/C
> 997:C/G childOf A/C
> 998:A/B childOf /A
> 998:A/C childOf /A
>
> Keith
>
> 2012/4/11 Perko, Ralph J <Ralph.Perko@pnnl.gov>:
> > Keith, thanks for the quick reply. I see the hierarchy is maintained in
> > the row id  would you still
> > recommend this approach for deep hierarchies, on the order of hundreds
> and
> > perhaps thousands?
> >
> > Maybe this is an accumulo newbie question  Is this approach better than
> > walking the tree? My current schema is something like
> >
> > A parentOf:B 1
> > A parentOf:C 1
> > B childOf:A 1
> > C childOf:A 1
> > And so on…
> >
> > Thanks,
> > Ralph
> >
> >
> > On 4/10/12 3:23 PM, "Keith Turner" <keith@deenlo.com> wrote:
> >
> >>oooppsss I was not paying close attention, when it scans level 001 it
> >>will insert the following
> >>
> >> 000/A count = 6 (not 5)
> >>
> >>On Tue, Apr 10, 2012 at 6:21 PM, Keith Turner <keith@deenlo.com> wrote:
> >>> Take a look at the
> >>> org.apache.accumulo.examples.simple.dirlist.FileCount example, I
> >>> think it does what you are asking.
> >>>
> >>> The way it works it that assume your tree would be stored in accumulo
> >>> as follows.
> >>>
> >>> 000/A
> >>> 001/A/B
> >>> 001/A/C
> >>> 002/A/B/D
> >>> 002/A/B/E
> >>> 002/A/C/F
> >>> 002/A/C/G
> >>>
> >>> The number before the path is the depth, therefore all data for a
> >>> particular depth of the tree is stored contiguously. The example
> >>> program scans each depth, starting with the highest depth, and push
> >>> counts up. This very efficient because there is not a lot of random
> >>> access, it sequentially reads data for each depth and stream updates
> >>> to the lower depth.
> >>>
> >>> For example when reading level 002, it would do the following two
> >>>inserts :
> >>>
> >>> 001/A/B count=2
> >>> 001/A/C count=2
> >>>
> >>> Then it would read level 001 and push the following insert :
> >>>
> >>> 000/A count = 5
> >>>
> >>> Keith
> >>>
> >>> On Tue, Apr 10, 2012 at 6:02 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
> >>>wrote:
> >>>> Hi, I wish to do a recursive rollupcount and am wondering the best
> >>>>way to
> >>>> do this.
> >>>>
> >>>> What I mean is this in accumulo is a table with data that represents
> >>>>the
> >>>> nodes and leafs in a tree. Each node/leaf in accumulo knows it's
> >>>>parent
> >>>> and children and wether it is a node or leaf. I wish to have a
> >>>> rollupcount for any given node to know the combined total of all
> >>>> descendants.
> >>>>
> >>>> For example, given the tree:
> >>>>
> >>>> A
> >>>> 
> >>>> 
> >>>>  
> >>>> B C
> >>>>  
> >>>>  
> >>>>    
> >>>> D E F G
> >>>>
> >>>> "A" would have 6 descendants.
> >>>>
> >>>> I can use the SummingCombiner iterator to get a child count, e.g "A"
> >>>>has 2
> >>>> children, but I am not sure the best way to recurse down. The data
is
> >>>> static so I do not necessarily need a dynamic, onthefly solution.
> >>>>
> >>>> Thanks for your help,
> >>>> Ralph
> >
>
