accumulo-user mailing list archives

From Andrew Wells <awe...@clearedgeit.com>
Subject Re: Iterators adding data: IteratorEnvironment.registerSideChannel?
Date Mon, 16 Feb 2015 04:17:56 GMT
The main issue with adding data in an iterator is order. If you can do
a merge-sort insertion, then you can guarantee order and it's fine. But if
you are inserting based on input, you cannot guarantee order, and it can only
be done in a scan iterator.
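
To make the ordering point concrete, here is a minimal stand-alone sketch. It
does not use the Accumulo API: plain String keys stand in for Keys, and the
class and method names (OrderDemo, merge, insertFromInput) are hypothetical.
It contrasts merging a pre-sorted injected stream with emitting a derived key
right after each input key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model: String keys stand in for Accumulo Keys.
class OrderDemo {

    // True if no key sorts before the key that precedes it.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) return false;
        }
        return true;
    }

    // Merge-sort style insertion: both inputs are sorted, so the output is too.
    static List<String> merge(List<String> source, List<String> injected) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < source.size() && j < injected.size()) {
            if (source.get(i).compareTo(injected.get(j)) <= 0) out.add(source.get(i++));
            else out.add(injected.get(j++));
        }
        while (i < source.size()) out.add(source.get(i++));
        while (j < injected.size()) out.add(injected.get(j++));
        return out;
    }

    // Insertion based on input: after each source key, emit a derived key
    // (here the reversed string, an arbitrary choice for illustration).
    static List<String> insertFromInput(List<String> source) {
        List<String> out = new ArrayList<>();
        for (String k : source) {
            out.add(k);
            out.add(new StringBuilder(k).reverse().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = List.of("ax", "by", "cz");
        System.out.println(isSorted(merge(source, List.of("bb", "dd")))); // prints true
        System.out.println(isSorted(insertFromInput(source)));            // prints false
    }
}
```

In the second case the derived key "xa" is emitted before "by", so anything
downstream that assumes sorted input would see keys out of order.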
 On Feb 15, 2015 8:03 PM, "Dylan Hutchison" <dhutchis@stevens.edu> wrote:

> Hello all,
>
> I've been toying with the registerSideChannel(iter)
> <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/IteratorEnvironment.html#registerSideChannel(org.apache.accumulo.core.iterators.SortedKeyValueIterator)>
> method
> on the IteratorEnvironment passed to iterators through the init() method.
> From what I can tell, the method allows you to add another iterator as a
> top level source, to be merged in along with other usual top-level sources
> such as the in-memory cache and RFiles.
>
> Are there any downsides to using registerSideChannel( ) to "add new data"
> to an iterator chain?  It looks like this is fairly stable, so long as the
> iterator we add as a side channel implements seek() properly so as to only
> return entries whose rows are within a tablet.  I imagine it works like so:
>
> Suppose we set a custom iterator InjectIterator that registers a side
> channel inside init() at priority 5 as a one-time major compaction
> iterator.  InjectIterator forwards other operations to its parent, as in
> WrappingIterator
> <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/WrappingIterator.html>.
> We start the compaction:
>
> Tablet 1 (a,g]
>
>    1. init() called on InjectIterator.  Creates the side channel
>    iterator, calls init() on it, and registers it.
>    2. init() called on VersioningIterator.
>    3. init() called on top level iterators, including RFiles, in-memory
>    cache and the new side channel.
>    4. seek( (a,g] ) called on InjectIterator.
>    5. seek( (a,g] ) called on VersioningIterator.
>    6. seek( (a,g] ) called on top level iterators
>    7. next() called on InjectIterator. Forwards to parent.
>    8. next() called on VersioningIterator. Forwards to parent.
>    9. next() called on top level iterator (a MultiIterator
>    <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/system/MultiIterator.html>).
>    The next value is read from all the top-level iterator sources and the one
>    with the least key is cached ready to go.
>    10. ...
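
Step 9 above can be modeled with a small heap-based merge. This is only a
sketch of the idea, not Accumulo's MultiIterator: the names MergeSketch,
Source, and mergeAll are hypothetical, and String keys stand in for Keys.
Each source caches its current top key, and the source with the least top
key is served next.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of a merging top-level source; not Accumulo's MultiIterator.
class MergeSketch {

    // One sorted source with its current head key cached as "top".
    static final class Source implements Comparable<Source> {
        final Iterator<String> it;
        String top;

        // The caller must pass an iterator that has at least one element.
        Source(Iterator<String> it) { this.it = it; this.top = it.next(); }

        // Moves to the next key; returns false once the source is exhausted.
        boolean advance() {
            if (!it.hasNext()) return false;
            top = it.next();
            return true;
        }

        @Override
        public int compareTo(Source other) { return top.compareTo(other.top); }
    }

    // Repeatedly takes the source with the least cached key and emits that key.
    static List<String> mergeAll(List<List<String>> sortedSources) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> s : sortedSources) {
            if (!s.isEmpty()) heap.add(new Source(s.iterator()));
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source least = heap.poll();
            out.add(least.top);
            if (least.advance()) heap.add(least); // re-insert under its new top key
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rfile = List.of("b", "e");
        List<String> inMemory = List.of("c", "f");
        List<String> sideChannel = List.of("a", "d");
        System.out.println(mergeAll(List.of(rfile, inMemory, sideChannel)));
        // prints [a, b, c, d, e, f]
    }
}
```

The merge treats the side channel like any other source, which is why the
result stays sorted only if the side channel itself returns keys in order.
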
>
> Tablet 2 (g,p]  --- same as tablet 1 except steps 4-6 call seek( (g,p] ).
> Done in parallel with tablet 1 if on a different tablet server.
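
The requirement that the side channel's seek() only return rows inside the
tablet can be sketched as a range clip. This is a stand-alone illustration,
not the Accumulo Range API: SeekSketch and seek are hypothetical names, and it
assumes each tablet covers an exclusive-start, inclusive-end row range, as in
the (a,g] notation above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of a side channel that clips its rows to the seek range.
class SeekSketch {

    // Returns the rows in (startExclusive, endInclusive], in sorted order.
    static List<String> seek(NavigableSet<String> rows, String startExclusive, String endInclusive) {
        return new ArrayList<>(rows.subSet(startExclusive, false, endInclusive, true));
    }

    public static void main(String[] args) {
        // Rows the side channel would like to inject, across all tablets.
        NavigableSet<String> rows = new TreeSet<>(List.of("a", "c", "g", "h", "p", "q"));
        System.out.println(seek(rows, "a", "g")); // tablet 1: prints [c, g]
        System.out.println(seek(rows, "g", "p")); // tablet 2: prints [h, p]
    }
}
```

Because the two ranges do not overlap, each injected row is returned for
exactly one tablet even when the same side channel logic runs on both.
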
>
> Is this an accurate depiction?  Anything I should treat with caution?  It
> seems to work on my single-node instance, so tips about difficulties going
> to multi-node are good.
>
> Code available here.
> <https://github.com/Accla/d4m_api_java/blob/0d8c62164d5c0b59f949ce23c1b85536809764d2/src/main/java/edu/mit/ll/graphulo/InjectIterator.java#L166>
>
> Regards,
> Dylan Hutchison
>
> --
> www.cs.stevens.edu/~dhutchis
>
