accumulo-user mailing list archives

From Dylan Hutchison <dhutc...@stevens.edu>
Subject Iterators adding data: IteratorEnvironment.registerSideChannel?
Date Mon, 16 Feb 2015 00:59:54 GMT
Hello all,

I've been toying with the registerSideChannel(iter)
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/IteratorEnvironment.html#registerSideChannel(org.apache.accumulo.core.iterators.SortedKeyValueIterator)>
method on the IteratorEnvironment passed to iterators through the init() method.
From what I can tell, the method allows you to add another iterator as a
top level source, to be merged in along with other usual top-level sources
such as the in-memory cache and RFiles.

Are there any downsides to using registerSideChannel( ) to "add new data"
to an iterator chain?  It looks like this is fairly stable, so long as the
iterator we add as a side channel implements seek() properly so as to only
return entries whose rows are within a tablet.  I imagine it works like so:

Suppose we set a custom iterator InjectIterator that registers a side
channel inside init() at priority 5 as a one-time major compaction
iterator.  InjectIterator forwards other operations to its parent, as in
WrappingIterator
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/WrappingIterator.html>.
We start the compaction:

Tablet 1 (a,g]

   1. init() called on InjectIterator.  Creates the side channel iterator,
   calls init() on it, and registers it.
   2. init() called on VersioningIterator.
   3. init() called on top level iterators, including RFiles, the in-memory
   cache and the new side channel.
   4. seek( (a,g] ) called on InjectIterator.
   5. seek( (a,g] ) called on VersioningIterator.
   6. seek( (a,g] ) called on top level iterators
   7. next() called on InjectIterator. Forwards to parent.
   8. next() called on VersioningIterator. Forwards to parent.
   9. next() called on top level iterator (a MultiIterator
   <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/system/MultiIterator.html>).
   The next value is read from all the top-level iterator sources and the one
   with the least key is cached ready to go.
   10. ...
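
As a rough illustration of step 9, here is a toy model of merging several
sorted sources, one of which plays the role of the side channel. It is only a
sketch under stated assumptions: plain strings stand in for Accumulo Keys, the
class and method names are hypothetical, and it does not claim to reproduce how
MultiIterator is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: strings stand in for Accumulo Keys, and none of these
// names come from the Accumulo API.
final class SideChannelMergeSketch {

    // One sorted source plus its current (not yet returned) top key.
    private static final class Source implements Comparable<Source> {
        final Iterator<String> it;
        String top;

        Source(List<String> sortedKeys) {
            this.it = sortedKeys.iterator();
            this.top = it.hasNext() ? it.next() : null;
        }

        void advance() {
            top = it.hasNext() ? it.next() : null;
        }

        @Override
        public int compareTo(Source other) {
            return top.compareTo(other.top);
        }
    }

    // Merges already-sorted sources by repeatedly taking the source whose
    // top key is least, then advancing that source.
    static List<String> merge(List<List<String>> sortedSources) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> keys : sortedSources) {
            Source s = new Source(keys);
            if (s.top != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source least = heap.poll();
            out.add(least.top);
            least.advance();
            if (least.top != null) {
                heap.add(least); // re-insert with its new top key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rfile = Arrays.asList("b", "d", "f");
        List<String> inMemory = Arrays.asList("c", "g");
        List<String> sideChannel = Arrays.asList("a1", "e"); // injected entries
        // prints [a1, b, c, d, e, f, g]
        System.out.println(merge(Arrays.asList(rfile, inMemory, sideChannel)));
    }
}
```

In this model the side channel is just one more sorted source, so its entries
only interleave correctly if it returns them in sorted order.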

Tablet 2 (g,p]  --- same as tablet 1 except steps 4-6 call seek( (g,p] ).
Done in parallel with tablet 1 if on a different tablet server.
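
To make the per-tablet seek() concrete, the sketch below clips one shared set
of side-channel rows to each tablet's range. It assumes both tablets use a
start-exclusive, end-inclusive row range, uses plain strings for rows, and the
class and method names are hypothetical rather than Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: rows are plain strings, and these names are not Accumulo API.
final class TabletRangeClipSketch {

    // Returns the rows in (startExclusive, endInclusive], the range shape
    // assumed here for a tablet.
    static List<String> seek(TreeMap<String, String> sideChannelData,
                             String startExclusive, String endInclusive) {
        return new ArrayList<>(
            sideChannelData.subMap(startExclusive, false, endInclusive, true).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> data = new TreeMap<>();
        for (String row : new String[] {"a", "c", "g", "h", "p", "q"}) {
            data.put(row, "injected");
        }
        System.out.println(seek(data, "a", "g")); // tablet 1: prints [c, g]
        System.out.println(seek(data, "g", "p")); // tablet 2: prints [h, p]
    }
}
```

Each row lands in exactly one tablet here, so nothing is injected twice even
though every tablet's iterator sees the same underlying side-channel data.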

Is this an accurate depiction?  Anything I should treat with caution?  It
seems to work on my single-node instance, so tips about difficulties in moving
to a multi-node cluster would be welcome.

Code available here.
<https://github.com/Accla/d4m_api_java/blob/0d8c62164d5c0b59f949ce23c1b85536809764d2/src/main/java/edu/mit/ll/graphulo/InjectIterator.java#L166>

Regards,
Dylan Hutchison

-- 
www.cs.stevens.edu/~dhutchis
