accumulo-user mailing list archives

From Dylan Hutchison
Subject Iterators adding data: IteratorEnvironment.registerSideChannel?
Date Mon, 16 Feb 2015 00:59:54 GMT
Hello all,

I've been toying with the registerSideChannel(iter) method of the
IteratorEnvironment that is passed to iterators through init().
From what I can tell, the method allows you to add another iterator as a
top level source, to be merged in along with other usual top-level sources
such as the in-memory cache and RFiles.
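
To illustrate the merge I have in mind, here is a minimal sketch in plain Java that uses no Accumulo classes. SideChannelMergeSketch, mergeSources and the sample keys are hypothetical names, and sorted String lists stand in for an RFile, the in-memory cache and the registered side channel. It only models the idea of a sorted merge over several top-level sources; it is not how Accumulo itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model, not Accumulo code: each "source" is a sorted list of keys.
class SideChannelMergeSketch {

    // One cursor per source: the current head key plus the rest of that source.
    private static final class Cursor implements Comparable<Cursor> {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    // Merge any number of sorted sources into one sorted list, always
    // emitting the smallest head key among the sources next.
    static List<String> mergeSources(List<List<String>> sources) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> source : sources) {
            if (!source.isEmpty()) {
                heap.add(new Cursor(source.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                // Advance this source and put it back in the heap.
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> rfile = Arrays.asList("b", "d", "f");
        List<String> inMemory = Arrays.asList("c");
        List<String> sideChannel = Arrays.asList("a2", "e"); // the "new data"
        // Prints [a2, b, c, d, e, f]
        System.out.println(mergeSources(Arrays.asList(rfile, inMemory, sideChannel)));
    }
}
```

In this model the side channel is just one more sorted source, which is why its entries come out interleaved with the existing data rather than appended after it.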

Are there any downsides to using registerSideChannel() to "add new data"
to an iterator chain?  It looks like this is fairly stable, so long as the
iterator we add as a side channel implements seek() properly so as to only
return entries whose rows are within a tablet.  I imagine it works like so:
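
As a rough sketch of what I mean by "implements seek() properly", here is a plain-Java model that uses no Accumulo classes. RangeClippedSourceSketch and its methods are hypothetical names, rows are plain Strings, and a tablet range is assumed to be exclusive at the start row and inclusive at the end row, as in (a,g]. The real seek() takes a Range and column families, which this model leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model, not Accumulo code: a side-channel source holding
// sorted rows, whose seek only exposes rows inside the requested range.
class RangeClippedSourceSketch {
    private final NavigableMap<String, String> data = new TreeMap<>();

    void put(String row, String value) {
        data.put(row, value);
    }

    // Return the rows in (startExclusive, endInclusive], in sorted order.
    List<String> seek(String startExclusive, String endInclusive) {
        return new ArrayList<>(
            data.subMap(startExclusive, false, endInclusive, true).keySet());
    }

    public static void main(String[] args) {
        RangeClippedSourceSketch source = new RangeClippedSourceSketch();
        for (String row : new String[] {"a", "b", "g", "h", "p", "q"}) {
            source.put(row, "v");
        }
        System.out.println(source.seek("a", "g")); // prints [b, g]
        System.out.println(source.seek("g", "p")); // prints [h, p]
    }
}
```

Because the two ranges do not overlap, each row is returned for at most one of them. That is the property the side channel needs so that two tablets do not both receive the same injected entry.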

Suppose we set a custom iterator InjectIterator that registers a side
channel inside init() at priority 5 as a one-time major compaction
iterator.  InjectIterator forwards other operations to its parent, as in
We start the compaction:

Tablet 1 (a,g]

   1. init() called on InjectIterator.  Creates the side channel iterator,
   calls init() on it, and registers it.
   2. init() called on VersioningIterator.
   3. init() called on top level iterators, including RFiles, in-memory
   cache and the new side channel.
   4. seek( (a,g] ) called on InjectIterator.
   5. seek( (a,g] ) called on VersioningIterator.
   6. seek( (a,g] ) called on top level iterators
   7. next() called on InjectIterator. Forwards to parent.
   8. next() called on VersioningIterator. Forwards to parent.
   9. next() called on top level iterator (a MultiIterator).
   The next value is read from all the top-level iterator sources and the one
   with the least key is cached ready to go.
   10. ...

Tablet 2 (g,p]  --- same as tablet 1 except steps 4-6 call seek( (g,p] ).
Done in parallel with tablet 1 if on a different tablet server.

Is this an accurate depiction?  Anything I should treat with caution?  It
seems to work on my single-node instance, so tips about difficulties going
to multi-node are good.

Code available here.

Dylan Hutchison

