accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Write to table from Accumulo iterator
Date Sun, 27 Apr 2014 16:23:38 GMT
Inlined for clarity

On 4/26/14, 11:05 PM, BlackJack76 wrote:
> Thanks again Josh.
> The way I have been approaching it is to create/use/close the BatchWriter
> inside of the seek method when I need it.  Do you see any issues with this
> approach?

It's not terrible, but you will incur some extra overhead with this 
approach. The BatchWriter is most efficient when you can keep a single 
instance open and just throw many mutations at it. Just make sure to 
close the BatchWriter in a finally block, and you shouldn't have any 
issues.
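
A minimal sketch of the create/use/close-in-seek pattern, in plain Java. 
`SimBatchWriter` and `WriterPatterns` are hypothetical stand-ins, not 
Accumulo's `BatchWriter` API: the fake writer only counts how many batches 
it would send. It shows the finally block recommended above, and why one 
long-lived writer tends to send fewer batches than one writer per seek.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Accumulo's BatchWriter (NOT the real API): it
// buffers mutations and counts how many batches it would send to tservers.
class SimBatchWriter implements AutoCloseable {
    static int batchesSent = 0;   // shared counter so callers can compare patterns
    private final List<String> buffer = new ArrayList<>();
    private final int maxBuffered;
    private boolean closed = false;

    SimBatchWriter(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    void addMutation(String mutation) {
        if (closed) {
            throw new IllegalStateException("writer already closed");
        }
        buffer.add(mutation);
        if (buffer.size() >= maxBuffered) {
            flush();              // buffer full: send one batch
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            batchesSent++;
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush();                  // closing always pushes out whatever is buffered
        closed = true;
    }
}

class WriterPatterns {
    // Pattern from the thread: create, use, and close a writer inside each seek().
    // The finally block guarantees close() runs even if addMutation throws.
    static void writeDuringSeek(List<String> mutations) {
        SimBatchWriter writer = new SimBatchWriter(1000);
        try {
            for (String m : mutations) {
                writer.addMutation(m);
            }
        } finally {
            writer.close();
        }
    }
}
```

In this model, ten seeks of three mutations each send ten batches, while a 
single writer kept open for the same thirty mutations and closed once sends 
one. Java's try-with-resources (`try (SimBatchWriter w = ...) { ... }`) 
gives the same close-on-exit guarantee as the explicit finally block.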

> Call me naive but why don't you know when Accumulo is going to tear down
> your iterator and stop using it?  When I attach an iterator to a scanner,
> isn't it only destroyed after I complete my scan?

You don't know because the SKVI (SortedKeyValueIterator) API currently 
doesn't give you any means of being told. Yes, the tabletserver knows when 
it's about to tear your iterator down, but it has no way to tell you. This 
gets trickier with some of the work that Accumulo does under the hood, 
which I hinted at previously.

Accumulo maintains a buffer between your (Batch)Scanner and the 
tserver(s) it communicates with. For a number of reasons, when that buffer 
fills up, Accumulo notes the last Key that the scan returned, tears down 
your session and, assuming the client is still there requesting more 
data, re-queues your scan to fetch more data starting where you left off.
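
The tear-down/re-queue cycle can be modelled with a toy scan loop. 
Everything here is hypothetical (`ScanSimulator`, `bufferSize`, integer 
keys) and is not how the tserver is implemented; it only mirrors the 
behaviour described above: each session starts a fresh iterator, fills a 
fixed-size buffer, remembers the last key returned, and the next session 
resumes strictly after that key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of one logical scan split into several server-side sessions.
// Hypothetical names throughout; this is not Accumulo's implementation.
class ScanSimulator {
    int sessions = 0;            // how many times a new iterator was created

    List<Integer> scan(NavigableSet<Integer> table, int bufferSize) {
        List<Integer> results = new ArrayList<>();
        Integer lastKey = null;  // null means "start from -inf"
        while (true) {
            sessions++;          // new session: the iterator stack is rebuilt
            // Resume strictly after the last key the client received.
            NavigableSet<Integer> remaining =
                (lastKey == null) ? table : table.tailSet(lastKey, false);
            int filled = 0;
            for (Integer key : remaining) {
                results.add(key);
                lastKey = key;
                if (++filled == bufferSize) {
                    break;       // buffer full: this session is torn down
                }
            }
            if (filled < bufferSize) {
                return results;  // source exhausted before the buffer filled
            }
        }
    }
}
```

With 26 keys and a buffer of 6, the client still sees every key exactly 
once and in order, but the model uses 5 sessions, so an iterator in the 
stack would be created and seeked 5 times for what looks like one scan.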

For example, if you have a table where each row is a letter in the 
alphabet, and you want to scan over all rows, you would just pass some 
range like (-inf, +inf). Suppose that after you return the letter 'f', 
that buffer fills up, and your scan gets torn down.

Accumulo will then restart your scan with a different range than the one 
you originally passed in: (f, +inf). This is important to keep in mind if 
you start doing "advanced topics" inside iterators that manipulate the Keys 
being returned; however, it is relatively easy to work with.
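
A toy illustration of why the restart matters once an iterator rewrites 
Keys. Assumptions, all hypothetical: rows are the lowercase letters, the 
scan is torn down once after `stopAfter` results, the restart range is 
built from the last key the client received (exclusive), and the 
"iterator" is just a function applied to each row. An upper-casing 
iterator returns 'F' after 'f'; since 'F' sorts before every lowercase 
letter, re-seeking the source with (F, +inf) replays rows the client 
already has.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Toy model of the alphabet example (hypothetical, not Accumulo code).
class RestartDemo {
    static List<String> scanWithOneRestart(NavigableSet<String> source,
                                           UnaryOperator<String> iterator,
                                           int stopAfter) {
        List<String> client = new ArrayList<>();
        String lastReturned = null;
        // First session: range (-inf, +inf).
        for (String row : source) {
            lastReturned = iterator.apply(row);   // key as the client sees it
            client.add(lastReturned);
            if (client.size() == stopAfter) {
                break;                            // buffer full, session torn down
            }
        }
        if (lastReturned == null) {
            return client;                        // empty source: nothing to resume
        }
        // Second session: the source is re-seeked to (lastReturned, +inf).
        // The range comes from the iterator's OUTPUT key, not the source key.
        for (String row : source.tailSet(lastReturned, false)) {
            client.add(iterator.apply(row));
        }
        return client;
    }
}
```

With an identity "iterator" the client receives a..z exactly once, 
resuming at g. With the upper-casing one it receives 32 rows, because the 
restart range is expressed in the rewritten key space while the source 
still holds the original keys.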

> What I have observed is something similar to the following....
> init is called on creation
> seek is called where you need to have the first K,V pair at the end of seek
> hasTop, getTopKey, and getTopValue are called
> next is called as long as hasTop is true
> Once hasTop is false, the scan concludes
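
The call order listed in the quoted message can be sketched against a 
simplified, hypothetical interface. `SimpleIterator`, `MapIterator`, and 
`ScanDriver` borrow the method names of Accumulo's SortedKeyValueIterator 
but not its signatures (the real `init` also takes options and an 
environment, and the real `seek` takes a `Range` plus column families). 
Like the real interface as described in the reply, it has no close or 
teardown method.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified, hypothetical iterator contract using SKVI-style method names.
// There is deliberately no close(): nothing tells the iterator it is done.
interface SimpleIterator {
    void init(NavigableMap<String, String> source);
    void seek(String startInclusive);          // stand-in for seek(Range, ...)
    boolean hasTop();
    String getTopKey();
    String getTopValue();
    void next();
}

class MapIterator implements SimpleIterator {
    private NavigableMap<String, String> source;
    private Iterator<Map.Entry<String, String>> it;
    private Map.Entry<String, String> top;     // current K,V pair, or null

    public void init(NavigableMap<String, String> source) { this.source = source; }

    public void seek(String startInclusive) {
        it = source.tailMap(startInclusive, true).entrySet().iterator();
        top = it.hasNext() ? it.next() : null;  // first K,V ready when seek returns
    }

    public boolean hasTop() { return top != null; }
    public String getTopKey() { return top.getKey(); }
    public String getTopValue() { return top.getValue(); }
    public void next() { top = it.hasNext() ? it.next() : null; }
}

class ScanDriver {
    // Calls the methods in the order observed in the thread.
    static List<String> run(SimpleIterator iter, NavigableMap<String, String> data, String start) {
        List<String> out = new ArrayList<>();
        iter.init(data);                        // 1. init on creation
        iter.seek(start);                       // 2. seek positions on the first pair
        while (iter.hasTop()) {                 // 3. read the top pair
            out.add(iter.getTopKey() + "=" + iter.getTopValue());
            iter.next();                        // 4. advance while hasTop is true
        }
        return out;                             // 5. hasTop is false: scan concludes
    }
}
```

The driver simply stops calling methods once `hasTop()` is false. Combined 
with the session restarts described in the reply, a real scan may run this 
whole sequence several times on fresh iterator instances.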