accumulo-user mailing list archives

From Keith Turner <>
Subject Re: Accumulo Utilities
Date Thu, 28 Mar 2013 15:55:40 GMT
I took a quick look at the code. Excluding the threading issue, a
major conceptual difference is that BatchScannerWithScanners seems to
do an RPC round trip for each range. The TabletServerBatchReader
sends all of the ranges that a tablet server needs to look up in one
request.
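
To make the round-trip difference concrete, here is a rough sketch in
plain Java. It is not Accumulo code: RoundTripSketch, its methods, and
the range and server names are all made up for illustration, and it
ignores that a real lookup may return its results in several batches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: none of these names are Accumulo APIs.
class RoundTripSketch {

    // One request per range, as a scanner-per-range approach would issue.
    static int perRangeRoundTrips(Map<String, String> rangeToServer) {
        return rangeToServer.size();
    }

    // Group the ranges by the (hypothetical) server that hosts them.
    static Map<String, List<String>> groupByServer(Map<String, String> rangeToServer) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : rangeToServer.entrySet()) {
            byServer.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byServer;
    }

    // One request per server, carrying every range that server must look up.
    static int batchedRoundTrips(Map<String, String> rangeToServer) {
        return groupByServer(rangeToServer).size();
    }

    public static void main(String[] args) {
        Map<String, String> rangeToServer = new LinkedHashMap<>();
        rangeToServer.put("[a,c)", "tserver1");
        rangeToServer.put("[c,f)", "tserver1");
        rangeToServer.put("[m,p)", "tserver2");
        rangeToServer.put("[p,t)", "tserver2");

        System.out.println("per-range requests: " + perRangeRoundTrips(rangeToServer));
        System.out.println("batched requests:   " + batchedRoundTrips(rangeToServer));
    }
}
```

With many small ranges per server, the gap between the two counts grows
with the number of ranges.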

Instead of creating a BatchScannerWithScanners, maybe you could create
a batch scanner with just one thread when resources are exceeded?
This would be similar to what you are doing now, in that just one
thread would be fetching data while the client thread waits on this
background thread. It does, however, allow the processing of results
to happen concurrently with the fetching of data. Using
BatchScannerWithScanners would not allow this.
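
Roughly, the one-thread arrangement looks like the following plain Java
sketch. It only models the idea and does not use the Accumulo client:
SingleFetcherSketch, fetchAndProcess, the queue size of 4, and the
upper-casing "processing" step are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: models one background fetch thread feeding a client thread.
class SingleFetcherSketch {
    // Distinct object used as an end-of-data marker (compared by identity).
    private static final String DONE = new String("DONE");

    static List<String> fetchAndProcess(List<String> source) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        ExecutorService fetcher = Executors.newSingleThreadExecutor();

        // Background thread: stands in for fetching entries from tablet servers.
        fetcher.execute(() -> {
            try {
                for (String entry : source) {
                    queue.put(entry); // blocks when the client falls behind
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Client thread: processes results while the fetcher keeps working.
        List<String> processed = new ArrayList<>();
        try {
            while (true) {
                String entry = queue.take();
                if (entry == DONE) {
                    break;
                }
                processed.add(entry.toUpperCase());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        } finally {
            fetcher.shutdownNow();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(fetchAndProcess(Arrays.asList("row1", "row2", "row3")));
    }
}
```

The bounded queue is what caps resource use here: only one extra thread
exists, and it stalls once the client is four entries behind.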

Something to be aware of: the regular scanner will spin up a
read-ahead thread if you read a lot of data through it.  It does not do
this immediately, only after fetching a few batches of key value pairs
from the tablet server.  If this happens you could have one thread
fetching data while the client thread processes results.

Do you think we should open a ticket about giving users control over
threads created by client code?    Maybe users could pass in their own
thread pool to a batch scanner?
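
As a strawman for what such a ticket might ask for, here is a plain
Java sketch. Nothing in it exists in Accumulo: PooledLookup and its
lookup method are hypothetical stand-ins for a batch scanner that
borrows a caller-owned ExecutorService instead of creating threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a hypothetical API shape, not an Accumulo class.
class SharedPoolSketch {

    // Stand-in for a batch scanner that runs its lookups on the caller's pool.
    static final class PooledLookup {
        private final ExecutorService pool;

        PooledLookup(ExecutorService callerOwnedPool) {
            this.pool = callerOwnedPool;
        }

        List<String> lookup(List<String> ranges) {
            List<Future<String>> pending = new ArrayList<>();
            for (String range : ranges) {
                // A real implementation would contact a tablet server here.
                pending.add(pool.submit(() -> "result for " + range));
            }
            List<String> results = new ArrayList<>();
            try {
                for (Future<String> f : pending) {
                    results.add(f.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
            return results;
        }
    }

    public static void main(String[] args) {
        // The caller decides the thread budget once, for all lookups.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            PooledLookup first = new PooledLookup(pool);
            PooledLookup second = new PooledLookup(pool);
            System.out.println(first.lookup(Arrays.asList("[a,c)", "[c,f)")));
            System.out.println(second.lookup(Arrays.asList("[m,p)")));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because both lookups share the same two-thread pool, creating more of
them does not create more threads, which is the control the original
utility was after.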


On Thu, Mar 28, 2013 at 11:00 AM,  <> wrote:
> In some of my projects, we needed to control the number of threads
> spun up with the use of multiple batch scanners. We created a utility
> to control the number of threads, and if the max threads has been
> reached, return a batch scanner that is actually backed by Scanners.
> Wanted to get any feedback on the code. Seems like such a simple thing
> to do, I bet someone already has this. Thanks!
