accumulo-user mailing list archives

From "Slater, David M." <>
Subject RE: Batchscanner and Tablet Memory
Date Thu, 21 Mar 2013 22:01:17 GMT
Awesome, thank you!

-----Original Message-----
From: Keith Turner [] 
Sent: Thursday, March 21, 2013 4:39 PM
Subject: Re: Batchscanner and Tablet Memory

On Thu, Mar 21, 2013 at 4:02 PM, Slater, David M.
<> wrote:
> Thanks Keith, that was very helpful.
> As for your comment "Multiple threads can scan a tablet concurrently", is
> there any way to force a BatchScanner to run at most one thread on a tablet,
> or to have it give the entire tablet range [a, c) to an iterator instead of
> breaking it up into [a, b) and [b, c) for different iterators on the same
> tablet?

A batch scanner will not use more than one thread to scan an
individual tablet.  I was just responding to your question asking if
multiple threads can scan a tablet.  If there are multiple scanners
and batch scanners, then you could have multiple threads scanning a tablet.

> If it is not designed to operate that way, are there methods in
> TabletServerBatchReader that would make sense to extend in order to add that
> functionality?
> Best regards,
> David
> -----Original Message-----
> From: Keith Turner []
> Sent: Friday, March 15, 2013 3:24 PM
> To:
> Subject: Re: Batchscanner and Tablet Memory
> On Fri, Mar 15, 2013 at 3:08 PM, Slater, David M.
> <> wrote:
>> Hi again,
>> I am curious as to how Accumulo handles multiple threads in a 
>> Batchscanner, and what its ramifications are for memory use on a node.
>> Let's say I start a Batchscanner with 20 threads, and scan across the 
>> entire range of rows in a table of 80 tablets, spread across 4 nodes.
>> Will the Batchscanner try to spin off 20 threads if possible, or will 
>> it try to match it to the number of nodes? Should I try to match the 
>> number of threads with the number of cores that will be working on the data?
> When the batch scanner has more threads than nodes, it will run
> multiple scans on each node.   It will only do this for nodes where it
> has multiple tablets to scan.   So in your example I think it may run
> 20/4=5 scans on each node.  Each scan would access 80/20=4 tablets.
>> When a thread is spun off, my thinking is that the tablet that the 
>> thread is spun off on will move the entire tablet to memory, and then 
>> the tablet will be iterated through. Is this how it typically happens
>> (or are there possibly multiple threads on the same tablet)? If so, do
>> I have to worry about memory issues if, say, one of the nodes tries 
>> to move 10 tablets into memory, but doesn't have 20 GB of RAM left to store it?
> Entire tablets are not loaded into memory when you scan a tablet.
> Tablets are composed of RFiles.  RFiles are composed of blocks of key
> values.  So only a few of these key/value blocks from RFiles are loaded at
> any given time.  It is possible that these RFile blocks may be cached in the
> tablet server process, depending on your configuration.
> Multiple threads can scan a tablet concurrently.
>> Sorry for the vagueness of the questions, but I'm trying to 
>> understand how the general process works under the covers, in order 
>> to diagnose some performance issues I have been having.
>> Thanks,
>> David
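
The estimate in the quoted reply (20 threads, 4 nodes, 80 tablets) can be worked through as a small back-of-the-envelope model. This is only a sketch of that arithmetic, not Accumulo code: the class and method names are hypothetical, and it assumes tablets are spread evenly across nodes and that query threads are divided evenly among them, which a real cluster may not do.

```java
// Hypothetical model of the estimate in the thread; not part of Accumulo.
final class BatchScanEstimate {

    /** Concurrent scans per node, assuming threads are split evenly across nodes. */
    static int scansPerNode(int queryThreads, int nodes, int tabletsPerNode) {
        // With fewer threads than nodes, a node gets at most one scan.
        int evenShare = Math.max(1, queryThreads / nodes);
        // A batch scanner does not put more than one thread on a tablet,
        // so a node cannot usefully run more scans than it has tablets.
        return Math.min(evenShare, tabletsPerNode);
    }

    /** Tablets covered by each scan on a node, rounded up. */
    static int tabletsPerScan(int tabletsPerNode, int scansPerNode) {
        return (tabletsPerNode + scansPerNode - 1) / scansPerNode;
    }

    public static void main(String[] args) {
        int queryThreads = 20;
        int nodes = 4;
        int tablets = 80;

        int tabletsPerNode = tablets / nodes;                        // 80 / 4 = 20
        int scans = scansPerNode(queryThreads, nodes, tabletsPerNode); // 20 / 4 = 5
        int perScan = tabletsPerScan(tabletsPerNode, scans);          // 20 / 5 = 4

        System.out.println(scans + " scans per node, " + perScan + " tablets per scan");
    }
}
```

With the numbers from the thread this prints `5 scans per node, 4 tablets per scan`, matching the 20/4=5 and 80/20=4 figures in the reply. Since only a few RFile blocks are held per scan rather than whole tablets, the number of concurrent scans per node is the figure that matters for memory, not the tablet sizes.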
