accumulo-user mailing list archives

From Josh Elser
Subject Re: tserver side parallelism
Date Fri, 07 Feb 2014 20:32:37 GMT
The tserver.readahead.concurrent.max property provides an upper bound on 
the number of scans that will start "reading ahead". This read-ahead is 
a performance tweak that tries to smooth the I/O cost of reading from 
files. However, each readahead thread increases the amount of heap 
used, because the data it reads is held in memory. By capping the number 
of concurrent readaheads, this parameter lets you bound the space used by 
readahead across *all* scan tasks (from a Scanner, BatchScanner or even 
major compactions) on a tablet server.

The property provides you with control over 
the upper-bound of the number of files for scanning that a tablet server 
(across all tablets hosted by that tablet server) can open. Again, because 
holding these files open consumes memory, this parameter is meant to let 
you place an upper bound on the memory consumed by open files.

Now, the number of threads that a batchscanner uses is what's primarily 
going to control your "server side parallelism". When you provide a 
value of N to the batchscanner "threads", you will get up to N "scan 
tasks" running concurrently against your Accumulo instance. The two 
previously described properties only act to limit the amount of 
resources that your single batchscanner (considered alongside all other 
active batchscanners) can consume.
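
To make that interaction concrete, here is a rough toy model in plain Java. It is not Accumulo code and does not talk to a tablet server: a fixed thread pool stands in for the N batchscanner threads, and a single Semaphore stands in for a server-side cap such as the readahead limit. The class name, method name and parameters are all made up for illustration, and a real tablet server enforces more than one limit. The point is only that the number of tasks running at once can never exceed the smaller of the client thread count and the server-side cap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: hypothetical names, no Accumulo classes involved.
class ScanConcurrencyModel {

    // Runs `tasks` simulated scan tasks and returns the highest number
    // that were observed running at the same time.
    static int peakConcurrency(int clientThreads, int serverLimit, int tasks)
            throws InterruptedException {
        // Stand-in for the batchscanner's N threads.
        ExecutorService client = Executors.newFixedThreadPool(clientThreads);
        // Stand-in for a server-side cap (assumed to be a simple permit count).
        Semaphore serverPermits = new Semaphore(serverLimit);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            client.submit(() -> {
                try {
                    // Blocks while the "server" has no free resources.
                    serverPermits.acquire();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20); // pretend to read some data
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    serverPermits.release();
                }
            });
        }
        client.shutdown();
        client.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Few client threads, generous server cap: the client is the bottleneck.
        System.out.println("peak = " + peakConcurrency(4, 64, 40));
        // Many client threads, small server cap: the server is the bottleneck.
        System.out.println("peak = " + peakConcurrency(32, 8, 40));
    }
}
```

In the first call the peak cannot exceed 4 no matter how large the server cap is, which matches what you observed: raising the server-side properties does nothing until the batchscanner thread count goes up.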

In situations with multiple clients reading from an Accumulo instance, 
you may run into cases where a scan task (one thread from your 
BatchScanner) is blocked until the tabletserver finishes a previous read 
and thus frees additional resources (number of open files or readahead 
threads) to satisfy your scan request.

Hope that helps.

On 2/7/14, 3:19 PM, Anthony F wrote:
> How do the config variables tserver.readahead.concurrent.max and
> interact with BatchScanner threads requested
> from the Connector?  I have tserver.readahead.concurrent.max set to 64
> and set to 100.  However, unless I bump up
> the number of BatchScanner threads, I don't see much tserver side
> parallelism.  If I bump up the number of BatchScanner threads, then I
> can see multiple scans per tserver.  What governs the number of tserver
> side threads used to execute a scan and what prevents too many threads
> from spinning up to service multiple concurrent scans from independent
> clients?