accumulo-user mailing list archives

From Keith Turner <>
Subject Re: number of query threads for batch scanner
Date Fri, 28 Sep 2012 12:04:31 GMT
On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <> wrote:
> Thanks William.
> The issue here is that without knowing how the numQueryThreads translates to
> the number of concurrent scans, I cannot effectively tune that parameter to
> maximize resource usage on the tablet server. What I'm seeing is that even
> though there are four tablets on the tablet server, my number of concurrent
> scans never exceeds 3. This is despite setting numQueryThreads to a very
> high number and having 8 cores on the tablet server. I suspect with 3
> concurrent scans and no garbage collection happening at that moment, most of
> the cores are sitting idle.
> Ameet

The amount of parallelism is determined by how your ranges map to
tablets. Below are some examples.

 * For one range that maps to 10 tablets on 10 tablet servers, it will
execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 5 concurrent scans if numQueryThreads is 5.
 * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.
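The examples above boil down to a rough rule: concurrent scans are capped both by numQueryThreads and by the number of distinct tablets the ranges touch. The Java sketch below models only that rule. The class and method names, and the idea of representing each range as the list of tablet ids it overlaps, are hypothetical and not part of the Accumulo API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the examples above; not Accumulo client code.
public class ScanConcurrencyModel {

    // Each inner list holds the (hypothetical) ids of the tablets one range overlaps.
    static int maxConcurrentScans(List<List<String>> rangeToTablets, int numQueryThreads) {
        Set<String> distinctTablets = new HashSet<>();
        for (List<String> tablets : rangeToTablets) {
            distinctTablets.addAll(tablets);
        }
        // No more scans than threads, and no more scans than tablets touched.
        return Math.min(numQueryThreads, distinctTablets.size());
    }

    // Builds n ranges; range i overlaps only tablet "t" + (i % tabletCount).
    static List<List<String>> rangesOverTablets(int n, int tabletCount) {
        List<List<String>> ranges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ranges.add(List.of("t" + (i % tabletCount)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> tenTablets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            tenTablets.add("t" + i);
        }
        System.out.println(maxConcurrentScans(List.of(tenTablets), 10));         // 10
        System.out.println(maxConcurrentScans(rangesOverTablets(1000, 10), 10)); // 10
        System.out.println(maxConcurrentScans(rangesOverTablets(1000, 10), 5));  // 5
        System.out.println(maxConcurrentScans(rangesOverTablets(1000, 1), 16));  // 1
    }
}
```

The number of tablet servers does not appear in this model; it only matters for how the work is divided, not for the upper bound.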

If you have more query threads than tablet servers, the client code
will try to execute concurrent scans on a single tablet server.
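One plausible way to spread several scans over a single tablet server is sketched below. This splitting scheme is an assumption made for illustration, not a transcription of TabletServerBatchReaderIterator.doLookups(). When there are more threads than servers, it caps the tablets per task at totalTablets / numQueryThreads (at least 1) and chunks each server's tablets into tasks of that size. The class name, method name, and map-of-tablet-counts input are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical splitting scheme for illustration; not Accumulo's doLookups() code.
public class TaskSplitModel {

    // Returns, per tablet server, how many tasks its tablets would be divided into.
    static Map<String, Integer> tasksPerServer(Map<String, Integer> tabletsPerServer, int numQueryThreads) {
        int totalTablets = 0;
        for (int n : tabletsPerServer.values()) {
            totalTablets += n;
        }

        // Split only when there are more threads than servers; otherwise one task per server.
        boolean split = numQueryThreads > tabletsPerServer.size();
        // Cap on tablets per task when splitting (never below 1).
        int maxTabletsPerTask = split ? Math.max(1, totalTablets / numQueryThreads) : 0;

        Map<String, Integer> tasks = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : tabletsPerServer.entrySet()) {
            int tablets = e.getValue();
            int count;
            if (tablets == 0) {
                count = 0;
            } else if (!split) {
                count = 1;
            } else {
                // Ceiling division: chunks of at most maxTabletsPerTask tablets.
                count = (tablets + maxTabletsPerTask - 1) / maxTabletsPerTask;
            }
            tasks.put(e.getKey(), count);
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> servers = new LinkedHashMap<>();
        servers.put("ts1", 4);
        servers.put("ts2", 4);
        servers.put("ts3", 4);
        System.out.println(tasksPerServer(servers, 3));  // {ts1=1, ts2=1, ts3=1}
        System.out.println(tasksPerServer(servers, 6));  // {ts1=2, ts2=2, ts3=2}
        System.out.println(tasksPerServer(servers, 12)); // {ts1=4, ts2=4, ts3=4}
    }
}
```

Under this assumed scheme, the number of scans aimed at one server depends on the total tablet count across all servers as well as on numQueryThreads. That is one possible reason a server hosting 4 tablets could see fewer than 4 concurrent scans.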

You can look at TabletServerBatchReaderIterator.doLookups() for the
details.  In this method, it creates QueryTask objects and places them
on a thread pool.  The size of the thread pool is the user-specified
numQueryThreads.
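A fixed-size pool explains why extra threads stop helping. A pool never runs more tasks at once than it has threads, and it cannot run more tasks than were created. The sketch below uses a plain java.util.concurrent fixed thread pool, assuming the pool is sized to numQueryThreads, with a sleep standing in for a QueryTask's scan. The class and method names are hypothetical, and this is not the Accumulo client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a fixed pool of numQueryThreads threads, one task per unit of work.
public class QueryPoolSketch {

    // Runs numTasks simulated scans and returns the peak number running at the same time.
    static int runAndMeasure(int numQueryThreads, int numTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(numQueryThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < numTasks; i++) {
                futures.add(pool.submit(() -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(50); // stand-in for a scan against a tablet server
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every task to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // The peak can never exceed min(numQueryThreads, numTasks).
        System.out.println(runAndMeasure(4, 10)); // at most 4
        System.out.println(runAndMeasure(16, 3)); // at most 3
    }
}
```

With 16 threads but only 3 tasks, 13 threads simply sit idle, which mirrors the point that parallelism is bounded by the tablets the ranges map to.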

> On Tue, Sep 25, 2012 at 3:08 PM, William Slacum
> <> wrote:
>> It should really be dependent upon the resources available to the client.
>> You can set an arbitrarily high number of threads, but you're still bound by
>> the number of parallel operations the CPU can make. I would assume the sweet
>> spot is somewhere around that number; try doing a small benchmark with 2,
>> 4, 8, 16, etc. threads and see where your performance starts to level off.
>> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <> wrote:
>>> Probably worth adding that the table mentioned below has a bunch of
>>> tablets on other tablet servers as well, which is why I'm using
>>> BatchScanner. I'm just not sure how numQueryThreads relates to the
>>> number of concurrent scans on a given tablet server.
>>> Thanks
>>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <> wrote:
>>>> I have a table with 4 tablets on a given tablet server. Depending on the
>>>> numQueryThreads parameter below, I see a varying number of maximum
>>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>>> (i.e., some values of numQueryThreads result in a maximum of 1 concurrent
>>>> scan, some result in 2 concurrent scans, etc.). Can someone shed light
>>>> on the relationship between numQueryThreads and the number of concurrent
>>>> scans?
>>>> public BatchScanner createBatchScanner(String tableName,
>>>>                                        Authorizations authorizations,
>>>>                                        int numQueryThreads)
>>>> A follow-on question would be: what is the general rule of thumb for setting
>>>> numQueryThreads? Should it be set to the # of hosted tablets expected to be
>>>> consumed by that BatchScanner? Should it be the # of tablet servers expected
>>>> to be hit by that BatchScanner? Something else?
>>>> Thanks,
>>>> Ameet
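William's suggestion to benchmark 2, 4, 8, 16 threads and look for where performance levels off could be scripted roughly as follows. This is a hypothetical harness: sweep times an arbitrary job for each thread count, and simulatedBatchScan is a made-up stand-in that sleeps instead of scanning. Against a real cluster, the job would presumably create a BatchScanner through createBatchScanner with numQueryThreads set to n, iterate over all results, and close it; that wiring is left out because it depends on the cluster and table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical benchmark harness; the workload is simulated, not a real Accumulo scan.
public class ThreadCountSweep {

    // Times job.accept(n) for each n and returns elapsed wall-clock milliseconds per n.
    static Map<Integer, Long> sweep(int[] threadCounts, IntConsumer job) {
        Map<Integer, Long> results = new LinkedHashMap<>();
        for (int n : threadCounts) {
            long start = System.nanoTime();
            job.accept(n);
            results.put(n, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
        return results;
    }

    // Stand-in for a batch scan: `scans` sleeps of `millis` ms each, run on n threads.
    static void simulatedBatchScan(int n, int scans, long millis) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < scans; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("simulated scan timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 8 simulated tablets: timings should roughly level off once n reaches 8.
        Map<Integer, Long> results = sweep(new int[] {2, 4, 8, 16}, n -> simulatedBatchScan(n, 8, 50));
        results.forEach((n, ms) -> System.out.println(n + " threads: " + ms + " ms"));
    }
}
```

Keeping the timed job behind an IntConsumer lets the same sweep loop drive either the simulation or a real BatchScanner pass without changes.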
