accumulo-user mailing list archives

From ameet kini <>
Subject Re: number of query threads for batch scanner
Date Tue, 25 Sep 2012 19:17:05 GMT
Thanks William.

The issue here is that without knowing how the numQueryThreads translates
to the number of concurrent scans, I cannot effectively tune that parameter
to maximize resource usage on the tablet server. What I'm seeing is that
even though there are four tablets on the tablet server, my number of
concurrent scans never exceeds 3. This is despite setting numQueryThreads
to a very high number and having 8 cores on the tablet server. I suspect
with 3 concurrent scans and no garbage collection happening at that moment,
most of the cores are sitting idle.


On Tue, Sep 25, 2012 at 3:08 PM, William Slacum <> wrote:

> It should really be dependent upon the resources available to the client.
> You can set an arbitrarily high number of threads, but you're still bound
> by the number of parallel operations the CPU can perform. I would assume the
> sweet spot is somewhere around that number; try doing a small benchmark
> with 2, 4, 8, 16, etc. threads and see where your performance starts to
> level off.
> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <> wrote:
>> Probably worth adding that the table mentioned below has a bunch of
>> tablets on other tablet servers as well, which is why I'm using
>> BatchScanner. I'm just not sure how the numQueryThreads relates to the
>> number of concurrent scans on a given tablet server.
>> Thanks
>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <> wrote:
>>> I have a table with 4 tablets on a given tablet server. Depending on the
>>> numQueryThreads parameter below, I see a varying number of maximum
>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>> (i.e., some values for numQueryThreads result in a maximum of 1 concurrent
>>> scan, some values result in 2 concurrent scans, etc.). Can someone shed light
>>> on the relationship between numQueryThreads and the number of
>>> concurrent scans?
>>> public BatchScanner createBatchScanner(String tableName,
>>>                                        Authorizations authorizations,
>>>                                        int numQueryThreads)
>>> A follow-on question: what is a general rule of thumb for setting
>>> numQueryThreads? Should it be set to the # of hosted tablets expected to
>>> be consumed by that BatchScanner? Should it be the # of tablet servers
>>> expected to be hit by that BatchScanner? Something else?
>>> Thanks,
>>> Ameet
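
The benchmark suggested in the thread (try 2, 4, 8, 16, etc. threads and see
where performance levels off) could be sketched roughly as below. This is only
an illustration under stated assumptions: the class name ThreadSweep, its
methods, and the 10% "level off" threshold are all hypothetical, and the
workload is a sleep-based stand-in rather than a real Accumulo scan. In real
use the workload would presumably create a BatchScanner with numQueryThreads
set to the thread count under test and read all of its results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Hypothetical helper, not part of Accumulo.
class ThreadSweep {

    // Runs the workload once per thread count and records elapsed milliseconds,
    // keeping the thread counts in the order they were given.
    static Map<Integer, Long> sweep(int[] threadCounts, IntConsumer workload) {
        Map<Integer, Long> elapsedMs = new LinkedHashMap<>();
        for (int threads : threadCounts) {
            long start = System.nanoTime();
            workload.accept(threads);
            elapsedMs.put(threads, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMs;
    }

    // Returns the first thread count whose successor improves the elapsed time
    // by less than minGain (a fraction, e.g. 0.10), or the last count if every
    // step still helps.
    static int levelOffPoint(Map<Integer, Long> elapsedMs, double minGain) {
        if (elapsedMs.isEmpty()) {
            throw new IllegalArgumentException("no timings");
        }
        Map.Entry<Integer, Long> prev = null;
        for (Map.Entry<Integer, Long> cur : elapsedMs.entrySet()) {
            if (prev != null) {
                long gain = prev.getValue() - cur.getValue();
                if (gain < minGain * prev.getValue()) {
                    return prev.getKey();
                }
            }
            prev = cur;
        }
        return prev.getKey();
    }

    // Stand-in workload (assumption): each "tablet scan" just sleeps for scanMs
    // on a fixed pool of the given size. Replace with a real BatchScanner read.
    static void simulatedScan(int threads, int tablets, long scanMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < tablets; i++) {
                pending.add(pool.submit(() -> {
                    Thread.sleep(scanMs);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every simulated scan to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] threadCounts = {1, 2, 4, 8, 16};
        Map<Integer, Long> timings =
            sweep(threadCounts, threads -> simulatedScan(threads, 4, 50));
        timings.forEach((threads, ms) ->
            System.out.println(threads + " threads: " + ms + " ms"));
        System.out.println("levels off at about "
            + levelOffPoint(timings, 0.10) + " threads");
    }
}
```

Timings from a single run are noisy, so repeating each thread count a few
times and comparing medians would likely give a more trustworthy level-off
point than the one-shot sweep shown here.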
