accumulo-user mailing list archives

From vaibhav thapliyal <vaibhav.thapliyal...@gmail.com>
Subject Re: BatchScanner taking too much time to scan rows
Date Tue, 12 May 2015 17:58:19 GMT
I have 194 tablets. Currently I am passing 20 threads to the
createBatchScanner method when creating the BatchScanner.
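Keith's explanation in the quoted reply can be turned into a small model using the numbers above (194 tablets, 20 threads, a 3-node cluster). This is a simplified, hypothetical Java sketch, not Accumulo's client code. It assumes tablets are spread round-robin over 3 tablet servers (all server and tablet names are made up). It also assumes that when there are more threads than servers, each server's tablets are chunked into RPCs of at most (total tablets / threads) tablets. Accumulo's real rules may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of how a batch scan might group tablets into RPCs (not Accumulo code). */
public class BatchScanModel {

    /** Assigns numTablets made-up tablets round-robin to numServers made-up tablet servers. */
    public static Map<String, List<String>> roundRobin(int numTablets, int numServers) {
        Map<String, List<String>> cluster = new LinkedHashMap<>();
        for (int s = 0; s < numServers; s++) {
            cluster.put("tserver" + s, new ArrayList<>());
        }
        for (int t = 0; t < numTablets; t++) {
            cluster.get("tserver" + (t % numServers)).add("tablet" + t);
        }
        return cluster;
    }

    /** Returns one list of tablets per RPC, following the two cases described in the thread. */
    public static List<List<String>> planRpcs(Map<String, List<String>> tabletsPerServer, int numThreads) {
        int totalTablets = 0;
        for (List<String> tablets : tabletsPerServer.values()) {
            totalTablets += tablets.size();
        }
        // Threads <= servers: one RPC per server carrying all of that server's tablets.
        int maxTabletsPerRpc = Integer.MAX_VALUE;
        if (numThreads > tabletsPerServer.size()) {
            // Assumed rule: cap each RPC at totalTablets / numThreads tablets so every thread gets work.
            maxTabletsPerRpc = Math.max(1, totalTablets / numThreads);
        }
        List<List<String>> rpcs = new ArrayList<>();
        for (List<String> tablets : tabletsPerServer.values()) {
            // Chunk this server's tablets; an RPC never mixes tablets from different servers.
            for (int start = 0; start < tablets.size(); start += maxTabletsPerRpc) {
                int end = Math.min(tablets.size(), start + maxTabletsPerRpc);
                rpcs.add(new ArrayList<>(tablets.subList(start, end)));
            }
        }
        return rpcs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cluster = roundRobin(194, 3);
        System.out.println("20 threads: " + planRpcs(cluster, 20).size() + " RPCs");
        System.out.println("2 threads: " + planRpcs(cluster, 2).size() + " RPCs");
    }
}
```

Under these assumptions the servers hold 65, 65 and 64 tablets. With 20 threads each RPC carries at most 194 / 20 = 9 tablets, giving 24 RPCs. With 2 threads (fewer than the 3 servers) the model falls back to one RPC per server, giving 3 RPCs.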
On 12-May-2015 11:19 pm, "Keith Turner" <keith@deenlo.com> wrote:

> How many tablets do you have?  The batch scanner does not parallelize
> operations within a tablet.
>
> If you give the batch scanner more threads than there are tservers, it
> will make multiple parallel rpc calls to each tserver if the tserver has
> multiple tablets.  Each rpc may include multiple tablets and ranges for
> each tablet.
>
> If the batch scanner has fewer threads than tservers, it will make one rpc
> per tserver per thread.  Each rpc call will include all tablets and
> associated ranges for that tserver.
>
> Keith
>
>
>
> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal <vaibhav.thapliyal.91@gmail.com> wrote:
>
>> Hi,
>>
>> I am using a BatchScanner to scan rows from an Accumulo table. The table has
>> around 187 million entries, and I am using a 3-node cluster running
>> Accumulo 1.6.1.
>>
>> I have passed 10000 IDs, which are stored as row IDs in my table, as a list
>> to the setRanges() method.
>>
>> This whole process takes around 50 seconds (from adding the IDs to the list
>> to scanning the whole table using the BatchScanner).
>>
>> I tried switching on bloom filters but that didn't work.
>>
>> Also, if anyone could briefly explain how a BatchScanner works and how it
>> scans in parallel, it would help me better understand what I am doing.
>>
>> Thanks
>> Vaibhav
>>
>>
>>
>
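Keith's first point, that the batch scanner does not parallelize work within a tablet, means lookup cost depends on how the 10000 row IDs fall into tablets. The following is a minimal, hypothetical Java sketch of binning row IDs to tablets. It assumes a sorted list of split points (each tablet's end row, which is inclusive in Accumulo) and uses plain Java strings in place of Accumulo's byte-array rows, which compare the same way for ASCII row IDs. The class and method names are made up; this is not Accumulo's own locator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of binning row IDs to tablets by split point (not Accumulo code). */
public class RangeBinningModel {

    /**
     * splits holds the sorted end rows of every tablet except the last. Tablet i covers
     * (splits[i-1], splits[i]], so a row equal to a split belongs to the tablet ending there;
     * the last tablet covers everything after the final split.
     */
    public static TreeMap<Integer, List<String>> binRows(List<String> splits, List<String> rowIds) {
        TreeMap<Integer, List<String>> bins = new TreeMap<>();
        for (String row : rowIds) {
            // binarySearch returns the index of an exact match, else -(insertionPoint) - 1.
            int pos = Collections.binarySearch(splits, row);
            int tablet = pos >= 0 ? pos : -pos - 1;
            bins.computeIfAbsent(tablet, k -> new ArrayList<>()).add(row);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("g", "p"); // three tablets
        List<String> rows = Arrays.asList("a", "g", "h", "p", "q", "z");
        // Per Keith's note, each bin would be looked up sequentially within its tablet.
        System.out.println(binRows(splits, rows));
    }
}
```

With split points g and p, rows a and g land in the first tablet, h and p in the second, and q and z in the third. If 10000 IDs were spread evenly over 194 tablets, each tablet would receive about 52 lookups.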
