accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: BatchScanner taking too much time to scan rows
Date Tue, 12 May 2015 17:49:05 GMT
How many tablets do you have?  The batch scanner does not parallelize
operations within a tablet.

If you give the batch scanner more threads than there are tservers, it will
make multiple parallel rpc calls to each tserver if the tserver has
multiple tablets.  Each rpc may include multiple tablets and ranges for
each tablet.

If the batch scanner has fewer threads than tservers, it will make one rpc
per tserver per thread.  Each rpc call will include all tablets and
associated ranges for that tserver.
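
To make the two cases above concrete, here is a toy model in plain Java. It
is only a sketch and does not use or reproduce Accumulo's code: the class
name BatchScanRpcSketch, the method planRpcs, and the splitting rule (about
total tablets / threads tablets per rpc once threads outnumber tservers) are
all assumptions made for illustration. What it does reflect from the
description is that a tablet is never split across rpcs, so the number of
tablets caps the parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Accumulo's implementation.
class BatchScanRpcSketch {

    /** Returns one list of tablets per planned rpc. */
    static List<List<String>> planRpcs(Map<String, List<String>> tabletsByTserver,
                                       int numThreads) {
        int totalTablets = 0;
        for (List<String> tablets : tabletsByTserver.values()) {
            totalTablets += tablets.size();
        }

        // Assumed rule: with no more threads than tservers, each tserver gets
        // a single rpc carrying all of its tablets and ranges.
        int maxTabletsPerRpc = Integer.MAX_VALUE;
        if (numThreads > tabletsByTserver.size()) {
            // Assumed rule: spread tablets over threads, at least one per rpc.
            maxTabletsPerRpc = Math.max(1, totalTablets / numThreads);
        }

        List<List<String>> rpcs = new ArrayList<>();
        for (List<String> tablets : tabletsByTserver.values()) {
            int chunk = Math.min(maxTabletsPerRpc, tablets.size());
            // A tablet is the smallest unit: it is never divided between rpcs.
            for (int i = 0; i < tablets.size(); i += chunk) {
                int end = Math.min(i + chunk, tablets.size());
                rpcs.add(new ArrayList<>(tablets.subList(i, end)));
            }
        }
        return rpcs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cluster = new LinkedHashMap<>();
        cluster.put("tserver1", Arrays.asList("t1", "t2", "t3", "t4"));
        cluster.put("tserver2", Arrays.asList("t5", "t6", "t7", "t8"));
        cluster.put("tserver3", Arrays.asList("t9", "t10", "t11", "t12"));

        System.out.println(planRpcs(cluster, 2).size()); // fewer threads than tservers
        System.out.println(planRpcs(cluster, 6).size()); // more threads than tservers

        Map<String, List<String>> oneTabletEach = new LinkedHashMap<>();
        oneTabletEach.put("tserver1", Arrays.asList("t1"));
        oneTabletEach.put("tserver2", Arrays.asList("t2"));
        oneTabletEach.put("tserver3", Arrays.asList("t3"));

        System.out.println(planRpcs(oneTabletEach, 10).size()); // capped by tablets
    }
}
```

In this model, 3 tservers with 4 tablets each yield 3 rpcs with 2 threads and
6 rpcs with 6 threads, while a table with one tablet per tserver yields only
3 rpcs even with 10 threads. That is why the tablet count matters for your
question.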

Keith



On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal <
vaibhav.thapliyal.91@gmail.com> wrote:

> Hi,
>
> I am using BatchScanner to scan rows from an Accumulo table. The table has
> around 187m entries and I am using a 3 node cluster running Accumulo
> 1.6.1.
>
> I pass 10000 ids, which are stored as row ids in my table, as a list
> to the setRanges() method.
>
> This whole process takes around 50 secs (from adding the ids to the list to
> scanning the whole table using the BatchScanner).
>
> I tried switching on bloom filters but that didn't work.
>
> Also, if anyone could briefly explain how a BatchScanner works and how it
> does parallel scanning, it would help me better understand what I am doing.
>
> Thanks
> Vaibhav
>
>
>
