accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: BatchScanner taking too much time to scan rows
Date Tue, 12 May 2015 19:59:43 GMT
On the monitor page, you should see how many threads are running in
each tserver, if I remember correctly. There are also graphs to show
response rates.

On Tue, May 12, 2015 at 2:39 PM, vaibhav thapliyal
<vaibhav.thapliyal.91@gmail.com> wrote:
> I also tried increasing the thread count to a larger number, about 500, but
> yes, I will try the BatchScanner with 194 threads too.  I will get back
> shortly with the info that Keith asked for.
>
> Thanks
> Vaibhav
>
> On 13-May-2015 12:04 am, "David Medinets" <david.medinets@gmail.com> wrote:
>>
>> Try using 194 threads if your hardware can support them. The worst
>> that'll happen is the client program crashes during testing. If that
>> happens, cut the number of threads in half. And so on.
>>
>> On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal
>> <vaibhav.thapliyal.91@gmail.com> wrote:
>> > I have 194 tablets. Currently I pass 20 threads to the
>> > createBatchScanner method when creating the BatchScanner.
>> >
>> > On 12-May-2015 11:19 pm, "Keith Turner" <keith@deenlo.com> wrote:
>> >>
>> >> How many tablets do you have?  The batch scanner does not parallelize
>> >> operations within a tablet.
>> >>
>> >> If you give the batch scanner more threads than there are tservers, it
>> >> will make multiple parallel RPC calls to each tserver if the tserver
>> >> has multiple tablets.  Each RPC may include multiple tablets and
>> >> ranges for each tablet.
>> >>
>> >> If the batch scanner has fewer threads than tservers, it will make one
>> >> RPC per tserver per thread.  Each RPC call will include all tablets
>> >> and associated ranges for that tserver.
>> >>
>> >> Keith
>> >>
>> >>
>> >>
>> >> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal
>> >> <vaibhav.thapliyal.91@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am using a BatchScanner to scan rows from an Accumulo table. The
>> >>> table has around 187 million entries, and I am using a 3-node cluster
>> >>> running Accumulo 1.6.1.
>> >>>
>> >>> I have passed 10000 ids, which are stored as row ids in my table, as
>> >>> a list to the setRanges() method.
>> >>>
>> >>> This whole process takes around 50 secs (from adding the ids to the
>> >>> list to scanning the whole table using the BatchScanner).
>> >>>
>> >>> I tried switching on bloom filters but that didn't work.
>> >>>
>> >>> Also if anyone could briefly explain how a BatchScanner works, how it
>> >>> does parallel scanning it would help me understand what I am doing
>> >>> better.
>> >>>
>> >>> Thanks
>> >>> Vaibhav
>> >>>
>> >>>
>> >>
>> >
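
To make Keith's description of thread count versus tservers concrete, here is a
rough Java sketch of how tablets might be grouped into parallel RPCs. It is a
simplified model, not Accumulo's actual client code: the class BatchScanPlan,
its methods, and the round-robin split are all hypothetical, and it assumes the
threads are divided evenly among tservers. In a real client the thread count is
the numQueryThreads argument to Connector.createBatchScanner().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in this thread.
// It does NOT reproduce Accumulo's real scheduling logic.
class BatchScanPlan {

    // Assumption: query threads are shared evenly across tservers, and every
    // tserver gets at least one RPC even when threads < tservers.
    static int rpcsPerServer(int queryThreads, int tservers) {
        return Math.max(1, queryThreads / tservers);
    }

    // Maps each tserver to the list of RPCs sent to it; each RPC is the list
    // of tablets it covers. Tablets are dealt out round-robin.
    static Map<String, List<List<String>>> plan(
            Map<String, List<String>> tabletsByServer, int queryThreads) {
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        if (tabletsByServer.isEmpty()) {
            return result;
        }
        int perServer = rpcsPerServer(queryThreads, tabletsByServer.size());
        for (Map.Entry<String, List<String>> e : tabletsByServer.entrySet()) {
            List<String> tablets = e.getValue();
            // Never more RPCs than tablets: work inside a tablet is not split.
            int groups = Math.max(1, Math.min(perServer, tablets.size()));
            List<List<String>> rpcs = new ArrayList<>();
            for (int i = 0; i < groups; i++) {
                rpcs.add(new ArrayList<>());
            }
            for (int i = 0; i < tablets.size(); i++) {
                rpcs.get(i % groups).add(tablets.get(i));
            }
            result.put(e.getKey(), rpcs);
        }
        return result;
    }
}
```

Under this model, and assuming the 194 tablets are spread roughly evenly over
the 3 tservers, 20 threads gives 6 RPCs per tserver and 194 threads gives 64,
while 2 threads gives a single RPC per tserver that carries all of its tablets.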
