accumulo-user mailing list archives

From vaibhav thapliyal <vaibhav.thapliyal...@gmail.com>
Subject Re: BatchScanner taking too much time to scan rows
Date Tue, 12 May 2015 18:39:15 GMT
I also tried increasing the thread count to a larger number, about 500, but
yes, I will try the BatchScanner with 194 threads too.  I will get back
shortly with the information Keith asked for.

Thanks
Vaibhav
On 13-May-2015 12:04 am, "David Medinets" <david.medinets@gmail.com> wrote:

> Try using 194 threads if your hardware can support them. The worst
> that'll happen is the client program crashes during testing. If that
> happens, cut the number of threads in half. And so on.
>
> On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal
> <vaibhav.thapliyal.91@gmail.com> wrote:
> > I have 194 tablets. Currently I am passing 20 as the thread count to
> > the createBatchScanner method.
> >
> > On 12-May-2015 11:19 pm, "Keith Turner" <keith@deenlo.com> wrote:
> >>
> >> How many tablets do you have?  The batch scanner does not parallelize
> >> operations within a tablet.
> >>
> >> If you give the batch scanner more threads than there are tservers, it
> >> will make multiple parallel rpc calls to each tserver if the tserver
> >> has multiple tablets.  Each rpc may include multiple tablets and ranges
> >> for each tablet.
> >>
> >> If the batch scanner has fewer threads than tservers, it will make one
> >> rpc per tserver per thread.  Each rpc call will include all tablets and
> >> associated ranges for that tserver.
> >>
> >> Keith
> >>
> >>
> >>
> >> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal
> >> <vaibhav.thapliyal.91@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am using a BatchScanner to scan rows from an Accumulo table. The
> >>> table has around 187m entries and I am using a 3 node cluster running
> >>> Accumulo 1.6.1.
> >>>
> >>> I pass 10000 ids, which are stored as row ids in my table, as a list
> >>> to the setRanges() method.
> >>>
> >>> This whole process takes around 50 seconds (from adding the ids to the
> >>> list to scanning the whole table using the BatchScanner).
> >>>
> >>> I tried switching on bloom filters but that didn't work.
> >>>
> >>> Also, if anyone could briefly explain how a BatchScanner works and how
> >>> it does parallel scanning, it would help me understand what I am doing
> >>> better.
> >>>
> >>> Thanks
> >>> Vaibhav
> >>>
> >>>
> >>
> >
>
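
Keith's description of how the batch scanner spreads work across tservers can
be sketched as a toy model. This is not Accumulo code and uses no Accumulo
classes. The class and method names are hypothetical, and the chunking rule
(total tablets divided by threads, at least 1 tablet per rpc) is an assumption
made for illustration; only the two cases (more threads than tservers, or not)
come from the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Accumulo code. All names here are hypothetical.
class BatchScanPlanModel {

    /**
     * Maps each tserver to the list of rpcs sent to it, where each rpc is
     * represented by the number of tablets it carries.
     */
    static Map<String, List<Integer>> planRpcs(Map<String, Integer> tabletsPerTserver,
                                               int numThreads) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be at least 1");
        }
        int totalTablets = 0;
        for (int count : tabletsPerTserver.values()) {
            totalTablets += count;
        }

        // Default case (threads <= tservers): one rpc per tserver that
        // carries all of that tserver's tablets.
        int maxTabletsPerRpc = Integer.MAX_VALUE;
        if (numThreads > tabletsPerTserver.size()) {
            // ASSUMPTION: with spare threads, cap the tablets per rpc so the
            // work is split into roughly one chunk per thread.
            maxTabletsPerRpc = Math.max(1, totalTablets / numThreads);
        }

        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : tabletsPerTserver.entrySet()) {
            List<Integer> rpcs = new ArrayList<>();
            int remaining = e.getValue();
            while (remaining > 0) {
                int chunk = Math.min(remaining, maxTabletsPerRpc);
                rpcs.add(chunk);
                remaining -= chunk;
            }
            plan.put(e.getKey(), rpcs);
        }
        return plan;
    }

    /** Total number of rpcs across all tservers in the modelled plan. */
    static int totalRpcs(Map<String, Integer> tabletsPerTserver, int numThreads) {
        int total = 0;
        for (List<Integer> rpcs : planRpcs(tabletsPerTserver, numThreads).values()) {
            total += rpcs.size();
        }
        return total;
    }
}
```

Assuming the 194 tablets are split 65/65/64 across the three tservers (the
actual split is not given in the thread), this model yields 3 rpcs for 2
threads, 24 for 20 threads, and 194 for 194 threads. These are outputs of the
model, not measurements from a cluster.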

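David's suggestion (start at 194 threads and halve on failure) amounts to a
simple back-off loop. The sketch below assumes a hypothetical `attempt`
callback that runs the scan with a given thread count and reports whether it
completed; how a crash is detected (for example by running the client in a
separate process) is left out.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Accumulo.
class ThreadCountBackoff {

    /**
     * Tries `start` threads, then start/2, start/4, ... down to 1.
     * Returns the first thread count for which `attempt` reports success,
     * or -1 if every attempt fails.
     */
    static int largestWorkingThreadCount(int start, IntPredicate attempt) {
        for (int threads = start; threads >= 1; threads /= 2) {
            if (attempt.test(threads)) {
                return threads;
            }
        }
        return -1;
    }
}
```

Starting from 194, the candidate counts tried are 194, 97, 48, 24, 12, 6, 3,
and 1.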