hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: Optimizing table scans
Date Wed, 12 Sep 2012 13:55:20 GMT
I allocate 10 GB per RegionServer.
The average row size is ~200 bytes.
The network is 1 GbE.

It would be great if anyone could elaborate on the difference between the Cache
and Batch parameters.

Thanks.

On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> How much memory do you have?
> What's the size of the underlying row?
> What does your network look like? 1GbE or 10GbE?
>
> There's more to it, and I think that you'll find that YMMV on what is an
> optimum scan size...
>
> HTH
>
> -Mike
>
> On Sep 12, 2012, at 7:57 AM, Amit Sela <amits@infolinks.com> wrote:
>
> > Hi all,
> >
> > I'm trying to find the sweet spot for the cache size and batch size Scan()
> > parameters.
> >
> > I'm scanning one table using HTable.getScanner() and iterating over the
> > ResultScanner retrieved.
> >
> > I did some testing and got the following results:
> >
> > For scanning *1000000* rows.
> >
> > Cache   Batch          Total execution time (sec)
> > 10000   -1 (default)   112
> > 10000   5000           110
> > 10000   10000          110
> > 10000   20000          110
> >
> > Cache   Batch          Total execution time (sec)
> > 1000    -1 (default)   116
> > 10000   -1 (default)   110
> > 20000   -1 (default)   115
> >
> > Cache   Batch          Total execution time (sec)
> > 5000    10             26
> > 20000   10             25
> > 50000   10             26
> > 5000    5              15
> > 20000   5              14
> > 50000   5              14
> > 1000    1              6
> > 5000    1              5
> > 10000   1              4
> > 20000   1              4
> > 50000   1              4
> >
> > *I don't understand why a lower batch size gives such an improvement?*
> >
> > Thanks,
> >
> > Amit.
>
>
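
As a rough aid for the Cache vs. Batch question, here is a minimal sketch that models how the two settings are generally documented to interact. Caching (`Scan.setCaching(int)`) is the number of Results fetched per round trip, and batch (`Scan.setBatch(int)`) caps the number of cells in each Result, so a wide row is split across several Results. This is plain Java arithmetic, not the HBase client. The class `ScanModel`, its method names, and the figure of 40 columns per row are hypothetical (the thread does not state the column count). It also assumes the quoted test stopped after 1,000,000 Results rather than 1,000,000 complete rows, which is only one possible reading of the timings.

```java
// Rough model of scan settings; an illustration, not the HBase client itself.
final class ScanModel {

    /** Results emitted per row. batch <= 0 models "no batch set" (whole row in one Result). */
    static long resultsPerRow(int columnsPerRow, int batch) {
        if (batch <= 0 || batch >= columnsPerRow) {
            return 1;
        }
        return (columnsPerRow + batch - 1) / batch; // ceiling division
    }

    /** Cells a client has seen after iterating over the given number of Results. */
    static long cellsSeen(long resultsIterated, int columnsPerRow, int batch) {
        long perRow = resultsPerRow(columnsPerRow, batch);
        if (perRow == 1) {
            return resultsIterated * columnsPerRow;
        }
        long fullRows = resultsIterated / perRow;
        long leftoverResults = resultsIterated % perRow;
        // Leftover Results belong to one partially read row, each holding `batch` cells.
        return fullRows * columnsPerRow + leftoverResults * batch;
    }

    /** Approximate round trips: caching is the number of Results fetched per trip. */
    static long roundTrips(long resultsIterated, int caching) {
        return (resultsIterated + caching - 1) / caching;
    }

    public static void main(String[] args) {
        int columnsPerRow = 40;     // hypothetical; the thread does not state it
        long results = 1_000_000L;  // Results iterated (assumed reading of the test)
        int caching = 10_000;
        for (int batch : new int[] {-1, 10, 5, 1}) {
            System.out.printf("batch=%d cells=%d roundTrips=%d%n",
                    batch,
                    cellsSeen(results, columnsPerRow, batch),
                    roundTrips(results, caching));
        }
    }
}
```

Under these assumptions, the number of round trips stays the same across batch sizes while the number of cells actually read shrinks with the batch size, so a loop that counts Results instead of rows would do far less work at batch = 1. Whether that is what happened in the quoted test cannot be confirmed from the thread alone.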
