hbase-user mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: Optimizing table scans
Date Mon, 17 Sep 2012 04:36:29 GMT
>The reason the scan with setBatch(1) is much faster is because it returns
>only the value for the first column?

When you set the batch size to 1, the scan still returns all the column values of each row,
but one column value at a time, so each call to next() yields a single column value. FYI.

-Anoop-
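
To illustrate the point above, here is a minimal sketch in plain Java (no HBase dependency) of how a batch size might split one row's column values into several Results, and what that does to the number of Results a scan hands back. The class ScanBatchSketch, its method names, and the figure of 20 columns per row are hypothetical; the thread does not say how many columns the table has. The fetch estimate assumes the caching value counts Results rather than rows once batching is on, and it ignores any extra calls a real scanner makes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models batching with plain lists, not the HBase client.
class ScanBatchSketch {

    // Split one row's column values into chunks of at most `batch` values.
    // A batch of 0 or less stands for "no batching": the whole row is one chunk.
    public static List<List<String>> splitRow(List<String> cells, int batch) {
        List<List<String>> results = new ArrayList<>();
        if (batch <= 0) {
            results.add(new ArrayList<>(cells));
            return results;
        }
        for (int i = 0; i < cells.size(); i += batch) {
            int end = Math.min(i + batch, cells.size());
            results.add(new ArrayList<>(cells.subList(i, end)));
        }
        return results;
    }

    // Number of chunks ("Results") produced for `rows` rows of `columnsPerRow` values each.
    public static long totalResults(long rows, int columnsPerRow, int batch) {
        if (batch <= 0 || batch >= columnsPerRow) {
            return rows; // one chunk per row
        }
        long chunksPerRow = (columnsPerRow + batch - 1) / batch; // ceiling division
        return rows * chunksPerRow;
    }

    // Rough number of fetches, assuming `caching` counts chunks per fetch (an assumption).
    public static long estimatedFetches(long rows, int columnsPerRow, int batch, int caching) {
        long results = totalResults(rows, columnsPerRow, batch);
        return (results + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("c1", "c2", "c3");
        System.out.println("batch=1  -> " + splitRow(row, 1));
        System.out.println("batch=2  -> " + splitRow(row, 2));
        System.out.println("batch=-1 -> " + splitRow(row, -1));

        int columnsPerRow = 20; // hypothetical; not stated in the thread
        System.out.println("Results, batch=-1: " + totalResults(1_000_000L, columnsPerRow, -1));
        System.out.println("Results, batch=1:  " + totalResults(1_000_000L, columnsPerRow, 1));
        System.out.println("Fetches, batch=1, caching=10000: "
                + estimatedFetches(1_000_000L, columnsPerRow, 1, 10000));
    }
}
```

If a timing loop counts Results rather than rows, stopping after 1,000,000 Results with a batch size of 1 would cover far fewer than 1,000,000 rows. That is one possible reading of the timings in this thread, not something the thread confirms.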
________________________________________
From: Amit Sela [amits@infolinks.com]
Sent: Saturday, September 15, 2012 2:41 PM
To: user@hbase.apache.org
Subject: Re: Optimizing table scans

So, just to get it straight: the reason the scan with setBatch(1) is much
faster is that it returns only the value for the first column?

On Wed, Sep 12, 2012 at 5:37 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:

>
> Hi there,
>
> See this for info on the block cache in the RegionServer...
>
> http://hbase.apache.org/book.html
> 9.6.4. Block Cache
>
> ... and see this for "batching" on the scan parameter...
>
> http://hbase.apache.org/book.html#perf.reading
> 11.8.1. Scan Caching
>
>
>
>
>
>
> On 9/12/12 9:55 AM, "Amit Sela" <amits@infolinks.com> wrote:
>
> >I allocate 10GB per RegionServer.
> >An average row size is ~200 Bytes.
> >The network is 1 GbE.
> >
> >It would be great if anyone could elaborate on the difference between
> >the Cache and Batch parameters.
> >
> >Thanks.
> >
> >On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel
> ><michael_segel@hotmail.com> wrote:
> >
> >> How much memory do you have?
> >> What's the size of the underlying row?
> >> What does your network look like? 1 GbE or 10 GbE?
> >>
> >> There's more to it, and I think that you'll find that YMMV on what is an
> >> optimum scan size...
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Sep 12, 2012, at 7:57 AM, Amit Sela <amits@infolinks.com> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I'm trying to find the sweet spot for the cache size and batch size
> >> > Scan() parameters.
> >> >
> >> > I'm scanning one table using HTable.getScanner() and iterating over
> >> > the ResultScanner retrieved.
> >> >
> >> > I did some testing and got the following results:
> >> >
> >> > For scanning *1000000* rows.
> >> >
> >> > Cache    Batch           Total execution time (sec)
> >> > 10000    -1 (default)    112
> >> > 10000    5000            110
> >> > 10000    10000           110
> >> > 10000    20000           110
> >> >
> >> > Cache    Batch           Total execution time (sec)
> >> > 1000     -1 (default)    116
> >> > 10000    -1 (default)    110
> >> > 20000    -1 (default)    115
> >> >
> >> > Cache    Batch           Total execution time (sec)
> >> > 5000     10              26
> >> > 20000    10              25
> >> > 50000    10              26
> >> > 5000     5               15
> >> > 20000    5               14
> >> > 50000    5               14
> >> > 1000     1               6
> >> > 5000     1               5
> >> > 10000    1               4
> >> > 20000    1               4
> >> > 50000    1               4
> >> >
> >> > *I don't understand why a lower batch size gives such an improvement.*
> >> >
> >> > Thanks,
> >> >
> >> > Amit.
> >>
> >>
>
>
>