hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Optimizing table scans
Date Wed, 12 Sep 2012 13:04:26 GMT
How much memory do you have? 
What's the size of the underlying row? 
What does your network look like? 1GbE or 10GbE?

There's more to it, and I think you'll find that YMMV when it comes to the optimum scan size...

HTH

-Mike

On Sep 12, 2012, at 7:57 AM, Amit Sela <amits@infolinks.com> wrote:

> Hi all,
> 
> I'm trying to find the sweet spot for the cache size and batch size Scan()
> parameters.
> 
> I'm scanning one table using HTable.getScanner() and iterating over the
> ResultScanner retrieved.
> 
> I did some testing and got the following results:
> 
> For scanning 1,000,000 rows:
> 
> Cache   Batch          Total execution time (sec)
> 10000   -1 (default)   112
> 10000   5000           110
> 10000   10000          110
> 10000   20000          110
> 
> Cache   Batch          Total execution time (sec)
> 1000    -1 (default)   116
> 10000   -1 (default)   110
> 20000   -1 (default)   115
> 
> Cache   Batch          Total execution time (sec)
> 5000    10             26
> 20000   10             25
> 50000   10             26
> 5000    5              15
> 20000   5              14
> 50000   5              14
> 1000    1              6
> 5000    1              5
> 10000   1              4
> 20000   1              4
> 50000   1              4
> I don't understand why a lower batch size gives such an improvement?
> 
> Thanks,
> 
> Amit.
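One point worth checking in the measurements above (an assumption about the benchmark, not something confirmed in this thread): `Scan.setBatch(n)` caps the number of columns returned per `Result`, so a row wider than `n` columns comes back as several partial `Result` objects. A loop that counts 1,000,000 `Result`s with a small batch may therefore be covering far fewer than 1,000,000 actual rows, which would explain the apparent speedup. The arithmetic can be sketched as follows; `ScanMath` and its methods are hypothetical helpers for illustration, not part of the HBase client:

```java
// Illustrative arithmetic for how Scan.setCaching() and Scan.setBatch()
// interact, assuming every row has the same number of columns.
// This is a sketch, not the HBase client itself.
public class ScanMath {

    // With Scan.setBatch(batch), a row with `cols` columns is split into
    // ceil(cols / batch) partial Result objects.
    static long resultsPerRow(long cols, long batch) {
        return (cols + batch - 1) / batch;
    }

    // With Scan.setCaching(caching), the client fetches up to `caching`
    // Result objects per RPC, so the approximate RPC count is
    // ceil(totalResults / caching).
    static long rpcCount(long rows, long cols, long batch, long caching) {
        long totalResults = rows * resultsPerRow(cols, batch);
        return (totalResults + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // A 10-column row with batch=1 comes back as 10 partial Results,
        // so iterating 1,000,000 Results would cover only 100,000 rows.
        System.out.println(resultsPerRow(10, 1));
        // With batch=-1 (default, whole rows) and caching=10000, scanning
        // 1,000,000 rows of 10 columns takes roughly 100 RPCs.
        System.out.println(rpcCount(1_000_000, 10, 10, 10_000));
    }
}
```

If that is what happened here, comparing runs by wall-clock time per `Result` is misleading; counting distinct row keys instead would make the batch sizes comparable.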

