hbase-user mailing list archives

From Peter Haidinyak <phaidin...@local.com>
Subject RE: Make it quicker
Date Tue, 07 Dec 2010 01:06:25 GMT
Thanks, I will use these results as a baseline and see what I can do to tweak them.


-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, December 06, 2010 5:01 PM
To: user@hbase.apache.org
Subject: Re: Make it quicker

The speed really depends on the total size of the rows, which is all
the values plus all the keys (row, family, qualifier, timestamp) for
each of those values. For example, if your rows are about 500 bytes
each, you have to pull roughly 300MB (627k rows x 500 bytes), which
works out to a throughput of about 33MB/s. That's good considering
you're going through the network for non-local data and that it takes
multiple RPCs to fetch all that data... but that's just an example.
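The back-of-the-envelope math above can be checked with a short Java snippet (figures taken from this thread: ~627k rows, an assumed ~500 bytes per row, scanned in ~9 seconds):

```java
// Rough scan-throughput estimate using the numbers discussed in this thread.
public class ScanThroughput {
    public static void main(String[] args) {
        long rows = 627000L;          // rows scanned for one day
        long rowSizeBytes = 500L;     // assumed total size per row (keys + values)
        double seconds = 9.0;         // observed scan time

        long totalBytes = rows * rowSizeBytes;            // 313,500,000 bytes
        double mbTotal = totalBytes / (1024.0 * 1024.0);  // ~299 MB
        double mbPerSec = mbTotal / seconds;              // ~33 MB/s

        System.out.printf("total: %.0f MB, throughput: %.1f MB/s%n",
                          mbTotal, mbPerSec);
    }
}
```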

Usual optimizations:

 - use scanner caching
 - use LZO
 - only retrieve the columns you need
 - use the smallest keys possible
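Applied to the setup described in this thread (row keys like '20101201|0000001'), the first and third tips might look like the following client-side sketch. The table name, column family, and qualifier ("logs", "cf", "msg") are placeholders, and the code assumes the HBase 0.90-era client API; it needs hbase-client on the classpath and a running cluster:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DailyLogScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "logs");  // placeholder table name

        // Start/stop keys bracketing one day's rows ('20101201|...').
        Scan scan = new Scan(Bytes.toBytes("20101201|"),
                             Bytes.toBytes("20101202|"));

        // Scanner caching: fetch many rows per RPC instead of one at a time.
        scan.setCaching(1000);

        // Only retrieve the columns you need (placeholder family/qualifier).
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("msg"));

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process each row here
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```

The LZO tip is a server-side setting (compression on the column family), so it doesn't appear in the client code.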

Hope that helps,


On Mon, Dec 6, 2010 at 2:02 PM, Peter Haidinyak <phaidinyak@local.com> wrote:
> Hi y'all,
>  Ok, I put about 2.5 million rows into HBase running on three machines (2 region
> servers and 1 name node, etc). The row id is the date plus an incrementing number
> ('20101201|0000001'). From a Java client I do a scan with the starting row and
> ending row for one day's logs (the last 627k rows in HBase).
>  Right now the scan runs in about 9 seconds to process 627k rows. For commodity
> servers, is that about normal? Also, where can I learn how to optimize this process?
> Thanks again.
> -Pete
