hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Slow full-table scans
Date Thu, 16 Aug 2012 18:36:52 GMT
That's interesting.
Could you share your old and new schema? I would like to track down the performance problems
you saw.
(If you had a demo program that populates your rows with 200,000 columns in a way where you
saw the performance issues, that'd be even better, but not necessary).

-- Lars

From: Gurjeet Singh <gurjeet@gmail.com>
To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
Sent: Thursday, August 16, 2012 11:26 AM
Subject: Re: Slow full-table scans
Sorry for the delay guys.

Here are a few results:

1. Regions in the table = 11
2. The region servers don't appear to be very busy with the query: ~5%
CPU (but with parallelization, they are all busy)

Finally, I changed the format of my data, such that each cell in HBase
contains a chunk of a row instead of the single value it had. Stuffing
each HBase cell with 500 columns of a row gave me a performance boost
of 1000x. It seems that the underlying issue was IO overhead per byte
of actual data stored.
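
The chunking described above could look roughly like the sketch below. It
is not the actual schema from this thread: it assumes each column holds an
8-byte double, and the qualifier naming ("c" plus a zero-padded chunk index)
and the helper names pack_row/unpack_row are made up for illustration. Only
the chunk size of 500 and the 200,000-column row width come from the
messages.

```python
import struct

CHUNK = 500  # columns per cell, the figure quoted in the message above


def pack_row(values, chunk=CHUNK):
    """Split one wide row into {qualifier: bytes} cells, `chunk` values each.

    Assumption: every value is a float stored as a big-endian 8-byte double.
    """
    cells = {}
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        # Hypothetical qualifier scheme: zero-padded so cells sort in order.
        qualifier = b"c%06d" % (start // chunk)
        cells[qualifier] = struct.pack(">%dd" % len(part), *part)
    return cells


def unpack_row(cells):
    """Rebuild the original list of values from the chunked cells."""
    values = []
    for qualifier in sorted(cells):
        blob = cells[qualifier]
        values.extend(struct.unpack(">%dd" % (len(blob) // 8), blob))
    return values


row = [float(i) for i in range(200_000)]
cells = pack_row(row)
print(len(cells))  # 400 cells instead of 200,000
```

Each HBase cell is stored with its own copy of the row key, column family,
qualifier and timestamp, so going from 200,000 cells per row to 400 cuts
that per-cell bookkeeping by the same factor of 500.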

On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> Yeah... It looks OK.
> Maybe 2G of heap is a bit low when dealing with 200,000 column rows.
> If you can I'd like to know how busy your regionservers are during these operations.
> That would be an indication of whether the parallelization is good or not.
> -- Lars
> ----- Original Message -----
> From: Stack <stack@duboce.net>
> To: user@hbase.apache.org
> Cc:
> Sent: Wednesday, August 15, 2012 3:13 PM
> Subject: Re: Slow full-table scans
> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>> I am beginning to think that this is a configuration issue on my
>> cluster. Do the following configuration files seem sane ?
>> hbase-env.sh    https://gist.github.com/3345338
> Nothing wrong w/ this (Remove the -ea, you don't want asserts in
> production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
>> hbase-site.xml    https://gist.github.com/3345356
> This is all defaults effectively.   I don't see any of the configs
> recommended by the performance section of the reference guide and/or
> those suggested by the GBIF blog.
> You don't answer LarsH's query about where you see the 4% difference.
> How many regions in your table?  What's the HBase Master UI look like
> when this scan is running?
> St.Ack