hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Slow full-table scans
Date Tue, 21 Aug 2012 23:33:20 GMT
I get roughly the same (~1.8s) - 100 rows, 200.000 columns, segment size 100



________________________________
 From: Gurjeet Singh <gurjeet@gmail.com>
To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com> 
Sent: Tuesday, August 21, 2012 11:31 AM
Subject: Re: Slow full-table scans
 
How does that compare with the newScanTable on your build?

Gurjeet

On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> Hmm... So I tried in HBase (current trunk).
> I created 100 rows with 200.000 columns each (using your oldMakeTable). The creation
> took a bit, but scanning finished in 1.8s. (HBase in pseudo distributed mode - with your oldScanTable).
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: lars hofhansl <lhofhansl@yahoo.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Cc:
> Sent: Monday, August 20, 2012 7:50 PM
> Subject: Re: Slow full-table scans
>
> Thanks Gurjeet,
>
> I'll (hopefully) have a look tomorrow.
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Gurjeet Singh <gurjeet@gmail.com>
> To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
> Cc:
> Sent: Monday, August 20, 2012 7:42 PM
> Subject: Re: Slow full-table scans
>
> Hi Lars,
>
> Here is a testcase:
>
> https://gist.github.com/3410948
>
> Benchmarking code:
>
> https://gist.github.com/3410952
>
> Try running it with numRows = 100, numCols = 200000, segmentSize = 1000
>
> Gurjeet
>
>
> On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>> Sure - I can create a minimal testcase and send it along.
>>
>> Gurjeet
>>
>> On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>> That's interesting.
>>> Could you share your old and new schema? I would like to track down the performance
>>> problems you saw.
>>> (If you had a demo program that populates your rows with 200.000 columns in a
>>> way where you saw the performance issues, that'd be even better, but not necessary).
>>>
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>>  From: Gurjeet Singh <gurjeet@gmail.com>
>>> To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>>> Sent: Thursday, August 16, 2012 11:26 AM
>>> Subject: Re: Slow full-table scans
>>>
>>> Sorry for the delay guys.
>>>
>>> Here are a few results:
>>>
>>> 1. Regions in the table = 11
>>> 2. The region servers don't appear to be very busy with the query (~5%
>>> CPU), but with parallelization they are all busy.
>>>
>>> Finally, I changed the format of my data so that each cell in HBase
>>> contains a chunk of a row instead of the single value it had. Stuffing
>>> each HBase cell with 500 columns of a row gave me a performance boost
>>> of 1000x. It seems that the underlying issue was IO overhead per byte
>>> of actual data stored.
>>>
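[Editor's sketch of the chunking idea described above, not the code from the gists. It assumes the values are doubles (the thread does not say what type they are) and uses hypothetical names (SegmentPacking, pack, unpack). It only shows how one logical row can be split into byte[] segments, each of which would be stored as a single HBase cell value; the HBase client calls themselves are left out.]

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SegmentPacking {

    /**
     * Splits one logical row into segments of at most segmentSize values.
     * Each segment is encoded as a byte[] (8 bytes per double) and is meant
     * to be written as the value of one cell instead of one cell per value.
     */
    public static List<byte[]> pack(double[] row, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        for (int start = 0; start < row.length; start += segmentSize) {
            int len = Math.min(segmentSize, row.length - start);
            ByteBuffer buf = ByteBuffer.allocate(len * Double.BYTES);
            for (int i = 0; i < len; i++) {
                buf.putDouble(row[start + i]);
            }
            segments.add(buf.array());
        }
        return segments;
    }

    /** Reverses pack(): concatenates the decoded segments back into one row. */
    public static double[] unpack(List<byte[]> segments) {
        int total = 0;
        for (byte[] segment : segments) {
            total += segment.length / Double.BYTES;
        }
        double[] row = new double[total];
        int pos = 0;
        for (byte[] segment : segments) {
            ByteBuffer buf = ByteBuffer.wrap(segment);
            while (buf.hasRemaining()) {
                row[pos++] = buf.getDouble();
            }
        }
        return row;
    }

    public static void main(String[] args) {
        int numCols = 200_000;   // columns per row, as in the thread
        int segmentSize = 500;   // values packed into each cell, as in the thread
        double[] row = new double[numCols];
        for (int i = 0; i < numCols; i++) {
            row[i] = i * 0.5;    // placeholder data for illustration only
        }
        List<byte[]> cells = pack(row, segmentSize);
        System.out.println("cells per row: " + cells.size()); // prints "cells per row: 400"
    }
}
```

[With 200.000 columns and 500 values per cell, a row shrinks from 200.000 cells to 400, so the fixed per-cell cost (row key, column name, timestamp) is paid far less often for the same payload.]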
>>>
>>> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>>> Yeah... It looks OK.
>>>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
>>>>
>>>>
>>>> If you can, I'd like to know how busy your regionservers are during these
>>>> operations. That would be an indication of whether the parallelization is good or not.
>>>>
>>>> -- Lars
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: Stack <stack@duboce.net>
>>>> To: user@hbase.apache.org
>>>> Cc:
>>>> Sent: Wednesday, August 15, 2012 3:13 PM
>>>> Subject: Re: Slow full-table scans
>>>>
>>>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>>>>> I am beginning to think that this is a configuration issue on my
>>>>> cluster. Do the following configuration files seem sane?
>>>>>
>>>>> hbase-env.sh    https://gist.github.com/3345338
>>>>>
>>>>
>>>> Nothing wrong w/ this (Remove the -ea, since you don't want asserts in
>>>> production, and remove the -XX:+CMSIncrementalMode flag if you have >= 2 cores).
>>>>
>>>>
>>>>> hbase-site.xml    https://gist.github.com/3345356
>>>>>
>>>>
>>>> This is all defaults effectively. I don't see any of the configs
>>>> recommended by the performance section of the reference guide and/or
>>>> those suggested by the GBIF blog.
>>>>
>>>> You don't answer LarsH's query about where you see the 4% difference.
>>>>
>>>> How many regions in your table? What does the HBase Master UI look like
>>>> when this scan is running?
>>>> St.Ack
>>>>