hbase-user mailing list archives

From Gurjeet Singh <gurj...@gmail.com>
Subject Re: Slow full-table scans
Date Wed, 22 Aug 2012 16:42:58 GMT
Okay, I just ran extensive tests with my minimal test case, and you are
correct: the old and the new versions do the scans in about the same
amount of time (although puts are MUCH faster in the packed scheme).

I guess my test case is too minimal. I will try to make a better
test case, since in my production code there is still a 500x
difference.

Gurjeet

On Tue, Aug 21, 2012 at 10:00 PM, J Mohamed Zahoor <jmozah@gmail.com> wrote:
> Try a quick TestDFSIO to see if things are okay.
>
> ./zahoor
>
> On Wed, Aug 22, 2012 at 6:26 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>
>> It's possible that there is a bad or slower disk on Gurjeet's machine. I
>> think iostat and CPU details would clear things up.
>>
>> On Tue, Aug 21, 2012 at 4:33 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>
>> > I get roughly the same (~1.8s): 100 rows, 200,000 columns, segment size 100.
>> >
>> >
>> >
>> > ________________________________
>> >  From: Gurjeet Singh <gurjeet@gmail.com>
>> > To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>> > Sent: Tuesday, August 21, 2012 11:31 AM
>> >  Subject: Re: Slow full-table scans
>> >
>> > How does that compare with the newScanTable on your build?
>> >
>> > Gurjeet
>> >
>> > On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> > > Hmm... So I tried in HBase (current trunk).
>> > > I created 100 rows with 200,000 columns each (using your oldMakeTable).
>> > > The creation took a bit, but scanning finished in 1.8s (HBase in
>> > > pseudo-distributed mode, with your oldScanTable).
>> > >
>> > > -- Lars
>> > >
>> > >
>> > >
>> > > ----- Original Message -----
>> > > From: lars hofhansl <lhofhansl@yahoo.com>
>> > > To: "user@hbase.apache.org" <user@hbase.apache.org>
>> > > Cc:
>> > > Sent: Monday, August 20, 2012 7:50 PM
>> > > Subject: Re: Slow full-table scans
>> > >
>> > > Thanks Gurjeet,
>> > >
>> > > I'll (hopefully) have a look tomorrow.
>> > >
>> > > -- Lars
>> > >
>> > >
>> > >
>> > > ----- Original Message -----
>> > > From: Gurjeet Singh <gurjeet@gmail.com>
>> > > To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>> > > Cc:
>> > > Sent: Monday, August 20, 2012 7:42 PM
>> > > Subject: Re: Slow full-table scans
>> > >
>> > > Hi Lars,
>> > >
>> > > Here is a testcase:
>> > >
>> > > https://gist.github.com/3410948
>> > >
>> > > Benchmarking code:
>> > >
>> > > https://gist.github.com/3410952
>> > >
>> > > Try running it with numRows = 100, numCols = 200000, segmentSize = 1000
>> > >
>> > > Gurjeet
>> > >
>> > >
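For a sense of scale with the parameters above (numRows = 100, numCols = 200000, segmentSize = 1000): a one-value-per-cell layout writes 200,000 cells per row, while packing segmentSize values into each cell needs only ceil(200000 / 1000) = 200 cells per row. The Java sketch below is a hypothetical illustration of such packing, not the code from the gists; it assumes every column value is a double, and the names SegmentPacker, pack, unpack and cellsPerRow are made up for this example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "packed" layout: consecutive column values are
 * grouped into segments of segmentSize, and each segment is serialized into
 * one byte[] that would become a single cell value. Values are assumed to be doubles.
 */
final class SegmentPacker {

    /** Cells one row needs when packed: ceil(numCols / segmentSize). */
    static int cellsPerRow(int numCols, int segmentSize) {
        return (numCols + segmentSize - 1) / segmentSize;
    }

    /** Splits a row's values into byte[] segments, one per would-be cell. */
    static List<byte[]> pack(double[] row, int segmentSize) {
        List<byte[]> segments = new ArrayList<>();
        for (int start = 0; start < row.length; start += segmentSize) {
            int len = Math.min(segmentSize, row.length - start);
            ByteBuffer buf = ByteBuffer.allocate(len * Double.BYTES);
            for (int i = start; i < start + len; i++) {
                buf.putDouble(row[i]);
            }
            segments.add(buf.array());
        }
        return segments;
    }

    /** Reverses pack(): decodes every segment and concatenates the values. */
    static double[] unpack(List<byte[]> segments) {
        int total = 0;
        for (byte[] s : segments) {
            total += s.length / Double.BYTES;
        }
        double[] row = new double[total];
        int pos = 0;
        for (byte[] s : segments) {
            ByteBuffer buf = ByteBuffer.wrap(s);
            while (buf.hasRemaining()) {
                row[pos++] = buf.getDouble();
            }
        }
        return row;
    }

    public static void main(String[] args) {
        int numCols = 200_000;
        System.out.println("unpacked cells per row: " + cellsPerRow(numCols, 1));
        System.out.println("packed cells per row:   " + cellsPerRow(numCols, 1000));
    }
}
```

In a real table each byte[] segment would be stored under its own qualifier (for example one derived from the segment index, which is also an assumption here). The trade-off is that reading a single column means fetching and decoding its whole segment.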
>> > > On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>> > >> Sure - I can create a minimal testcase and send it along.
>> > >>
>> > >> Gurjeet
>> > >>
>> > >> On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> > >>> That's interesting.
>> > >>> Could you share your old and new schema? I would like to track down
>> > >>> the performance problems you saw.
>> > >>> (If you had a demo program that populates your rows with 200,000
>> > >>> columns in a way where you saw the performance issues, that'd be even
>> > >>> better, but not necessary.)
>> > >>>
>> > >>>
>> > >>> -- Lars
>> > >>>
>> > >>>
>> > >>>
>> > >>> ________________________________
>> > >>>  From: Gurjeet Singh <gurjeet@gmail.com>
>> > >>> To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>> > >>> Sent: Thursday, August 16, 2012 11:26 AM
>> > >>> Subject: Re: Slow full-table scans
>> > >>>
>> > >>> Sorry for the delay, guys.
>> > >>>
>> > >>> Here are a few results:
>> > >>>
>> > >>> 1. Regions in the table = 11
>> > >>> 2. The region servers don't appear to be very busy with the query
>> > >>> (~5% CPU), but with parallelization they are all busy.
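The parallelization mentioned in point 2 can be sketched as one scan task per region key range, submitted to a thread pool, with the per-range results summed. This is a hypothetical Java sketch: the numeric key ranges and the scanRange function stand in for real region boundaries and an HBase Scan bounded by start/stop rows, so it runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: scan key ranges (stand-ins for regions) in parallel. */
final class ParallelRangeScan {

    /** Half-open key range [start, stop), standing in for one region. */
    static final class Range {
        final long start;
        final long stop;

        Range(long start, long stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Splits [start, stop) into at most `parts` contiguous ranges (parts > 0). */
    static List<Range> split(long start, long stop, int parts) {
        List<Range> ranges = new ArrayList<>();
        long width = Math.max(1, (stop - start + parts - 1) / parts);
        for (long s = start; s < stop; s += width) {
            ranges.add(new Range(s, Math.min(s + width, stop)));
        }
        return ranges;
    }

    /** Runs scanRange on every range using a fixed-size pool and sums the results. */
    static long scanAll(List<Range> ranges, ToLongFunction<Range> scanRange, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Range r : ranges) {
                Callable<Long> task = () -> scanRange.applyAsLong(r);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 11 ranges, mirroring the 11 regions reported in the thread.
        List<Range> ranges = split(0, 1_000_000, 11);
        // Stand-in for a real scan: "count" the keys in each range.
        long keys = scanAll(ranges, r -> r.stop - r.start, 4);
        System.out.println(ranges.size() + " ranges, " + keys + " keys");
    }
}
```

In a real client each task would typically use its own table handle and scanner, since HTable instances of this era were not thread-safe.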
>> > >>>
>> > >>> Finally, I changed the format of my data so that each cell in HBase
>> > >>> contains a chunk of a row instead of the single value it had.
>> > >>> Stuffing each HBase cell with 500 columns of a row gave me a
>> > >>> performance boost of 1000x. It seems that the underlying issue was
>> > >>> the I/O overhead per byte of actual data stored.
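One way to see the per-byte overhead described above: HBase stores every cell as a KeyValue whose serialized form is keyLength (4 bytes), valueLength (4), rowLength (2), row, familyLength (1), family, qualifier, timestamp (8), keyType (1), value. Each cell therefore repeats its row key, family and qualifier next to the value. The Java sketch below estimates the payload fraction for both layouts; the row, family and qualifier lengths are hypothetical, and on-disk effects such as block encoding, compression and HFile indexes are ignored.

```java
/**
 * Hypothetical estimate of payload vs. per-cell overhead in the KeyValue format:
 * keyLength(4) valueLength(4) rowLength(2) row familyLength(1) family
 * qualifier timestamp(8) keyType(1) value.
 * Ignores block encoding, compression, HFile indexes and in-memory object overhead.
 */
final class CellOverhead {

    /** Bytes every KeyValue carries regardless of its row, family, qualifier and value. */
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1; // 20

    /** Serialized size of one cell with the given component lengths. */
    static long keyValueSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) FIXED_BYTES + rowLen + familyLen + qualifierLen + valueLen;
    }

    /** Fraction of a serialized cell that is actual value bytes. */
    static double payloadFraction(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (double) valueLen / keyValueSize(rowLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        int rowLen = 10, familyLen = 1, qualifierLen = 8; // assumed sizes
        int valueBytes = 8;                               // e.g. one double
        System.out.printf("1 value per cell:    %.3f%n",
                payloadFraction(rowLen, familyLen, qualifierLen, valueBytes));
        System.out.printf("500 values per cell: %.3f%n",
                payloadFraction(rowLen, familyLen, qualifierLen, 500 * valueBytes));
    }
}
```

With these assumed sizes, one value per cell is 47 bytes of which 8 are payload (about 17%), while 500 values per cell is 4,039 bytes of which 4,000 are payload (about 99%).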
>> > >>>
>> > >>>
>> > >>> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> > >>>> Yeah... It looks OK.
>> > >>>> Maybe 2G of heap is a bit low when dealing with 200,000-column rows.
>> > >>>>
>> > >>>>
>> > >>>> If you can, I'd like to know how busy your regionservers are during
>> > >>>> these operations. That would indicate whether the parallelization is
>> > >>>> good or not.
>> > >>>>
>> > >>>> -- Lars
>> > >>>>
>> > >>>>
>> > >>>> ----- Original Message -----
>> > >>>> From: Stack <stack@duboce.net>
>> > >>>> To: user@hbase.apache.org
>> > >>>> Cc:
>> > >>>> Sent: Wednesday, August 15, 2012 3:13 PM
>> > >>>> Subject: Re: Slow full-table scans
>> > >>>>
>> > >>>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>> > >>>>> I am beginning to think that this is a configuration issue on my
>> > >>>>> cluster. Do the following configuration files seem sane?
>> > >>>>>
>> > >>>>> hbase-env.sh    https://gist.github.com/3345338
>> > >>>>>
>> > >>>>
>> > >>>> Nothing wrong w/ this. Remove the -ea (you don't want asserts in
>> > >>>> production), and remove the -XX:+CMSIncrementalMode flag if you have
>> > >>>> >= 2 cores.
>> > >>>>
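Stack's two hbase-env.sh suggestions can be written down as a small check. The hypothetical Java helper below flags assertion switches (-ea and its long form -enableassertions) and -XX:+CMSIncrementalMode on machines with two or more cores. Its main method applies the check to the current JVM's own arguments via RuntimeMXBean.getInputArguments(), so it only inspects the JVM it runs in, not a remote region server.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check encoding the two hbase-env.sh suggestions above. */
final class JvmFlagCheck {

    /** True for -ea / -enableassertions, including package-scoped forms like -ea:pkg... */
    static boolean isAssertionFlag(String arg) {
        return arg.equals("-ea") || arg.startsWith("-ea:")
                || arg.equals("-enableassertions") || arg.startsWith("-enableassertions:");
    }

    /** Returns one warning per JVM argument that the advice says to drop. */
    static List<String> warnings(List<String> jvmArgs, int cores) {
        List<String> out = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (isAssertionFlag(arg)) {
                out.add("assertions enabled via " + arg + " (not wanted in production)");
            } else if (arg.equals("-XX:+CMSIncrementalMode") && cores >= 2) {
                out.add("-XX:+CMSIncrementalMode set on a " + cores + "-core machine");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Inspects only the JVM this program itself runs in.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        int cores = Runtime.getRuntime().availableProcessors();
        for (String w : warnings(jvmArgs, cores)) {
            System.out.println("WARN: " + w);
        }
    }
}
```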
>> > >>>>
>> > >>>>> hbase-site.xml    https://gist.github.com/3345356
>> > >>>>>
>> > >>>>
>> > >>>> This is all defaults, effectively. I don't see any of the configs
>> > >>>> recommended by the performance section of the reference guide and/or
>> > >>>> those suggested by the GBIF blog.
>> > >>>>
>> > >>>> You don't answer LarsH's query about where you see the 4% difference.
>> > >>>>
>> > >>>> How many regions are in your table? What does the HBase Master UI
>> > >>>> look like when this scan is running?
>> > >>>> St.Ack
>> > >>>>
>> >
>>
