hbase-user mailing list archives

From "Kevin O'dell" <kevin.od...@cloudera.com>
Subject Re: more regionservers does not improve performance
Date Fri, 12 Oct 2012 13:44:17 GMT
Jonathan,

  Let's take a deeper look here.

What is your memstore flush size set to for the table/CF in question?
Let's compare that value with the flush sizes you are seeing for your
regions.  If the flushes are really small, are they all going to the same
region?  If so, that points to a schema issue.  If they are full-size
flushes, you can raise your memstore flush size, assuming you have the heap
to cover it.  If they are small flushes spread across different regions,
you are most likely suffering from global memstore limit pressure and
flushing too soon.
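
To make the global-limit case concrete, here is a rough sketch of the
arithmetic.  It is not taken from your cluster: it assumes the 8 GB heap
from your hbase-env.sh, a 128 MB flush size and a 0.4 global memstore upper
limit (which I believe are the stock defaults; check your hbase-site.xml),
and the function names are made up for illustration.

```python
# Rough sketch of global memstore pressure on one regionserver.
# All numbers are assumptions: 8 GB heap, 128 MB flush size, 0.4 upper limit.
MB = 1024 * 1024


def global_memstore_limit(heap_bytes, upper_limit=0.4):
    """Bytes of heap that all memstores on one regionserver may use together."""
    return int(heap_bytes * upper_limit)


def avg_flush_under_pressure(heap_bytes, active_regions,
                             flush_size=128 * MB, upper_limit=0.4):
    """Average memstore size per region once the global limit is reached,
    capped at the configured flush size."""
    per_region = global_memstore_limit(heap_bytes, upper_limit) // active_regions
    return min(per_region, flush_size)


heap = 8 * 1024 * MB
for regions in (10, 25, 50):
    # Number of regions taking writes at the same time on one regionserver.
    print(regions, "regions:", avg_flush_under_pressure(heap, regions) // MB, "MB")
```

If the per-region figure comes out well below your flush size, small flushes
across many regions are what you would expect to see.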

Are you flushing prematurely due to HLogs rolling?  Look in the RS logs for
"too many hlogs" messages and compare them with your flushes.  If you see
them, it may benefit you to raise hbase.regionserver.maxlogs.
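
A quick way to sanity-check the HLog limit is to compare how much edit data
the logs can hold with how much the memstores can hold.  The sketch below
assumes what I believe are the defaults (hbase.regionserver.maxlogs = 32, a
64 MB HLog block size and a 0.95 roll multiplier) together with the 8 GB
heap and a 0.4 global memstore limit; the function names are only for
illustration.

```python
import math

# Sketch: can the HLogs hold as much as the memstores?
# Assumed values: maxlogs=32, 64 MB HLog block size, 0.95 roll multiplier.
MB = 1024 * 1024


def hlog_capacity(maxlogs=32, block_size=64 * MB, roll_multiplier=0.95):
    """Approximate bytes of edits the HLogs hold before flushes are forced."""
    return maxlogs * block_size * roll_multiplier


def suggested_maxlogs(heap_bytes, upper_limit=0.4,
                      block_size=64 * MB, roll_multiplier=0.95):
    """Smallest maxlogs whose capacity covers the global memstore limit."""
    return math.ceil(heap_bytes * upper_limit / (block_size * roll_multiplier))


heap = 8 * 1024 * MB
print("HLog capacity (MB):", int(hlog_capacity() // MB))
print("memstore limit (MB):", int(heap * 0.4 // MB))
print("maxlogs to cover it:", suggested_maxlogs(heap))
```

If the HLog capacity is smaller than the memstore limit, log rolling will
force flushes before the memstores ever fill.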

Are you blocking?  As Suraj was saying, you may be blocking in 90-second
intervals.  Check the RS logs for those messages as well, and then try
Suraj's advice.
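
For reference, the per-region memstore block kicks in at flush size times
the block multiplier.  A tiny sketch, assuming a 128 MB flush size (adjust
to whatever your table actually uses); the function name is hypothetical:

```python
# Sketch: memstore size at which updates to a region are blocked.
# Assumes a 128 MB flush size; multiplier 2 is the default Suraj mentions.
MB = 1024 * 1024


def blocking_threshold(flush_size=128 * MB, block_multiplier=2):
    """Bytes a region's memstore may reach before updates are blocked."""
    return flush_size * block_multiplier


for mult in (2, 4):
    print("multiplier", mult, "->", blocking_threshold(block_multiplier=mult) // MB, "MB")
```

The 90 seconds come from the store file side: when a store hits
blockingStoreFiles, I believe updates wait up to
hbase.hstore.blockingWaitTime (90000 ms by default) for a compaction.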

This is where I would start to optimize your write path.  I hope the above
helps.

On Fri, Oct 12, 2012 at 3:34 AM, Suraj Varma <svarma.ng@gmail.com> wrote:

> What have you set hbase.hstore.blockingStoreFiles and
> hbase.hregion.memstore.block.multiplier to? Both of these block updates
> when the limit is hit. Try increasing them to, say, 20 and 4 from the
> defaults of 7 and 2 and see if it helps.
>
> If this still doesn't help, see if you can set up ganglia to get a
> better insight into what is bottlenecking.
> --Suraj
>
>
>
> On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> <pankaj.misra@impetus.co.in> wrote:
> > OK, Looks like I missed out reading that part in your original mail. Did
> you try some of the compaction tweaks and configurations as explained in
> the following link for your data?
> > http://hbase.apache.org/book/regions.arch.html#compaction
> >
> >
> > Also, how much data are you putting into the regions, and how big is
> one region at the end of data ingestion?
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> > -----Original Message-----
> > From: Jonathan Bishop [mailto:jbishop.rwc@gmail.com]
> > Sent: Friday, October 12, 2012 12:04 PM
> > To: user@hbase.apache.org
> > Subject: RE: more regionservers does not improve performance
> >
> > Pankaj,
> >
> > Thanks for the reply.
> >
> > Actually, I am using MD5 hashing to evenly spread the keys among the
> splits, so I don't believe there is any hotspot. In fact, when I monitor
> the web UI for HBase I see a very even load on all the regionservers.
> >
> > Jon
> >
> >
> >
> >  *From:* Pankaj Misra <pankaj.misra@impetus.co.in>
> > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > *To:* user@hbase.apache.org
> > *Subject:* RE: more regionservers does not improve performance
> >
> > Hi Jonathan,
> >
> > It seems to me that, while doing the split across all 40 mappers,
> the keys are not randomized enough to leverage multiple regions and the
> pre-split strategy. This may be happening because all 40 mappers may be
> trying to write to a single region for some time, making it a HOT region,
> until the keys fall into another region, and then that region becomes
> a HOT region. Hence you may be seeing a high impact from compaction cycles
> reducing your throughput.
> >
> > Are the keys incremental? Are the keys randomized enough across the
> splits?
> >
> > Ideally when all 40 mappers are running you should see all the regions
> being filled up in parallel for maximum throughput. Hope it helps.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > ________________________________________
> > From: Jonathan Bishop [jbishop.rwc@gmail.com]
> > Sent: Friday, October 12, 2012 5:38 AM
> > To: user@hbase.apache.org
> > Subject: more regionservers does not improve performance
> >
> > Hi,
> >
> > I am running a MR job with 40 simultaneous mappers, each of which does
> puts to HBase. I have ganged up the puts into groups of 1000 (this seems to
> help quite a bit) and also made sure that the table is pre-split into 100
> regions, and that the row keys are randomized using MD5 hashing.
> >
> > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
> >
> > In my MR job I know that the mappers are able to generate puts much
> faster than the puts can be handled in hbase. In other words if I let the
> mappers run without doing hbase puts then everything scales as you would
> expect with the number of mappers created. It is the hbase puts which seem
> to be the bottleneck.
> >
> > What is strange is that I do not get much run time improvement by
> increasing the number of regionservers beyond about 4. Indeed, it seems
> that the system runs slower with 8 regionservers than with 4.
> >
> > I have added the following in hbase-env.sh hoping this would help...
> (from the book HBase in Action)
> >
> > export HBASE_OPTS="-Xmx8g"
> > export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
> >
> > # Uncomment below to enable java garbage collection logging in the .out file.
> > export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
> >
> > Monitoring hbase through the web ui I see that there are pauses for
> flushing, which seems to run pretty quickly, and for compacting, which
> seems to take somewhat longer.
> >
> > Any advice for making this run faster would be greatly appreciated.
> > Currently I am looking into installing Ganglia to better monitor my
> cluster, but have yet to get that running.
> >
> > I suspect an I/O issue as the regionservers do not seem terribly loaded.
> >
> > Thanks,
> >
> > Jon
> >
> > ________________________________
> >
> > Impetus Ranked in the Top 50 India’s Best Companies to Work For 2012.
> >
> > Impetus webcast ‘Designing a Test Automation Framework for Multi-vendor
> Interoperable Systems’ available at http://lf1.me/0E/.
> >
> >
> > NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
> >
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
