hbase-user mailing list archives

From Pankaj Misra <pankaj.mi...@impetus.co.in>
Subject RE: more regionservers does not improve performance
Date Fri, 12 Oct 2012 06:47:58 GMT
OK, looks like I missed that part of your original mail. Did you try some of the
compaction tweaks and configurations explained at the following link for your data?
http://hbase.apache.org/book/regions.arch.html#compaction


Also, how much data are you putting into the regions, and how big is one region at the end
of data ingestion?

Thanks and Regards
Pankaj Misra

-----Original Message-----
From: Jonathan Bishop [mailto:jbishop.rwc@gmail.com]
Sent: Friday, October 12, 2012 12:04 PM
To: user@hbase.apache.org
Subject: RE: more regionservers does not improve performance

Pankaj,

Thanks for the reply.

Actually, I am using MD5 hashing to evenly spread the keys among the splits, so I don’t
believe there is any hotspot. In fact, when I monitor the web UI for HBase I see a very even
load on all the regionservers.

Jon


 *From:* Pankaj Misra <pankaj.misra@impetus.co.in>
*Sent:* Thursday, October 11, 2012 8:24:32 PM
*To:* user@hbase.apache.org
*Subject:* RE: more regionservers does not improve performance

Hi Jonathan,

It seems to me that, while the work is split across all 40 mappers, the keys are not randomized
enough to take advantage of multiple regions and the pre-split strategy. This may be happening
because all 40 mappers are writing to a single region for some time, making it a HOT region,
until the keys fall into another region, which then becomes the HOT region. As a result, you
may be seeing compaction cycles significantly reduce your throughput.

Are the keys incremental? Are the keys randomized enough across the splits?

Ideally when all 40 mappers are running you should see all the regions being filled up in
parallel for maximum throughput. Hope it helps.
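
To illustrate the hot-region concern above, here is a minimal sketch, not taken from the
thread. It assumes 100 regions whose split points divide the two-byte key prefix space
evenly, and the key formats and helper names (region_index, the "0000000042"-style
sequential keys) are hypothetical. It counts how many regions a window of 1000 consecutive
writes touches for incremental keys versus MD5-hashed keys.

```python
import hashlib

NUM_REGIONS = 100  # assumed pre-split count

def region_index(row_key: bytes, num_regions: int = NUM_REGIONS) -> int:
    """Map a key to a region, assuming split points spread evenly over the first two bytes."""
    prefix = int.from_bytes(row_key[:2], "big")  # value in 0..65535
    return prefix * num_regions // 65536

# A window of 1000 consecutive writes with incremental, zero-padded keys (hypothetical format).
sequential_keys = [f"{i:010d}".encode("ascii") for i in range(1000)]

# The same logical records, keyed by the MD5 digest of the natural key.
hashed_keys = [hashlib.md5(k).digest() for k in sequential_keys]

sequential_regions = {region_index(k) for k in sequential_keys}
hashed_regions = {region_index(k) for k in hashed_keys}

print("regions touched by sequential keys:", len(sequential_regions))
print("regions touched by hashed keys:   ", len(hashed_regions))
```

Under these assumptions the incremental keys all share one prefix and land in a single
region, while the hashed keys are spread across nearly all of them.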

Thanks and Regards
Pankaj Misra


________________________________________
From: Jonathan Bishop [jbishop.rwc@gmail.com]
Sent: Friday, October 12, 2012 5:38 AM
To: user@hbase.apache.org
Subject: more regionservers does not improve performance

Hi,

I am running a MR job with 40 simultaneous mappers, each of which does puts to HBase. I have
ganged up the puts into groups of 1000 (this seems to help quite a bit) and also made sure
that the table is pre-split into 100 regions, and that the row keys are randomized using MD5
hashing.
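
As a rough illustration of the setup described above (a sketch under stated assumptions,
not the actual job): it assumes the row key is the raw 16-byte MD5 digest of a natural key
and that the 100 split points divide the two-byte prefix space evenly. The names
md5_row_key, region_index and batches are hypothetical, and no HBase client is involved;
the lists simply stand in for groups of puts.

```python
import hashlib
from collections import Counter

NUM_REGIONS = 100   # pre-split count mentioned in the message
BATCH_SIZE = 1000   # puts per group mentioned in the message

def md5_row_key(natural_key: str) -> bytes:
    """Hypothetical key scheme: the 16-byte MD5 digest of the natural key."""
    return hashlib.md5(natural_key.encode("utf-8")).digest()

def region_index(row_key: bytes, num_regions: int = NUM_REGIONS) -> int:
    """Map a key to a region, assuming split points spread evenly over the first two bytes."""
    prefix = int.from_bytes(row_key[:2], "big")  # value in 0..65535
    return prefix * num_regions // 65536

def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, group
        yield batch

keys = [md5_row_key(f"record-{i}") for i in range(100_000)]
per_region = Counter(region_index(k) for k in keys)
put_batches = list(batches(keys))

print("regions receiving keys:", len(per_region))
print("fewest / most keys in a region:", min(per_region.values()), max(per_region.values()))
print("number of batches:", len(put_batches))
```

If the real split points are not evenly spaced over the hashed key space, the per-region
counts would be correspondingly uneven, which is worth checking against the table's actual
region boundaries.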

My cluster size is 10, and I am allowing 4 mappers per tasktracker.

In my MR job I know that the mappers are able to generate puts much faster than the puts can
be handled in hbase. In other words if I let the mappers run without doing hbase puts then
everything scales as you would expect with the number of mappers created. It is the hbase
puts which seem to be the bottleneck.

What is strange is that I do not get much run time improvement by increasing the number of
regionservers beyond about 4. Indeed, it seems that the system runs slower with 8 regionservers
than with 4.

I have added the following in hbase-env.sh hoping this would help... (from the book HBase
in Action)

export HBASE_OPTS="-Xmx8g"
export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

# Uncomment below to enable java garbage collection logging in the .out file.
export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"

Monitoring hbase through the web ui I see that there are pauses for flushing, which seems
to run pretty quickly, and for compacting, which seems to take somewhat longer.

Any advice for making this run faster would be greatly appreciated.
Currently I am looking into installing Ganglia to better monitor my cluster, but I have yet
to get that running.

I suspect an I/O issue as the regionservers do not seem terribly loaded.

Thanks,

Jon

________________________________

Impetus Ranked in the Top 50 India’s Best Companies to Work For 2012.

Impetus webcast ‘Designing a Test Automation Framework for Multi-vendor Interoperable Systems’
available at http://lf1.me/0E/.


NOTE: This message may contain information that is confidential, proprietary, privileged or
otherwise protected by law. The message is intended solely for the named addressee. If received
in error, please destroy and notify the sender. Any use of this email is prohibited when received
in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this
communication has been maintained nor that the communication is free of errors, virus, interception
or interference.
