hadoop-common-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Hadoop / HBase hotspotting / overloading specific nodes
Date Thu, 09 Oct 2014 07:12:42 GMT
Looks like the number of regions is lower than the number of nodes in the cluster. 

Can you split the table so that, after the HBase balancer is run, every node hosts at
least one region?
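
A rough sketch of how split points for such a split could be picked, assuming (this is not stated in the thread) that the row keys are roughly uniformly distributed fixed-width hex strings, e.g. hashed or salted keys. The function name `hex_split_keys` and the `key_width` parameter are made up for illustration. If the row keys are sequential, evenly spaced split points like these will not spread the writes.

```python
def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split keys over a hex keyspace.

    Assumption: row keys are lowercase hex strings of length key_width,
    spread roughly uniformly over the whole keyspace.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    keyspace = 16 ** key_width
    return [
        format(i * keyspace // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # One region per node on a hypothetical 4-node cluster.
    for key in hex_split_keys(4):
        print(key)
```

The resulting keys could then be used as split points, for example with the `split` command in the HBase shell or the `SPLITS` option when creating a table, followed by a balancer run.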


On Oct 8, 2014, at 11:01 PM, SF Hadoop <sfhadoop@gmail.com> wrote:

> I'm not sure if this is an HBase issue or a Hadoop issue so if this is "off-topic" please
> I am having a problem with Hadoop maxing out drive space on a select few nodes when I
> am running an HBase job. The scenario is this:
> - The job is a data import using Map/Reduce / HBase
> - The data is being imported to one table
> - The table only has a couple of regions
> - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on the datanode /
>   regionserver that is hosting the regions
> - As the job progresses (and more data is imported) the two datanodes hosting the regions
>   start to get full and eventually drive space hits 100% utilization whilst the other nodes
>   in the cluster are at 40% or less drive space utilization
> - The job in Hadoop then begins to hang with multiple "out of space" errors and eventually
> I have tried running hadoop balancer during the job run and this helped but only really
> succeeded in prolonging the eventual job failure.
> How can I get Hadoop / HBase to distribute the data to HDFS more evenly when it is favoring
> the nodes that the regions are on?
> Am I missing something here?
> Thanks for any help.
