hadoop-mapreduce-user mailing list archives

From Bing Jiang <jiangbinglo...@gmail.com>
Subject Re: Hadoop / HBase hotspotting / overloading specific nodes
Date Thu, 09 Oct 2014 06:51:57 GMT
Could you reserve some space for non-DFS usage, just to keep the disks from
filling up? In hdfs-site.xml, the property is dfs.datanode.du.reserved:

<description>Reserved space in bytes per volume. Always leave this much
space free for non dfs use.</description>
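
For reference, here is a minimal sketch of how that entry might look inside the
<configuration> element of hdfs-site.xml on each datanode. The 10 GiB value is
only an assumed placeholder, not a figure from this thread; size it to whatever
non-DFS data (logs, MapReduce spill files and so on) shares the volumes.

```xml
<!-- Sketch only: the value below is a hypothetical example (10 GiB per volume). -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 10 * 1024^3 bytes; pick a figure that suits your own disks -->
  <value>10737418240</value>
</property>
```

The reservation applies per volume, so a datanode with several data disks keeps
this much free on each of them. Datanodes typically need a restart to pick up
the change.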



2014-10-09 14:01 GMT+08:00 SF Hadoop <sfhadoop@gmail.com>:

> I'm not sure if this is an HBase issue or a Hadoop issue, so if this is
> "off-topic" please forgive me.
> I am having a problem with Hadoop maxing out drive space on a select few
> nodes when I am running an HBase job.  The scenario is this:
> - The job is a data import using Map/Reduce / HBase
> - The data is being imported to one table
> - The table only has a couple of regions
> - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on the
> datanode / regionserver that is hosting the regions
> - As the job progresses (and more data is imported) the two datanodes
> hosting the regions start to get full and eventually drive space hits 100%
> utilization whilst the other nodes in the cluster are at 40% or less drive
> space utilization
> - The job in Hadoop then begins to hang with multiple "out of space"
> errors and eventually fails.
> I have tried running hadoop balancer during the job run and this helped
> but only really succeeded in prolonging the eventual job failure.
> How can I get Hadoop / HBase to distribute the data to HDFS more evenly
> when it is favoring the nodes that the regions are on?
> Am I missing something here?
> Thanks for any help.

Bing Jiang
