hadoop-hdfs-user mailing list archives

From SF Hadoop <sfhad...@gmail.com>
Subject Hadoop / HBase hotspotting / overloading specific nodes
Date Thu, 09 Oct 2014 06:01:00 GMT
I'm not sure whether this is an HBase issue or a Hadoop issue, so if this is
"off-topic", please forgive me.

I am having a problem with Hadoop maxing out drive space on a select few
nodes when I am running an HBase job.  The scenario is this:

- The job is a data import using Map/Reduce / HBase
- The data is being imported to one table
- The table only has a couple of regions
- As the job runs, HBase or Hadoop (I'm not sure which) begins placing the
data in HDFS on the datanode / regionserver that is hosting the regions
- As the job progresses (and more data is imported), the two datanodes
hosting the regions start to fill up, and eventually their drive space hits
100% utilization while the other nodes in the cluster are at 40% or less
drive space utilization
- The job in Hadoop then begins to hang with multiple "out of space" errors
and eventually fails.
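One way to picture the skew described above: with HDFS's default block placement policy, when the writing client (here, a region server) runs on a datanode, the first replica of each block is written to that local datanode. The sketch below is a toy model under that assumption only. The function name `simulate_placement`, the cluster size, and the block counts are hypothetical, and it ignores rack awareness, free-space checks, and HBase compactions.

```python
import random


def simulate_placement(num_nodes, writer_nodes, num_blocks, replication=3, seed=0):
    """Toy model of block placement for writes issued from writer_nodes.

    Assumption: the first replica lands on the writer's local datanode and the
    remaining replicas go to distinct, randomly chosen other nodes. Racks and
    free-space checks are ignored (hypothetical simplification).
    """
    rng = random.Random(seed)
    blocks_per_node = [0] * num_nodes
    for i in range(num_blocks):
        # The region servers (one region each) take turns writing blocks.
        writer = writer_nodes[i % len(writer_nodes)]
        blocks_per_node[writer] += 1  # first replica: writer's local datanode
        others = [n for n in range(num_nodes) if n != writer]
        for n in rng.sample(others, replication - 1):
            blocks_per_node[n] += 1  # remaining replicas: other datanodes
    return blocks_per_node


# Hypothetical cluster: 10 datanodes, the two regions hosted on nodes 0 and 1.
usage = simulate_placement(num_nodes=10, writer_nodes=[0, 1], num_blocks=1000)
print(usage)
```

In this model, nodes 0 and 1 end up holding roughly two to three times as many blocks as any other node, which resembles the pattern of two full datanodes alongside lightly used ones.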

I have tried running the Hadoop balancer during the job run. This helped, but
it only really succeeded in delaying the eventual job failure.

How can I get Hadoop / HBase to distribute the data to HDFS more evenly
when it is favoring the nodes that the regions are on?

Am I missing something here?

Thanks for any help.
