hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Speed up node under replicated block during decomission
Date Fri, 12 Aug 2011 17:51:11 GMT

Just a thought...

A really quick and dirty way to do this is to turn off the node.
Within 10 minutes the node looks down to the JT and NN, so it gets marked as down.
Run an fsck and it will show the files as under-replicated; the cluster will then re-replicate
them at the faster speed to rebalance itself.
(100 MB/sec should be OK on a 1 GbE link.)

Then you can drop the next node... much faster than trying to decommission the node.

It's not the best way to do it, but it works.
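
As a rough sanity check on the numbers in this thread, the arithmetic can be sketched as below. All figures come from the messages here (1 GbE link, 100 MB/sec, 1 TB taking over a day, and the configured value 131072000); the use of decimal units (1 MB = 10**6 bytes, 1 TB = 10**12 bytes) and the function names are assumptions made for illustration only.

```python
# Back-of-the-envelope check of the rates mentioned in this thread.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 TB = 10**12 bytes).

LINK_BITS_PER_SEC = 1_000_000_000        # 1 GbE link
SUGGESTED_MB_PER_SEC = 100               # rate suggested in the reply
DATA_BYTES = 10**12                      # "1 TB of data"
OBSERVED_SECONDS = 24 * 60 * 60          # "over 1 day", taken as exactly one day
CONFIGURED_BYTES_PER_SEC = 131_072_000   # value set in hdfs-site.xml


def link_capacity_mb_per_sec(bits_per_sec):
    """Theoretical link capacity in MB/sec, ignoring protocol overhead."""
    return bits_per_sec / 8 / 1e6


def hours_to_copy(total_bytes, mb_per_sec):
    """Hours needed to move total_bytes at a sustained rate of mb_per_sec."""
    return total_bytes / (mb_per_sec * 1e6) / 3600


def observed_mb_per_sec(total_bytes, seconds):
    """Average rate in MB/sec implied by moving total_bytes in seconds."""
    return total_bytes / seconds / 1e6


def bytes_per_sec_to_mbit(bytes_per_sec):
    """Convert a bytes-per-second figure to megabits per second."""
    return bytes_per_sec * 8 / 1e6


if __name__ == "__main__":
    print("1 GbE capacity (MB/sec):", link_capacity_mb_per_sec(LINK_BITS_PER_SEC))
    print("Hours for 1 TB at 100 MB/sec:", hours_to_copy(DATA_BYTES, SUGGESTED_MB_PER_SEC))
    print("Rate implied by 1 TB per day (MB/sec):", observed_mb_per_sec(DATA_BYTES, OBSERVED_SECONDS))
    print("Configured value (Mbit/sec):", bytes_per_sec_to_mbit(CONFIGURED_BYTES_PER_SEC))
```

Under these assumptions, 100 MB/sec sits below the 125 MB/sec ceiling of a 1 GbE link and would move 1 TB in under three hours, whereas 1 TB per day works out to roughly 11.6 MB/sec.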


> From: harsh@cloudera.com
> Date: Fri, 12 Aug 2011 22:38:08 +0530
> Subject: Re: Speed up node under replicated block during decomission
> To: common-user@hadoop.apache.org
> 
> It could be that your process has hung because a particular resident
> block (file) requires a very large replication factor, and your
> remaining # of nodes is less than that value. This is a genuine reason
> for a hang (but it must be fixed). The process usually waits until there
> are no under-replicated blocks, so I'd use fsck to check if any such
> ones are present and setrep them to a lower value.
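
The fsck and setrep steps suggested in the quoted message could be scripted roughly as follows. This is only a sketch: it assumes the `hadoop` launcher is on the PATH, uses the `hadoop fsck <path> -files -blocks` and `hadoop fs -setrep -R <rep> <path>` command forms, and the helper names, the example path, and the replication factor are hypothetical.

```python
import subprocess


def fsck_command(path="/"):
    """Build the fsck command that reports per-file block status under path."""
    return ["hadoop", "fsck", path, "-files", "-blocks"]


def setrep_command(path, replication):
    """Build the command that sets the replication factor under path."""
    return ["hadoop", "fs", "-setrep", "-R", str(replication), path]


def run(command):
    """Run a command and return its stdout as text (requires a Hadoop client)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to
    # execute on a machine without Hadoop installed.
    print(" ".join(fsck_command("/")))
    # "/some/path" and 3 are placeholders, not values from the thread.
    print(" ".join(setrep_command("/some/path", 3)))
```

Passing the built lists to `run` on a machine with a configured Hadoop client would execute them; the target replication factor should be no larger than the number of data nodes that remain after the decommission.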
> 
> On Fri, Aug 12, 2011 at 9:28 PM,  <jonathan.hwang@accenture.com> wrote:
> > Hi All,
> >
> > I'm trying to decommission a data node from my cluster.  I put the data node in the
> > /usr/lib/hadoop/conf/dfs.hosts.exclude list and restarted the name nodes.  The
> > under-replicated blocks are starting to replicate, but the count is going down at a very
> > slow pace.  For 1 TB of data it takes over 1 day to complete.  We changed the settings as
> > below to try to increase the replication rate.
> >
> > Added this to hdfs-site.xml on all the nodes in the cluster and restarted the data
> > node and name node processes.
> > <property>
> >  <!-- value is in bytes per second: 131072000 = 125 MiB/s (about 1 Gbit/s) -->
> >  <name>dfs.balance.bandwidthPerSec</name>
> >  <value>131072000</value>
> > </property>
> >
> > Speed didn't seem to pick up. Do you know what may be happening?
> >
> > Thanks!
> > Jonathan
> >
> 
> 
> 
> -- 
> Harsh J
 		 	   		  