hadoop-hdfs-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: Decommissioning a data node and problems bringing it back online
Date Thu, 24 Jul 2014 17:01:32 GMT
You should not face any data loss. The replicas were simply moved away from that node to other
nodes in the cluster during decommission. Once you recommission the node and re-balance your
cluster, HDFS will redistribute replicas evenly between the nodes, and the recommissioned node
will receive replicas from other nodes. However, there is no guarantee that exactly the same
replicas that were stored on the node before it was decommissioned will be assigned to it again
after recommission and rebalance.
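
For example, the rebalance step is just the balancer tool, roughly like this (the threshold
value is a placeholder you would tune for your cluster):

    # run as the HDFS superuser from any node with the Hadoop client configured
    hadoop balancer -threshold 10   # iterate until every datanode is within 10% of average utilisation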

Cheers,
Wellington. 


On 24 Jul 2014, at 17:55, andrew touchet <adt027@latech.edu> wrote:

> Hi Mirko,
> 
> Thanks for the reply!
> 
> "...it will not bring in exactly the same blocks like before"
> Is that what usually happens when adding nodes back in? Should I expect any data loss due to starting the data node process before running the balancing tool?
> 
> Best Regards,
> 
> Andrew Touchet
> 
> 
> 
> On Thu, Jul 24, 2014 at 11:37 AM, Mirko Kämpf <mirko.kaempf@gmail.com> wrote:
> After you add the nodes back to your cluster, you run the balancer tool, but it will not bring in exactly the same blocks like before.
> 
> Cheers,
> Mirko
> 
> 
> 
> 2014-07-24 17:34 GMT+01:00 andrew touchet <adt027@latech.edu>:
> 
> Thanks for the reply,
> 
> I am using Hadoop-0.20. We installed from Apache, not Cloudera, if that makes a difference.

> 
> Currently I really need to know how to get the data that was replicated during decommissioning back onto my two data nodes.
> 
> 
> 
> 
> 
> On Thursday, July 24, 2014, Stanley Shi <sshi@gopivotal.com> wrote:
> which distribution are you using? 
> 
> Regards,
> Stanley Shi,
> 
> 
> 
> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <adt027@latech.edu> wrote:
> I should have added this in my first email, but I do get an error in the data node's log file:
> 
> '2014-07-12 19:39:58,027 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs'
> 
> 
> 
> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <adt027@latech.edu> wrote:
> Hello,
> 
> I am decommissioning data nodes for an OS upgrade on an HPC cluster. Currently, users can run jobs that use data stored on /hdfs. They are able to access all datanodes/compute nodes except the one being decommissioned.
> 
> Is this safe to do? Will edited files affect the decommissioning node?
> 
> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and running 'hadoop dfsadmin -refreshNodes' on the name node. Then I wait for the log files to report completion. After the upgrade, I simply remove the node from hosts_exclude and start hadoop again on the datanode.
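> 
> Roughly, that decommission sequence looks like this (the hostname is just a placeholder, and it assumes dfs.hosts.exclude in hdfs-site.xml already points at the hosts_exclude file):
> 
>   # on the namenode
>   echo "datanode05.example.com" >> /usr/lib/hadoop-0.20/conf/hosts_exclude
>   hadoop dfsadmin -refreshNodes   # namenode begins re-replicating this node's blocks elsewhere
>   hadoop dfsadmin -report         # wait for the node to show "Decommission Status : Decommissioned"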
> 
> Also: under the namenode web interface, I just noticed that the node I decommissioned previously now has 0 Configured Capacity, Used, and Remaining, and is now 100% Used.
> 
> I used the same /etc/sysconfig/hadoop file from before the upgrade, removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
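> 
> The re-add path is roughly the following (the daemon script location assumes a stock Apache 0.20 layout):
> 
>   # on the namenode, after removing the hostname from hosts_exclude
>   hadoop dfsadmin -refreshNodes
>   # on the datanode itself
>   /usr/lib/hadoop-0.20/bin/hadoop-daemon.sh start datanode
>   # confirm the node re-registers with non-zero Configured Capacity
>   hadoop dfsadmin -report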
> 
> What steps have I missed in the decommissioning process or while bringing the data node back online?
> 
> 
> 
> 
> 
> 
> 

