hadoop-user mailing list archives

From andrew touchet <adt...@latech.edu>
Subject Decommissioning a data node and problems bringing it back online
Date Wed, 23 Jul 2014 20:18:13 GMT

I am decommissioning data nodes for an OS upgrade on an HPC cluster.
Currently, users can run jobs that use data stored on /hdfs. They are able
to access all datanodes/compute nodes except the one being decommissioned.

Is this safe to do? Will files edited during this time affect the node
being decommissioned?

I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and
running 'hadoop dfsadmin -refreshNodes' on the name node. Then I
simply wait for the log files to report completion. After the upgrade, I
remove the node from hosts_exclude and start hadoop again on the datanode.
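
For reference, the exclude-file bookkeeping described above could be
scripted roughly as follows. This is only a sketch: the helper names
(read_hosts, exclude_host, include_host, refresh_nodes) are made up for
illustration, and it assumes the exclude file holds one hostname per line.
The only Hadoop command it relies on is the 'hadoop dfsadmin -refreshNodes'
call already mentioned.

```python
import subprocess
from pathlib import Path

# Command from the message above; it must be run on the name node.
REFRESH_CMD = ["hadoop", "dfsadmin", "-refreshNodes"]


def read_hosts(path):
    """Return the non-empty hostnames listed in the exclude file."""
    p = Path(path)
    if not p.exists():
        return []
    return [line.strip() for line in p.read_text().splitlines() if line.strip()]


def exclude_host(path, host):
    """Add host to the exclude file. Returns False if it was already listed."""
    hosts = read_hosts(path)
    if host in hosts:
        return False
    hosts.append(host)
    Path(path).write_text("".join(h + "\n" for h in hosts))
    return True


def include_host(path, host):
    """Remove host from the exclude file. Returns False if it was not listed."""
    hosts = read_hosts(path)
    if host not in hosts:
        return False
    remaining = [h for h in hosts if h != host]
    Path(path).write_text("".join(h + "\n" for h in remaining))
    return True


def refresh_nodes():
    """Ask the name node to re-read its include/exclude files."""
    subprocess.run(REFRESH_CMD, check=True)


# Hypothetical usage on the name node ("node07" is a placeholder hostname):
#   exclude_host("/usr/lib/hadoop-0.20/conf/hosts_exclude", "node07")
#   refresh_nodes()
#   ... wait for decommissioning to finish, upgrade the OS ...
#   include_host("/usr/lib/hadoop-0.20/conf/hosts_exclude", "node07")
#   refresh_nodes()
```

Note that the sketch only edits the file and triggers the refresh; it does
not wait for or verify that decommissioning has completed.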

Also: in the namenode web interface I just noticed that the node I
decommissioned previously now shows 0 for Configured Capacity, Used, and
Remaining, and is listed as 100% Used.

I used the same /etc/sysconfig/hadoop file from before the upgrade, removed
the node from hosts_exclude, and ran '-refreshNodes' afterwards.

What steps have I missed in the decommissioning process or while bringing
the data node back online?
