hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "FAQ" by MichaelSchmitz
Date Wed, 31 Aug 2011 17:45:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by MichaelSchmitz:
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=107&rev2=108

Comment:
Added some detail to the part about decommissioning a node.

  == I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done? ==
  On a large cluster, removing one or two data-nodes will not lead to any data loss, because the name-node will replicate their blocks as soon as it detects that the nodes are dead. With a large number of nodes being removed or dying, the probability of losing data is higher.
  
- Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes to be retired should be included into the ''exclude file'', and the exclude file name should be specified as a configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]]. This file should have been specified during namenode startup. It could be a zero length file. You must use the full hostname, ip or ip:port format in this file. Then the shell command
+ Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes to be retired should be listed in the ''exclude file'', whose name is given by the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]]. This file must already be specified when the namenode starts; it may initially be a zero-length file. Entries must use the full hostname, ip, or ip:port format. (Note that some users have trouble with hostnames: if your namenode shows nodes under "Live" and "Dead" but none decommissioning, try the full ip:port form instead.) Then the shell command
  
  {{{
  bin/hadoop dfsadmin -refreshNodes
  }}}
  should be called, which forces the name-node to re-read the exclude file and start the decommission process.
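Concretely, the exclude-file mechanism can be sketched as follows; every path and hostname below is made up for illustration:

```shell
# Illustrative sketch only; paths and hostnames are hypothetical.
# dfs.hosts.exclude must already point at this file in the name-node
# configuration (e.g. via an entry in conf/hdfs-site.xml) when the
# namenode starts; the file itself may initially be empty.
EXCLUDE=/tmp/dfs.exclude

# List the nodes to retire, one per line: full hostname, ip, or ip:port.
cat > "$EXCLUDE" <<'EOF'
datanode17.example.com
10.0.0.42:50010
EOF

# After editing the file, tell the name-node to re-read it:
#   bin/hadoop dfsadmin -refreshNodes
wc -l < "$EXCLUDE"   # number of nodes queued for decommission
```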
  
- Decommission does not happen momentarily since it requires replication of potentially a large number of blocks and we do not want the cluster to be overwhelmed with just this one job. The decommission progress can be monitored on the name-node Web UI. Until all blocks are replicated the node will be in "Decommission In Progress" state. When decommission is done the state will change to "Decommissioned". The nodes can be removed whenever decommission is finished.
+ Decommissioning is not instant, since it requires replicating a potentially large number of blocks, and we do not want the cluster to be overwhelmed by this one job. Progress can be monitored on the name-node Web UI: until all of its blocks are replicated, a node stays in the "Decommission In Progress" state; once decommissioning completes, the state changes to "Decommissioned", and the node can then be removed.
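If you prefer the command line over the Web UI, the same states can be grepped out of the data-node report. The snippet below runs against a canned, abbreviated report so it is self-contained; on a live cluster you would capture the output of {{{bin/hadoop dfsadmin -report}}} instead, and the exact state strings are assumed here to match the UI labels above:

```shell
# Self-contained sketch: the report text is canned and abbreviated; on a
# real cluster use:  REPORT="$(bin/hadoop dfsadmin -report)"
REPORT='Name: 10.0.0.42:50010
Decommission Status : Decommission In Progress
Name: 10.0.0.43:50010
Decommission Status : Normal'

# Count nodes still draining; decommissioning is complete when this hits 0
# and the retired nodes report "Decommissioned".
echo "$REPORT" | grep -c 'Decommission In Progress'   # prints 1
```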
  
  The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the {{{-refreshNodes}}} command.
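As a sketch, cancelling the decommission of a single node amounts to deleting its line from the exclude file and refreshing again; as before, the paths and hostnames are hypothetical:

```shell
# Hypothetical paths/hostnames; sketch of cancelling one node's decommission.
EXCLUDE=/tmp/dfs.exclude
printf 'datanode17.example.com\n10.0.0.42:50010\n' > "$EXCLUDE"

# Remove the node we no longer want to retire:
grep -v '^datanode17\.example\.com$' "$EXCLUDE" > "$EXCLUDE.tmp" \
  && mv "$EXCLUDE.tmp" "$EXCLUDE"

# Then re-run:  bin/hadoop dfsadmin -refreshNodes
cat "$EXCLUDE"   # only 10.0.0.42:50010 remains
```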
  
