hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "FAQ" by KonstantinShvachko
Date Wed, 07 Nov 2007 04:07:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:

  components, then the original writer will get an IOException either when it finishes writing to the current block or when it closes the file.
+ [[BR]]
+ [[Anchor(17)]]
+ '''17. [#17 HDFS. I want to make a large cluster smaller by taking out a bunch of nodes
simultaneously. How can this be done?]'''
+ On a large cluster removing one or two data-nodes will not lead to any data loss, because the
+ name-node will replicate their blocks as soon as it detects that the nodes are dead.
+ With a large number of nodes removed at once, however, the probability of losing data is high.
+ Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes.
+ The nodes to be retired should be listed in the ''exclude file'', and the exclude file name should
+ be specified as the configuration parameter
+ [http://lucene.apache.org/hadoop/hadoop-default.html#dfs.hosts.exclude dfs.hosts.exclude].
+ Then the shell command
+ {{{
+ bin/hadoop dfsadmin -refreshNodes
+ }}}
+ should be called, which forces the name-node to re-read its configuration files, including the exclude file,
+ and to start the decommission process. The nodes can be removed once decommissioning is finished.
+ Decommissioning is not instantaneous, since we do not want the cluster to be overwhelmed
+ by just this one job.
+ The decommission process can be terminated at any time by editing the configuration or the
+ exclude files and repeating the {{{-refreshNodes}}} command.
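+ As a rough sketch, the configuration described above might look like the following; the file path
+ and host name are illustrative assumptions, while the parameter name {{{dfs.hosts.exclude}}} is
+ taken from the linked documentation:

```xml
<!-- hadoop-site.xml: point the name-node at an exclude file.
     The path below is an illustrative assumption, not a required location. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/path/to/dfs.exclude</value>
</property>
```

+ The exclude file itself is a plain text list of the data-nodes to retire, one host per line
+ (for example, a hypothetical {{{datanode17.example.com}}}); after editing it, run
+ {{{bin/hadoop dfsadmin -refreshNodes}}} as shown above to begin decommissioning those nodes.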
