hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "TestFaqPage" by SomeOtherAccount
Date Wed, 06 Oct 2010 22:09:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "TestFaqPage" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/TestFaqPage?action=diff&rev1=3&rev2=4

--------------------------------------------------

  
  = HDFS =
  
- <<BR>> <<Anchor(3.1)>> '''1. [[#A3.1|If I add new data-nodes to
the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space
utilization between the nodes?]]'''
+ == If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes
in order to balance disk space utilization between the nodes? ==
  
  No, HDFS will not move blocks to new nodes automatically. However, newly created files will
likely have their blocks placed on the new nodes.
  
@@ -193, +193 @@

    * [[http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing|HDFS Tutorial:
Rebalancing]];
    * [[http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer|HDFS Commands
Guide: balancer]].
  
- <<BR>> <<Anchor(3.2)>> '''2. [[#A3.2|What is the purpose of the
secondary name-node?]]'''
+ == What is the purpose of the secondary name-node? ==
  
  The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense
that data-nodes cannot connect to the secondary name-node, and in no event can it replace
the primary name-node in case of its failure.
  
@@ -201, +201 @@

  
  So if the name-node fails and you can restart it on the same physical node, then there is
no need to shut down the data-nodes; only the name-node needs to be restarted. If you cannot use
the old node anymore, you will need to copy the latest image somewhere else. The latest image
can be found either on the node that used to be the primary before the failure, if available, or
on the secondary name-node. The latter will be the latest checkpoint without the subsequent edits
log; that is, the most recent name space modifications may be missing there. You will also
need to restart the whole cluster in this case.
  
- <<BR>> <<Anchor(3.3)>> '''3. [[#A3.3|Does the name-node stay in
safe mode till all under-replicated files are fully replicated?]]'''
+ == Does the name-node stay in safe mode till all under-replicated files are fully replicated?
==
  
  No. During safe mode, replication of blocks is prohibited. The name-node waits until all
or a majority of data-nodes have reported their blocks.
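
  The exit condition described above can be sketched as a simple threshold check. This
is a minimal illustration, not the name-node's actual code: the function name is hypothetical,
and the sketch assumes the threshold is expressed as a fraction of reported blocks, in the
spirit of the dfs.safemode.threshold.pct setting (the 0.999 default used here is an assumption).

```python
def can_leave_safe_mode(reported_blocks, total_blocks, threshold=0.999):
    """Return True once a sufficient fraction of blocks has been reported.

    Hypothetical helper: the threshold semantics and default are assumptions.
    Note that nothing here waits for under-replicated blocks to be repaired.
    """
    if total_blocks == 0:
        # An empty name space has nothing to wait for.
        return True
    return reported_blocks / total_blocks >= threshold


if __name__ == "__main__":
    # 998 of 1000 blocks reported: below the assumed threshold.
    print(can_leave_safe_mode(998, 1000))
    # All blocks reported: the check passes.
    print(can_leave_safe_mode(1000, 1000))
```

  The point of the sketch is that the check depends only on block reports, not on whether
every file has reached its target replication.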
  
@@ -211, +211 @@

  
  Learn more about safe mode [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Safemode|in
the HDFS Users' Guide]].
  
- <<BR>> <<Anchor(3.4)>> '''4. [[#A3.4|How do I set up a hadoop node
to use multiple volumes?]]'''
+ == How do I set up a hadoop node to use multiple volumes? ==
  
  ''Data-nodes'' can store blocks in multiple directories, typically allocated on different
local disk drives. To set up multiple directories, specify a comma-separated
list of pathnames as the value of the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.data.dir|dfs.data.dir]].
Data-nodes will attempt to place an equal amount of data in each of the directories.
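
  As a rough illustration of the paragraph above, the following sketch splits a comma-separated
dfs.data.dir value and spreads blocks across the resulting directories in round-robin order.
The directory paths and both function names are hypothetical, and round-robin is only
an assumed stand-in for however a data-node actually balances its volumes.

```python
from itertools import cycle


def parse_data_dirs(value):
    """Split a comma-separated dfs.data.dir value into directory paths."""
    return [path.strip() for path in value.split(",") if path.strip()]


def assign_blocks(dirs, block_ids):
    """Assign each block to a directory in round-robin order (assumed policy)."""
    placement = {d: [] for d in dirs}
    # cycle() repeats the directory list; zip() stops when the blocks run out.
    for directory, block in zip(cycle(dirs), block_ids):
        placement[directory].append(block)
    return placement


if __name__ == "__main__":
    # Hypothetical paths on two separate local disks.
    dirs = parse_data_dirs("/disk1/hdfs/data, /disk2/hdfs/data")
    for directory, blocks in assign_blocks(dirs, range(5)).items():
        print(directory, blocks)
```

  With two directories and five blocks, the first directory receives three blocks and
the second receives two, which is as even as the split can be.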
  
  The ''name-node'' also supports multiple directories, which in this case store the name space
image and the edits log. The directories are specified via the [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.name.dir|dfs.name.dir]]
configuration parameter. The name-node directories are used for name space data replication,
so that the image and the log can be restored from the remaining volumes if one of them
fails.
  
- <<BR>> <<Anchor(3.5)>> '''5. [[#A3.5|What happens if one Hadoop
client renames a file or a directory containing this file while another client is still writing
into it?]]'''
+ == What happens if one Hadoop client renames a file or a directory containing this file
while another client is still writing into it? ==
  
  Starting with release hadoop-0.15, a file will appear in the name space as soon as it is
created.  If a writer is writing to a file and another client renames either the file itself
or any of its path  components, then the original writer will get an IOException either when
it finishes writing to the current  block or when it closes the file.
  
- <<BR>> <<Anchor(3.6)>> '''6. [[#A3.6|I want to make a large cluster
smaller by taking out a bunch of nodes simultaneously. How can this be done?]]'''
+ == I want to make a large cluster smaller by taking out a bunch of nodes simultaneously.
How can this be done? ==
  
  On a large cluster, removing one or two data-nodes will not lead to any data loss, because
the name-node will re-replicate their blocks as soon as it detects that the nodes are dead.
When a large number of nodes are removed or die at once, the probability of losing data is higher.
  
@@ -236, +236 @@

  
  The decommission process can be terminated at any time by editing the configuration or the
exclude files  and repeating the {{{-refreshNodes}}} command.
  
- <<BR>> <<Anchor(3.7)>> '''7. [[#A3.7|Wildcard characters doesn't
work correctly in FsShell.]]'''
+ == Wildcard characters doesn't work correctly in FsShell. ==
  
  When you issue a command in !FsShell, you may want to apply that command to more than one
file. !FsShell provides a wildcard character to help you do so.  The * (asterisk) character
can be used to take the place of any set of characters. For example, if you would like to
list all the files in your account which begin with the letter '''x''', you could use the
ls command with the * wildcard:
  
