hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "FAQ" by KonstantinShvachko
Date Tue, 28 Aug 2007 21:32:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:

   2. A simpler way, with no interruption of service, is to increase the replication factor of
the files, wait for the transfers to stabilize, and then set the replication factor back to its
original value.
   3. Yet another way to re-balance blocks is to turn off the data-node that is full, wait
until its blocks are re-replicated, and then bring it back online. The over-replicated blocks
will be removed at random from different nodes, so the blocks are truly rebalanced rather than
just removed from the current node.
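The replication trick in item 2 can be sketched with the `hadoop dfs -setrep` shell command; this is an illustrative fragment, not part of the wiki change, and the replication values and path are made up for the example:

```shell
# Temporarily raise the replication factor of everything under /user/data
# (example path) from the default of 3 to 4; -R recurses, -w waits until
# the replication target is reached.
bin/hadoop dfs -setrep -R -w 4 /user/data

# Once the extra replicas have spread across the cluster, drop the
# replication factor back down; the excess replicas are then removed
# at random from different nodes.
bin/hadoop dfs -setrep -R 3 /user/data
```

These commands require a running cluster, so timings depend on cluster size and the amount of data involved.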
+ == 7. What is the purpose of the secondary name-node? ==
+ The term "secondary name-node" is somewhat misleading.
+ It is not a name-node in the sense that data-nodes cannot connect to it,
+ and it can never replace the primary name-node in case of failure.
+ The only purpose of the secondary name-node is to perform periodic checkpoints.
+ It periodically downloads the current name-node image and edits log files,
+ merges them into a new image, and uploads the new image back to the (primary and only) name-node.
+ So if the name-node fails and you can restart it on the same physical node, there is no need
+ to shut down the data-nodes; only the name-node needs to be restarted.
+ If you cannot use the old node anymore, you will need to copy the latest image somewhere else.
+ The latest image can be found either on the node that used to be the primary before the failure,
+ if it is available, or on the secondary name-node. The latter holds the latest checkpoint
+ without the subsequent edits log, that is, the most recent name space modifications may be
+ missing there. In this case you will also need to restart the whole cluster.
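The recovery path described above can be sketched as a shell sequence; this is an illustrative fragment only, and all host names and directory paths below are hypothetical (the actual image location is whatever directory the name-node and secondary name-node are configured to use):

```shell
# 1. Stop the whole cluster (scripts shipped in the Hadoop bin/ directory).
bin/stop-all.sh

# 2. Copy the latest checkpoint image from the secondary name-node's
#    checkpoint directory into the new name-node's image directory.
#    "secondary", "newnamenode", and both paths are example values.
scp -r secondary:/path/to/checkpoint/* newnamenode:/path/to/dfs/name/

# 3. Restart the cluster; the name-node loads the copied image on startup.
bin/start-all.sh
```

Because the secondary's checkpoint lacks the edits made after it was taken, any name space modifications since the last checkpoint are lost in this scenario.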
