hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: NameNode failover procedure
Date Mon, 27 Aug 2007 22:01:12 GMT
The problem here is probably that the name "secondary namenode" is
misleading.
It is not a name-node in the sense that data-nodes cannot connect to the
secondary name-node,
and in no event can it replace the primary name-node if the primary fails.

The only purpose of the secondary name-node is to perform periodic
checkpoints.
The secondary name-node periodically downloads the current name-node image
and edits log files,
joins them into a new image, and uploads the new image back to the (primary
and only) name-node.
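
A toy sketch of that checkpoint cycle in Python may help. It is only an
illustration of the idea: the dict-based "image", the tuple-based "edits"
and the names apply_edits and checkpoint are assumptions made up for this
example, not Hadoop's on-disk formats or API.

```python
def apply_edits(image, edits):
    """Replay logged edits on top of an image and return the new image."""
    new_image = dict(image)
    for op, path, value in edits:
        if op == "create":
            new_image[path] = value
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return new_image


def checkpoint(primary):
    """One checkpoint cycle as the secondary would run it (toy model)."""
    # 1. Download the current image and edits log from the primary.
    image = dict(primary["image"])
    edits = list(primary["edits"])
    # 2. Join them into a new image.
    new_image = apply_edits(image, edits)
    # 3. Upload the new image; the primary keeps only the edits that
    #    were logged after the download.
    primary["image"] = new_image
    primary["edits"] = primary["edits"][len(edits):]
    return new_image


primary = {
    "image": {"/a": 1},
    "edits": [("create", "/b", 2), ("delete", "/a", None)],
}
latest_checkpoint = checkpoint(primary)
```

Note that latest_checkpoint only reflects edits made before the download;
anything logged on the primary afterwards is not in the secondary's copy.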

So if the name-node fails and you can restart it on the same node, then
there is no need to shut down the data-nodes;
only the name-node needs to be restarted.
If you cannot use the old node anymore, you will need to copy the latest
image somewhere else.
The latest image can be found either on the node that used to be the
primary before the failure, if it is still available,
or on the secondary name-node. The copy on the secondary will be the latest
checkpoint without the subsequent edits log, so the
latest name space modifications may be missing there.
You will probably need to restart the whole cluster in this case. I
don't know whether DNS tricks will
work with the current RPC implementation.
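
The recovery choices above can be written down as a small decision
function. This is just a sketch of the reasoning in this message; the
names RecoveryPlan and plan_recovery and the string values are hypothetical
and do not correspond to any Hadoop tool.

```python
from typing import NamedTuple


class RecoveryPlan(NamedTuple):
    image_source: str            # where the name-node image comes from
    restart: str                 # what has to be restarted
    may_lose_recent_edits: bool  # True if edits after the checkpoint are gone


def plan_recovery(can_restart_on_same_node: bool,
                  old_node_files_readable: bool) -> RecoveryPlan:
    if can_restart_on_same_node:
        # Same node: data-nodes keep running, restart the name-node only.
        return RecoveryPlan("in place", "name-node only", False)
    if old_node_files_readable:
        # New node, but the image on the old primary can still be copied.
        return RecoveryPlan("old primary", "probably whole cluster", False)
    # New node, only the secondary's last checkpoint is left.
    return RecoveryPlan("secondary checkpoint", "probably whole cluster", True)
```

For example, plan_recovery(False, False) describes the worst case: start
from the secondary's checkpoint and accept that recent name space
modifications may be missing.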


Ankur Sethi wrote:

>It seems there is no answer yet for all these questions and the wiki has not
>been updated.
>I do not understand the statement of just changing the DNS settings.  How
>will that work exactly?
>We would have to change the masters list so that the secondary namenode is
>first on the list and it would work automatically?  The files in the
>secondary namenode directory are quite different, how do they get used by a
>primary name node?
>It is still quite confusing to me.
>
>-----Original Message-----
>From: Ted Dunning [mailto:tdunning@veoh.com] 
>Sent: Friday, 20 July, 2007 1:07 PM
>To: hadoop-user@lucene.apache.org
>Subject: Re: NameNode failover procedure
>
>This is now on the wiki under NameNodeFailover and linked from the main
>
>There are some questions unanswered on that page, however.  Could somebody
>who actually knows the answers (unlike me) edit that page to fill it out a
>
>On 7/20/07 9:53 AM, "Doug Cutting" <cutting@apache.org> wrote:
>
>>>So far I learned that the secondary namenode keeps refreshing
>>>periodically its backup copies of fsimage and editlog files, and if the
>>>primary namenode disappears, it's the responsibility of the cluster
>>>admin to notice this, shut down the cluster, switch the configs across
>>>the cluster to point to the secondary namenode, start a primary namenode
>>>on the secondary namenode's host, and restart the rest of the daemons.
>>If you use DNS to switch the namenode from the primary to the secondary,
>>then no configuration changes or other daemon restarts are required.  I
>>think that is the best practice.
