hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: SecondaryNameNode on separate machine
Date Wed, 29 Oct 2008 06:12:48 GMT
Hi,
So what is the "recipe" for avoiding NN SPOF using only what comes with Hadoop?

From what I can tell, I think one has to do the following two things:

1) configure the primary NN to save the namespace image and transaction (edits) logs to multiple
dirs, one of which is actually on a remotely mounted disk, so that the data actually lives on a
separate disk on a separate box.  This preserves the namespace and transaction logs on multiple
boxes in case of primary NN hardware failure.
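
For example, I imagine something like this in hadoop-site.xml on the primary NN (the paths
are made up; the second one would be the remotely mounted directory):

  <property>
    <name>dfs.name.dir</name>
    <!-- comma-separated list: a local disk plus an NFS-mounted directory -->
    <value>/data/hadoop/name,/mnt/remote-nn/name</value>
  </property>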

2) configure the secondary NN to periodically merge the fsimage and edits files and create the
fsimage checkpoint.  This really is a second NN process running on another box.  It sounds like
this secondary NN somehow has to have access to the fsimage & edits files from the primary NN
server.  http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
does not describe the best practice around that - the recommended way to give the secondary NN
access to the primary NN's fsimage and edits files.  Should one mount a disk from the primary
NN box on the secondary NN box to get access to those files?  Or is there a simpler way?
In any case, this checkpoint is just a merge of the fsimage and edits files, and again it is
there in case the box with the primary NN dies.  That's more or less what's described on
http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
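
The checkpoint settings on the secondary NN box would then presumably look something like
this (values are illustrative; 3600 seconds is the default checkpoint period):

  <property>
    <name>fs.checkpoint.dir</name>
    <value>/data/hadoop/namesecondary</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <!-- seconds between checkpoints -->
    <value>3600</value>
  </property>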

Is this sufficient, or are there other things one has to do to eliminate NN SPOF?


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, October 28, 2008 8:14:44 PM
> Subject: Re: SecondaryNameNode on separate machine
> 
> Tomislav,
> 
> Contrary to popular belief, the secondary namenode does not provide failover;
> it's only used to do what is described here:
> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> 
> So the term "secondary" does not mean "a second one" but is more like "a
> second part of".
> 
> J-D
> 
> On Tue, Oct 28, 2008 at 9:44 AM, Tomislav Poljak wrote:
> 
> > Hi,
> > I'm trying to implement NameNode failover (or at least NameNode local
> > data backup), but it is hard since there is no official documentation.
> > Pages on this subject have been created, but are still empty:
> >
> > http://wiki.apache.org/hadoop/NameNodeFailover
> > http://wiki.apache.org/hadoop/SecondaryNameNode
> >
> > I have been browsing the web and the hadoop mailing list to see how this
> > should be implemented, but I only got more confused. People are asking
> > whether we even need the SecondaryNameNode at all (since the NameNode can
> > write its local data to multiple locations, so one of those locations can
> > be a disk mounted from another machine). I think I understand the
> > motivation for the SecondaryNameNode (to create a snapshot of the NameNode
> > data every n seconds/hours), but setting up (deploying and running) the
> > SecondaryNameNode on a different machine than the NameNode is not as
> > trivial as I expected. First I found that to run the SecondaryNameNode on
> > a machine other than the NameNode, I should change the masters file on the
> > NameNode (change localhost to the SecondaryNameNode host) and set some
> > properties in hadoop-site.xml on the SecondaryNameNode (fs.default.name,
> > fs.checkpoint.dir, fs.checkpoint.period etc.)
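> >
> > To illustrate, I mean roughly this (hostnames are made up):
> >
> > conf/masters on the NameNode box:
> >   snn-host.example.com
> >
> > hadoop-site.xml on the SecondaryNameNode box:
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>hdfs://nn-host.example.com:9000</value>
> >   </property>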
> >
> > This was enough to start the SecondaryNameNode when starting the NameNode
> > with bin/start-dfs.sh, but it didn't create an image on the
> > SecondaryNameNode. Then I found that I need to set dfs.http.address to the
> > NameNode's address (so now I have the NameNode address in both
> > fs.default.name and dfs.http.address).
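> >
> > I.e. something like this on the SecondaryNameNode (the host is made up;
> > 50070 is the usual default HTTP port):
> >
> >   <property>
> >     <name>dfs.http.address</name>
> >     <value>nn-host.example.com:50070</value>
> >   </property>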
> >
> > Now I get the following exception:
> >
> > 2008-10-28 09:18:00,098 ERROR NameNode.Secondary - Exception in
> > doCheckpoint:
> > 2008-10-28 09:18:00,098 ERROR NameNode.Secondary -
> > java.net.SocketException: Unexpected end of file from server
> >
> > My questions are the following:
> > How do I resolve this problem (this exception)?
> > Do I need an additional property in the SecondaryNameNode's hadoop-site.xml
> > or the NameNode's hadoop-site.xml?
> >
> > How should NameNode failover work ideally? Is it like this:
> >
> > The SecondaryNameNode runs on a separate machine from the NameNode and
> > stores the NameNode's data (fsimage and edits) locally in
> > fs.checkpoint.dir. When the NameNode machine crashes, we start a NameNode
> > on the machine where the SecondaryNameNode was running and set
> > dfs.name.dir to fs.checkpoint.dir. We also need to change how DNS resolves
> > the NameNode hostname (change it from the primary to the secondary).
> >
> > Is this correct?
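> >
> > In configuration terms I imagine the recovery step like this on the former
> > SecondaryNameNode box (the path is hypothetical):
> >
> >   <property>
> >     <name>dfs.name.dir</name>
> >     <!-- point the replacement NameNode at the last checkpoint -->
> >     <value>/data/hadoop/namesecondary</value>
> >   </property>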
> >
> > Tomislav