Subject: Re: SecondaryNameNode on separate machine
From: Tomislav Poljak <tpoljak@gmail.com>
To: core-user@hadoop.apache.org
Date: Tue, 04 Nov 2008 14:05:19 +0100

Konstantin,

it works, thanks a lot!

Tomislav

On Mon, 2008-11-03 at 11:13 -0800, Konstantin Shvachko wrote:
> You can either do what you just described with dfs.name.dir = dirX,
> or you can start the name-node with the -importCheckpoint option.
> This is an automation for copying image files from the secondary to the primary.
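[Archive note: the two recovery paths Konstantin describes above can be sketched as shell steps. This is a rough sketch, not a tested procedure from the thread; the host name (snn-host) and paths (/data/...) are invented for illustration.]

```sh
# Recovery sketch (hypothetical host/paths). Run on the replacement NN box.

# Path A: copy the secondary's checkpoint data into the directory that
# dfs.name.dir points to, then start the name-node normally.
scp -r snn-host:/data/namesecondary/* /data/dfs/name/
bin/hadoop-daemon.sh start namenode

# Path B: let the name-node import the checkpoint itself; fs.checkpoint.dir
# must point at the checkpoint data and dfs.name.dir must be empty.
bin/hadoop namenode -importCheckpoint
```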
>
> See here:
> http://hadoop.apache.org/core/docs/current/commands_manual.html#namenode
> http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode
> http://issues.apache.org/jira/browse/HADOOP-2585#action_12584755
>
> --Konstantin
>
> Tomislav Poljak wrote:
> > Hi,
> > Thank you all for your time and your answers!
> >
> > Now the SecondaryNameNode connects to the NameNode (after I configured
> > dfs.http.address to the NN's http server -> NN hostname on port 50070)
> > and creates (transfers) edits and fsimage from the NameNode.
> >
> > Can you explain a little more how NameNode failover should work now?
> >
> > For example, the SecondaryNameNode now stores fsimage and edits to (SNN's)
> > dirX, and let's say the NameNode goes down (disk becomes unreadable). Now I
> > create/dedicate a new machine for the NameNode (also change DNS to point to
> > this new NameNode machine as the NameNode host) and take the data in dirX
> > from the SNN and copy it to the new NameNode. How do I configure the new
> > NameNode to use the data from dirX (do I configure dfs.name.dir to point to
> > dirX and start the new NameNode)?
> >
> > Thanks,
> > Tomislav
> >
> > On Fri, 2008-10-31 at 11:38 -0700, Konstantin Shvachko wrote:
> >> True, dfs.http.address is the NN Web UI address.
> >> This is where the NN http server runs. Besides the Web UI there is also
> >> a servlet running on that server which is used to transfer the image
> >> and edits from the NN to the secondary using http GET.
> >> So the SNN uses both addresses: fs.default.name and dfs.http.address.
> >>
> >> When the SNN finishes the checkpoint, the primary needs to transfer the
> >> resulting image back. This is done via the http server running on the SNN.
> >>
> >> Answering Tomislav's question:
> >> The difference between fs.default.name and dfs.http.address is that
> >> fs.default.name is the name-node's RPC address, which clients and
> >> data-nodes connect to, while dfs.http.address is the NN's http server
> >> address, which our browsers connect to; but it is also used for
> >> transferring the image and edits files.
> >>
> >> --Konstantin
> >>
> >> Otis Gospodnetic wrote:
> >>> Konstantin & Co, please correct me if I'm wrong, but looking at
> >>> hadoop-default.xml makes me think that dfs.http.address is only the URL
> >>> for the NN *Web UI*. In other words, this is where people go to look at
> >>> the NN.
> >>>
> >>> The secondary NN must then be using only the primary NN URL specified in
> >>> fs.default.name. This URL looks like hdfs://name-node-hostname-here/.
> >>> Something in Hadoop then knows the exact port for the primary NN based
> >>> on the URI scheme (e.g. "hdfs://") in this URL.
> >>>
> >>> Is this correct?
> >>>
> >>> Thanks,
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>> ----- Original Message ----
> >>>> From: Tomislav Poljak
> >>>> To: core-user@hadoop.apache.org
> >>>> Sent: Thursday, October 30, 2008 1:52:18 PM
> >>>> Subject: Re: SecondaryNameNode on separate machine
> >>>>
> >>>> Hi,
> >>>> can you please explain the difference between fs.default.name and
> >>>> dfs.http.address (like how and when the SecondaryNameNode uses
> >>>> fs.default.name, and how/when it uses dfs.http.address)? I have set
> >>>> them both to the same (namenode's) hostname:port. Is this correct (or
> >>>> does dfs.http.address need some other port)?
> >>>>
> >>>> Thanks,
> >>>> Tomislav
> >>>>
> >>>> On Wed, 2008-10-29 at 16:10 -0700, Konstantin Shvachko wrote:
> >>>>> The SecondaryNameNode uses the http protocol to transfer the image and
> >>>>> the edits from the primary name-node and vice versa.
> >>>>> So the secondary does not access local files on the primary directly.
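[Archive note: the distinction Konstantin draws between the two addresses can be made concrete with a hadoop-site.xml fragment. This is illustrative, not taken from the thread; "nn-host" and both ports are placeholders.]

```xml
<!-- Illustrative hadoop-site.xml fragment; "nn-host" and ports are placeholders. -->
<property>
  <name>fs.default.name</name>
  <!-- RPC address: clients and data-nodes connect here. -->
  <value>hdfs://nn-host:9000</value>
</property>
<property>
  <name>dfs.http.address</name>
  <!-- HTTP address: the Web UI, also used for image/edits transfer to the SNN. -->
  <value>nn-host:50070</value>
</property>
```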
> >>>>> The primary NN should know the secondary's http address.
> >>>>> And the secondary NN needs to know both fs.default.name and
> >>>>> dfs.http.address of the primary.
> >>>>> In general we usually create one configuration file, hadoop-site.xml,
> >>>>> and copy it to all other machines. So you don't need to set up
> >>>>> different values for all servers.
> >>>>>
> >>>>> Regards,
> >>>>> --Konstantin
> >>>>>
> >>>>> Tomislav Poljak wrote:
> >>>>>> Hi,
> >>>>>> I'm not clear on how the SecondaryNameNode communicates with the
> >>>>>> NameNode (if deployed on a separate machine). Does the
> >>>>>> SecondaryNameNode use a direct connection (over some port and
> >>>>>> protocol), or is it enough for the SecondaryNameNode to have access
> >>>>>> to the data which the NameNode writes locally on disk?
> >>>>>>
> >>>>>> Tomislav
> >>>>>>
> >>>>>> On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:
> >>>>>>> I think a lot of the confusion comes from this thread:
> >>>>>>> http://www.nabble.com/NameNode-failover-procedure-td11711842.html
> >>>>>>>
> >>>>>>> Particularly because the wiki was updated with wrong information,
> >>>>>>> not maliciously I'm sure. This information is now gone for good.
> >>>>>>>
> >>>>>>> Otis, your solution is pretty much like the one given by Dhruba
> >>>>>>> Borthakur and augmented by Konstantin Shvachko later in the thread,
> >>>>>>> but I never did it myself.
> >>>>>>>
> >>>>>>> One thing should be clear though: the NN is and will remain a SPOF
> >>>>>>> (just like HBase's Master) as long as a distributed manager service
> >>>>>>> (like ZooKeeper) is not plugged into Hadoop to help with failover.
> >>>>>>>
> >>>>>>> J-D
> >>>>>>>
> >>>>>>> On Wed, Oct 29, 2008 at 2:12 AM, Otis Gospodnetic <
> >>>>>>> otis_gospodnetic@yahoo.com> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>> So what is the "recipe" for avoiding NN SPOF using only what comes
> >>>>>>>> with Hadoop?
> >>>>>>>>
> >>>>>>>> From what I can tell, I think one has to do the following two things:
> >>>>>>>>
> >>>>>>>> 1) configure the primary NN to save the namespace and xa logs to
> >>>>>>>> multiple dirs, one of which is actually on a remotely mounted disk,
> >>>>>>>> so that the data actually lives on a separate disk on a separate
> >>>>>>>> box. This saves the namespace and xa logs on multiple boxes in case
> >>>>>>>> of primary NN hardware failure.
> >>>>>>>>
> >>>>>>>> 2) configure the secondary NN to periodically merge fsimage+edits
> >>>>>>>> and create the fsimage checkpoint. This really is a second NN
> >>>>>>>> process running on another box. It sounds like this secondary NN
> >>>>>>>> has to somehow have access to the fsimage & edits files from the
> >>>>>>>> primary NN server.
> >>>>>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>>>>>> does not describe the best practice around that - the recommended
> >>>>>>>> way to give the secondary NN access to the primary NN's fsimage and
> >>>>>>>> edits files. Should one mount a disk from the primary NN box to the
> >>>>>>>> secondary NN box to get access to those files? Or is there a
> >>>>>>>> simpler way?
> >>>>>>>> In any case, this checkpoint is just a merge of the fsimage+edits
> >>>>>>>> files, and again is there in case the box with the primary NN dies.
> >>>>>>>> That's more or less what's described on
> >>>>>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>>>>>> Is this sufficient, or are there other things one has to do to
> >>>>>>>> eliminate NN SPOF?
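[Archive note: step 1 of Otis's recipe is a one-property change, since dfs.name.dir accepts a comma-separated list of directories. The fragment below is a sketch; both paths, including the NFS mount point, are hypothetical.]

```xml
<!-- Sketch of step 1: namespace and edit logs written to a local dir and to
     a remotely mounted dir, so a copy survives loss of the NN box.
     Paths are placeholders. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name,/mnt/remote-nfs/dfs/name</value>
</property>
```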
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Otis
> >>>>>>>> --
> >>>>>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>>>>>>
> >>>>>>>> ----- Original Message ----
> >>>>>>>>> From: Jean-Daniel Cryans
> >>>>>>>>> To: core-user@hadoop.apache.org
> >>>>>>>>> Sent: Tuesday, October 28, 2008 8:14:44 PM
> >>>>>>>>> Subject: Re: SecondaryNameNode on separate machine
> >>>>>>>>>
> >>>>>>>>> Tomislav,
> >>>>>>>>>
> >>>>>>>>> Contrary to popular belief, the secondary namenode does not
> >>>>>>>>> provide failover; it's only used to do what is described here:
> >>>>>>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>>>>>>> So the term "secondary" does not mean "a second one" but is more
> >>>>>>>>> like "a second part of".
> >>>>>>>>>
> >>>>>>>>> J-D
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 28, 2008 at 9:44 AM, Tomislav Poljak wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> I'm trying to implement NameNode failover (or at least NameNode
> >>>>>>>>>> local data backup), but it is hard since there is no official
> >>>>>>>>>> documentation. Pages on this subject have been created, but are
> >>>>>>>>>> still empty:
> >>>>>>>>>>
> >>>>>>>>>> http://wiki.apache.org/hadoop/NameNodeFailover
> >>>>>>>>>> http://wiki.apache.org/hadoop/SecondaryNameNode
> >>>>>>>>>>
> >>>>>>>>>> I have been browsing the web and the hadoop mailing list to see
> >>>>>>>>>> how this should be implemented, but I got even more confused.
> >>>>>>>>>> People are asking whether we even need the SecondaryNameNode at
> >>>>>>>>>> all, etc. (since the NameNode can write local data to multiple
> >>>>>>>>>> locations, so one of those locations can be a mounted disk from
> >>>>>>>>>> another machine).
> >>>>>>>>>> I think I understand the motivation for the SecondaryNameNode
> >>>>>>>>>> (to create a snapshot of the NameNode data every n seconds/hours),
> >>>>>>>>>> but setting up (deploying and running) the SecondaryNameNode on a
> >>>>>>>>>> different machine than the NameNode is not as trivial as I
> >>>>>>>>>> expected. First I found that if I need to run the
> >>>>>>>>>> SecondaryNameNode on a machine other than the NameNode, I should
> >>>>>>>>>> change the masters file on the NameNode (change localhost to the
> >>>>>>>>>> SecondaryNameNode host) and set some properties in
> >>>>>>>>>> hadoop-site.xml on the SecondaryNameNode (fs.default.name,
> >>>>>>>>>> fs.checkpoint.dir, fs.checkpoint.period etc.)
> >>>>>>>>>>
> >>>>>>>>>> This was enough to start the SecondaryNameNode when starting the
> >>>>>>>>>> NameNode with bin/start-dfs.sh, but it didn't create an image on
> >>>>>>>>>> the SecondaryNameNode. Then I found that I need to set
> >>>>>>>>>> dfs.http.address to the NameNode's address (so now I have the
> >>>>>>>>>> NameNode address in both fs.default.name and dfs.http.address).
> >>>>>>>>>>
> >>>>>>>>>> Now I get the following exception:
> >>>>>>>>>>
> >>>>>>>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary - Exception in
> >>>>>>>>>> doCheckpoint:
> >>>>>>>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary -
> >>>>>>>>>> java.net.SocketException: Unexpected end of file from server
> >>>>>>>>>>
> >>>>>>>>>> My questions are the following:
> >>>>>>>>>> How do I resolve this problem (this exception)?
> >>>>>>>>>> Do I need an additional property in the SecondaryNameNode's
> >>>>>>>>>> hadoop-site.xml or the NameNode's hadoop-site.xml?
> >>>>>>>>>>
> >>>>>>>>>> How should NameNode failover work ideally? Is it like this:
> >>>>>>>>>>
> >>>>>>>>>> The SecondaryNameNode runs on a separate machine from the
> >>>>>>>>>> NameNode and stores the NameNode's data (fsimage and edits)
> >>>>>>>>>> locally in fs.checkpoint.dir.
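[Archive note: pulling together the settings Tomislav lists, a secondary-side hadoop-site.xml might look like the sketch below. All values are placeholders; fs.checkpoint.period is in seconds.]

```xml
<!-- Sketch of an SNN-side hadoop-site.xml; "nn-host" and paths are placeholders. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://nn-host:9000</value>
</property>
<property>
  <name>dfs.http.address</name>
  <!-- Must point at the NN's http server; leaving this at its default is
       what caused the doCheckpoint exception above. -->
  <value>nn-host:50070</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/dfs/namesecondary</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints -->
</property>
```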
> >>>>>>>>>> When the NameNode machine crashes, we start a NameNode on the
> >>>>>>>>>> machine where the SecondaryNameNode was running, and we set
> >>>>>>>>>> dfs.name.dir to fs.checkpoint.dir. Also we need to change how DNS
> >>>>>>>>>> resolves the NameNode hostname (change from the primary to the
> >>>>>>>>>> secondary).
> >>>>>>>>>>
> >>>>>>>>>> Is this correct?
> >>>>>>>>>>
> >>>>>>>>>> Tomislav