Date: Wed, 29 Oct 2008 16:10:44 -0700
From: Konstantin Shvachko
To: core-user@hadoop.apache.org
Subject: Re: SecondaryNameNode on separate machine
Message-ID: <4908ED74.2070807@yahoo-inc.com>
SecondaryNameNode uses the HTTP protocol to transfer the image and the edits
from the primary name-node and vice versa. So the secondary does not access
local files on the primary directly.

The primary NN should know the secondary's HTTP address, and the secondary NN
needs to know both fs.default.name and dfs.http.address of the primary.

In general we usually create one configuration file, hadoop-site.xml, and copy
it to all other machines, so you don't need to set up different values for
each server.

Regards,
--Konstantin

Tomislav Poljak wrote:
> Hi,
> I'm not clear on how SecondaryNameNode communicates with the NameNode
> (if deployed on a separate machine). Does SecondaryNameNode use a direct
> connection (over some port and protocol), or is it enough for
> SecondaryNameNode to have access to the data which the NameNode writes
> locally to disk?
>
> Tomislav
>
> On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:
>> I think a lot of the confusion comes from this thread:
>> http://www.nabble.com/NameNode-failover-procedure-td11711842.html
>>
>> Particularly because the wiki was updated with wrong information, not
>> maliciously I'm sure. That information is now gone for good.
>>
>> Otis, your solution is pretty much like the one given by Dhruba Borthakur
>> and augmented by Konstantin Shvachko later in the thread, but I never did
>> it myself.
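[The properties Konstantin mentions could live in a single hadoop-site.xml
shared by both machines. A minimal sketch, assuming 0.18-era property names,
default ports, and made-up hostnames namenode.example.com /
secondary.example.com:]

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <!-- the primary's HTTP address; the secondary pulls fsimage
         and edits from here -->
    <name>dfs.http.address</name>
    <value>namenode.example.com:50070</value>
  </property>
  <property>
    <!-- the secondary's HTTP address, so the primary can fetch the
         merged image back (assumed default port 50090) -->
    <name>dfs.secondary.http.address</name>
    <value>secondary.example.com:50090</value>
  </property>
</configuration>
```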
>>
>> One thing should be clear though: the NN is and will remain a SPOF (just
>> like HBase's Master) as long as a distributed manager service (like
>> ZooKeeper) is not plugged into Hadoop to help with failover.
>>
>> J-D
>>
>> On Wed, Oct 29, 2008 at 2:12 AM, Otis Gospodnetic <
>> otis_gospodnetic@yahoo.com> wrote:
>>
>>> Hi,
>>> So what is the "recipe" for avoiding NN SPOF using only what comes with
>>> Hadoop?
>>>
>>> From what I can tell, one has to do the following two things:
>>>
>>> 1) Configure the primary NN to save the namespace and transaction (xa)
>>> logs to multiple dirs, one of which is actually on a remotely mounted
>>> disk, so that the data actually lives on a separate disk on a separate
>>> box. This preserves the namespace and xa logs on multiple boxes in case
>>> of primary NN hardware failure.
>>>
>>> 2) Configure the secondary NN to periodically merge fsimage+edits and
>>> create the fsimage checkpoint. This really is a second NN process
>>> running on another box. It sounds like this secondary NN has to somehow
>>> have access to the fsimage & edits files from the primary NN server.
>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
>>> does not describe the best practice around that - the recommended way to
>>> give the secondary NN access to the primary NN's fsimage and edits
>>> files. Should one mount a disk from the primary NN box to the secondary
>>> NN box to get access to those files? Or is there a simpler way?
>>> In any case, this checkpoint is just a merge of the fsimage+edits files
>>> and again is there in case the box with the primary NN dies. That's
>>> what's described, more or less, on
>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
>>>
>>> Is this sufficient, or are there other things one has to do to eliminate
>>> the NN SPOF?
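[Step 1) in Otis's recipe maps to listing several directories in dfs.name.dir;
the NameNode writes its image and edits to every directory listed. A sketch
with illustrative paths - the second entry stands in for an NFS mount from
another box:]

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- local disk plus a remotely mounted directory; /mnt/remote-nn is
       assumed to be an NFS mount from a second machine -->
  <value>/hadoop/dfs/name,/mnt/remote-nn/name</value>
</property>
```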
>>>
>>>
>>> Thanks,
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> ----- Original Message ----
>>>> From: Jean-Daniel Cryans
>>>> To: core-user@hadoop.apache.org
>>>> Sent: Tuesday, October 28, 2008 8:14:44 PM
>>>> Subject: Re: SecondaryNameNode on separate machine
>>>>
>>>> Tomislav,
>>>>
>>>> Contrary to popular belief, the secondary namenode does not provide
>>>> failover; it's only used to do what is described here:
>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
>>>>
>>>> So the term "secondary" does not mean "a second one" but is more like
>>>> "a second part of".
>>>>
>>>> J-D
>>>>
>>>> On Tue, Oct 28, 2008 at 9:44 AM, Tomislav Poljak wrote:
>>>>
>>>>> Hi,
>>>>> I'm trying to implement NameNode failover (or at least NameNode local
>>>>> data backup), but it is hard since there is no official documentation.
>>>>> Pages on this subject have been created, but are still empty:
>>>>>
>>>>> http://wiki.apache.org/hadoop/NameNodeFailover
>>>>> http://wiki.apache.org/hadoop/SecondaryNameNode
>>>>>
>>>>> I have been browsing the web and the hadoop mailing list to see how
>>>>> this should be implemented, but I got even more confused. People are
>>>>> asking whether we even need a SecondaryNameNode at all (since the
>>>>> NameNode can write its local data to multiple locations, one of those
>>>>> locations can be a disk mounted from another machine). I think I
>>>>> understand the motivation for the SecondaryNameNode (to create a
>>>>> snapshot of the NameNode data every n seconds/hours), but setting up
>>>>> (deploying and running) the SecondaryNameNode on a different machine
>>>>> than the NameNode is not as trivial as I expected.
>>>>> First I found that if I need to run the SecondaryNameNode on a machine
>>>>> other than the NameNode, I should change the masters file on the
>>>>> NameNode (change localhost to the SecondaryNameNode host) and set some
>>>>> properties in hadoop-site.xml on the SecondaryNameNode
>>>>> (fs.default.name, fs.checkpoint.dir, fs.checkpoint.period etc.)
>>>>>
>>>>> This was enough to start the SecondaryNameNode when starting the
>>>>> NameNode with bin/start-dfs.sh, but it didn't create an image on the
>>>>> SecondaryNameNode. Then I found that I need to set dfs.http.address to
>>>>> the NameNode address (so now I have the NameNode address in both
>>>>> fs.default.name and dfs.http.address).
>>>>>
>>>>> Now I get the following exception:
>>>>>
>>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary - Exception in
>>>>> doCheckpoint:
>>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary -
>>>>> java.net.SocketException: Unexpected end of file from server
>>>>>
>>>>> My questions are the following:
>>>>> How do I resolve this problem (this exception)?
>>>>> Do I need an additional property in the SecondaryNameNode's
>>>>> hadoop-site.xml or the NameNode's hadoop-site.xml?
>>>>>
>>>>> How should NameNode failover work, ideally? Is it like this:
>>>>>
>>>>> The SecondaryNameNode runs on a separate machine from the NameNode and
>>>>> stores the NameNode's data (fsimage and edits) locally in
>>>>> fs.checkpoint.dir. When the NameNode machine crashes, we start a
>>>>> NameNode on the machine where the SecondaryNameNode was running, with
>>>>> dfs.name.dir set to fs.checkpoint.dir. We also need to change how DNS
>>>>> resolves the NameNode hostname (change it from the primary to the
>>>>> secondary).
>>>>>
>>>>> Is this correct?
>>>>>
>>>>> Tomislav
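[The manual failover Tomislav describes in his last question can be sketched
as a shell sequence. Everything here is illustrative, not an official recipe:
the /tmp paths merely stand in for fs.checkpoint.dir and dfs.name.dir, a dummy
file stands in for the checkpoint, and the real daemon start is only shown as
a comment.]

```shell
# Stand-ins for fs.checkpoint.dir (on the secondary) and dfs.name.dir
# (for the replacement NameNode); both paths are illustrative.
CHECKPOINT_DIR=/tmp/namesecondary
NAME_DIR=/tmp/name

# Fake a checkpoint so the copy step below has something to promote.
rm -rf "$CHECKPOINT_DIR" "$NAME_DIR"
mkdir -p "$CHECKPOINT_DIR/current"
echo "dummy-fsimage" > "$CHECKPOINT_DIR/current/fsimage"

# 1) Seed the new NameNode's storage from the latest checkpoint.
mkdir -p "$NAME_DIR"
cp -r "$CHECKPOINT_DIR/current" "$NAME_DIR/"

# 2) Repoint DNS (or fs.default.name everywhere) at this box.
# 3) Start a NameNode here, e.g.:  bin/hadoop-daemon.sh start namenode
ls "$NAME_DIR/current"
```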