Subject: Re: SecondaryNameNode on separate machine
From: Tomislav Poljak <tpoljak@gmail.com>
To: core-user@hadoop.apache.org
Date: Tue, 04 Nov 2008 14:05:19 +0100

Konstantin,

it works, thanks a lot!

Tomislav

On Mon, 2008-11-03 at 11:13 -0800, Konstantin Shvachko wrote:
> You can either do what you just described with dfs.name.dir = dirX,
> or you can start the name-node with the -importCheckpoint option.
> This is an automation for copying image files from the secondary to the primary.
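[Archive note: the two recovery paths Konstantin describes above can be sketched as shell steps. This is a rough sketch, not a tested procedure from the thread; the host name (snn-host) and paths (/data/...) are invented for illustration.]

```sh
# Recovery sketch (hypothetical host/paths). Run on the replacement NN box.

# Path A: copy the secondary's checkpoint data into the directory that
# dfs.name.dir points to, then start the name-node normally.
scp -r snn-host:/data/namesecondary/* /data/dfs/name/
bin/hadoop-daemon.sh start namenode

# Path B: let the name-node import the checkpoint itself; fs.checkpoint.dir
# must point at the checkpoint data and dfs.name.dir must be empty.
bin/hadoop namenode -importCheckpoint
```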
>
> See here:
> http://hadoop.apache.org/core/docs/current/commands_manual.html#namenode
> http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode
> http://issues.apache.org/jira/browse/HADOOP-2585#action_12584755
>
> --Konstantin
>
> Tomislav Poljak wrote:
> > Hi,
> > Thank you all for your time and your answers!
> >
> > Now the SecondaryNameNode connects to the NameNode (after I configured
> > dfs.http.address to the NN's http server -> NN hostname on port 50070)
> > and creates (transfers) edits and fsimage from the NameNode.
> >
> > Can you explain a little more how NameNode failover should work now?
> >
> > For example, the SecondaryNameNode now stores fsimage and edits to (SNN's)
> > dirX, and let's say the NameNode goes down (disk becomes unreadable). Now I
> > create/dedicate a new machine for the NameNode (also change DNS to point to
> > this new NameNode machine as the NameNode host) and take the data in dirX
> > from the SNN and copy it to the new NameNode. How do I configure the new
> > NameNode to use the data from dirX (do I configure dfs.name.dir to point to
> > dirX and start the new NameNode)?
> >
> > Thanks,
> > Tomislav
> >
> > On Fri, 2008-10-31 at 11:38 -0700, Konstantin Shvachko wrote:
> >> True, dfs.http.address is the NN Web UI address.
> >> This is where the NN http server runs. Besides the Web UI there is also
> >> a servlet running on that server which is used to transfer the image
> >> and edits from the NN to the secondary using http GET.
> >> So the SNN uses both addresses: fs.default.name and dfs.http.address.
> >>
> >> When the SNN finishes the checkpoint, the primary needs to transfer the
> >> resulting image back. This is done via the http server running on the SNN.
> >>
> >> Answering Tomislav's question:
> >> The difference between fs.default.name and dfs.http.address is that
> >> fs.default.name is the name-node's RPC address, which clients and
> >> data-nodes connect to, while dfs.http.address is the NN's http server
> >> address, which our browsers connect to; but it is also used for
> >> transferring the image and edits files.
> >>
> >> --Konstantin
> >>
> >> Otis Gospodnetic wrote:
> >>> Konstantin & Co, please correct me if I'm wrong, but looking at
> >>> hadoop-default.xml makes me think that dfs.http.address is only the URL
> >>> for the NN *Web UI*. In other words, this is where people go to look at
> >>> the NN.
> >>>
> >>> The secondary NN must then be using only the primary NN URL specified in
> >>> fs.default.name. This URL looks like hdfs://name-node-hostname-here/.
> >>> Something in Hadoop then knows the exact port for the primary NN based
> >>> on the URI scheme (e.g. "hdfs://") in this URL.
> >>>
> >>> Is this correct?
> >>>
> >>> Thanks,
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>> ----- Original Message ----
> >>>> From: Tomislav Poljak
> >>>> To: core-user@hadoop.apache.org
> >>>> Sent: Thursday, October 30, 2008 1:52:18 PM
> >>>> Subject: Re: SecondaryNameNode on separate machine
> >>>>
> >>>> Hi,
> >>>> can you please explain the difference between fs.default.name and
> >>>> dfs.http.address (like how and when the SecondaryNameNode uses
> >>>> fs.default.name, and how/when it uses dfs.http.address)? I have set
> >>>> them both to the same (namenode's) hostname:port. Is this correct (or
> >>>> does dfs.http.address need some other port)?
> >>>>
> >>>> Thanks,
> >>>> Tomislav
> >>>>
> >>>> On Wed, 2008-10-29 at 16:10 -0700, Konstantin Shvachko wrote:
> >>>>> The SecondaryNameNode uses the http protocol to transfer the image and
> >>>>> the edits from the primary name-node and vice versa.
> >>>>> So the secondary does not access local files on the primary directly.
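[Archive note: the distinction Konstantin draws between the two addresses can be made concrete with a hadoop-site.xml fragment. This is illustrative, not taken from the thread; "nn-host" and both ports are placeholders.]

```xml
<!-- Illustrative hadoop-site.xml fragment; "nn-host" and ports are placeholders. -->
<property>
  <name>fs.default.name</name>
  <!-- RPC address: clients and data-nodes connect here. -->
  <value>hdfs://nn-host:9000</value>
</property>
<property>
  <name>dfs.http.address</name>
  <!-- HTTP address: the Web UI, also used for image/edits transfer to the SNN. -->
  <value>nn-host:50070</value>
</property>
```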
> >>>>> The primary NN should know the secondary's http address.
> >>>>> And the secondary NN needs to know both fs.default.name and
> >>>>> dfs.http.address of the primary.
> >>>>> In general we usually create one configuration file, hadoop-site.xml,
> >>>>> and copy it to all other machines. So you don't need to set up
> >>>>> different values for all servers.
> >>>>>
> >>>>> Regards,
> >>>>> --Konstantin
> >>>>>
> >>>>> Tomislav Poljak wrote:
> >>>>>> Hi,
> >>>>>> I'm not clear on how the SecondaryNameNode communicates with the
> >>>>>> NameNode (if deployed on a separate machine). Does the
> >>>>>> SecondaryNameNode use a direct connection (over some port and
> >>>>>> protocol), or is it enough for the SecondaryNameNode to have access
> >>>>>> to the data which the NameNode writes locally on disk?
> >>>>>>
> >>>>>> Tomislav
> >>>>>>
> >>>>>> On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:
> >>>>>>> I think a lot of the confusion comes from this thread:
> >>>>>>> http://www.nabble.com/NameNode-failover-procedure-td11711842.html
> >>>>>>>
> >>>>>>> Particularly because the wiki was updated with wrong information,
> >>>>>>> not maliciously I'm sure. This information is now gone for good.
> >>>>>>>
> >>>>>>> Otis, your solution is pretty much like the one given by Dhruba
> >>>>>>> Borthakur and augmented by Konstantin Shvachko later in the thread,
> >>>>>>> but I never did it myself.
> >>>>>>>
> >>>>>>> One thing should be clear though: the NN is and will remain a SPOF
> >>>>>>> (just like HBase's Master) as long as a distributed manager service
> >>>>>>> (like ZooKeeper) is not plugged into Hadoop to help with failover.
> >>>>>>>
> >>>>>>> J-D
> >>>>>>>
> >>>>>>> On Wed, Oct 29, 2008 at 2:12 AM, Otis Gospodnetic <
> >>>>>>> otis_gospodnetic@yahoo.com> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>> So what is the "recipe" for avoiding NN SPOF using only what comes
> >>>>>>>> with Hadoop?
> >>>>>>>>
> >>>>>>>> From what I can tell, I think one has to do the following two things:
> >>>>>>>>
> >>>>>>>> 1) configure the primary NN to save the namespace and xa logs to
> >>>>>>>> multiple dirs, one of which is actually on a remotely mounted disk,
> >>>>>>>> so that the data actually lives on a separate disk on a separate
> >>>>>>>> box. This saves the namespace and xa logs on multiple boxes in case
> >>>>>>>> of primary NN hardware failure.
> >>>>>>>>
> >>>>>>>> 2) configure the secondary NN to periodically merge fsimage+edits
> >>>>>>>> and create the fsimage checkpoint. This really is a second NN
> >>>>>>>> process running on another box. It sounds like this secondary NN
> >>>>>>>> has to somehow have access to the fsimage & edits files from the
> >>>>>>>> primary NN server.
> >>>>>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>>>>>> does not describe the best practice around that - the recommended
> >>>>>>>> way to give the secondary NN access to the primary NN's fsimage and
> >>>>>>>> edits files. Should one mount a disk from the primary NN box to the
> >>>>>>>> secondary NN box to get access to those files? Or is there a
> >>>>>>>> simpler way?
> >>>>>>>> In any case, this checkpoint is just a merge of the fsimage+edits
> >>>>>>>> files, and again is there in case the box with the primary NN dies.
> >>>>>>>> That's more or less what's described on
> >>>>>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>>>>>> Is this sufficient, or are there other things one has to do to
> >>>>>>>> eliminate NN SPOF?
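[Archive note: step 1 of Otis's recipe is a one-property change, since dfs.name.dir accepts a comma-separated list of directories. The fragment below is a sketch; both paths, including the NFS mount point, are hypothetical.]

```xml
<!-- Sketch of step 1: namespace and edit logs written to a local dir and to
     a remotely mounted dir, so a copy survives loss of the NN box.
     Paths are placeholders. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name,/mnt/remote-nfs/dfs/name</value>
</property>
```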
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Otis
> >>>>>>>> --
> >>>>>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>>>>>>
> >>>>>>>> ----- Original Message ----
> >>>>>>>>> From: Jean-Daniel Cryans
> >>>>>>>>> To: core-user@hadoop.apache.org
> >>>>>>>>> Sent: Tuesday, October 28, 2008 8:14:44 PM
> >>>>>>>>> Subject: Re: SecondaryNameNode on separate machine
> >>>>>>>>>
> >>>>>>>>> Tomislav,
> >>>>>>>>>
> >>>>>>>>> Contrary to popular belief, the secondary namenode does not
> >>>>>>>>> provide failover; it's only used to do what is described here:
> >>>>>>>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>>>>>>> So the term "secondary" does not mean "a second one" but is more
> >>>>>>>>> like "a second part of".
> >>>>>>>>>
> >>>>>>>>> J-D
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 28, 2008 at 9:44 AM, Tomislav Poljak wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> I'm trying to implement NameNode failover (or at least NameNode
> >>>>>>>>>> local data backup), but it is hard since there is no official
> >>>>>>>>>> documentation. Pages on this subject have been created, but are
> >>>>>>>>>> still empty:
> >>>>>>>>>>
> >>>>>>>>>> http://wiki.apache.org/hadoop/NameNodeFailover
> >>>>>>>>>> http://wiki.apache.org/hadoop/SecondaryNameNode
> >>>>>>>>>>
> >>>>>>>>>> I have been browsing the web and the hadoop mailing list to see
> >>>>>>>>>> how this should be implemented, but I got even more confused.
> >>>>>>>>>> People are asking whether we even need the SecondaryNameNode at
> >>>>>>>>>> all, etc. (since the NameNode can write local data to multiple
> >>>>>>>>>> locations, so one of those locations can be a mounted disk from
> >>>>>>>>>> another machine).
> >>>>>>>>>> I think I understand the motivation for the SecondaryNameNode
> >>>>>>>>>> (to create a snapshot of the NameNode data every n seconds/hours),
> >>>>>>>>>> but setting up (deploying and running) the SecondaryNameNode on a
> >>>>>>>>>> different machine than the NameNode is not as trivial as I
> >>>>>>>>>> expected. First I found that if I need to run the
> >>>>>>>>>> SecondaryNameNode on a machine other than the NameNode, I should
> >>>>>>>>>> change the masters file on the NameNode (change localhost to the
> >>>>>>>>>> SecondaryNameNode host) and set some properties in
> >>>>>>>>>> hadoop-site.xml on the SecondaryNameNode (fs.default.name,
> >>>>>>>>>> fs.checkpoint.dir, fs.checkpoint.period etc.)
> >>>>>>>>>>
> >>>>>>>>>> This was enough to start the SecondaryNameNode when starting the
> >>>>>>>>>> NameNode with bin/start-dfs.sh, but it didn't create an image on
> >>>>>>>>>> the SecondaryNameNode. Then I found that I need to set
> >>>>>>>>>> dfs.http.address to the NameNode's address (so now I have the
> >>>>>>>>>> NameNode address in both fs.default.name and dfs.http.address).
> >>>>>>>>>>
> >>>>>>>>>> Now I get the following exception:
> >>>>>>>>>>
> >>>>>>>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary - Exception in
> >>>>>>>>>> doCheckpoint:
> >>>>>>>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary -
> >>>>>>>>>> java.net.SocketException: Unexpected end of file from server
> >>>>>>>>>>
> >>>>>>>>>> My questions are the following:
> >>>>>>>>>> How do I resolve this problem (this exception)?
> >>>>>>>>>> Do I need an additional property in the SecondaryNameNode's
> >>>>>>>>>> hadoop-site.xml or the NameNode's hadoop-site.xml?
> >>>>>>>>>>
> >>>>>>>>>> How should NameNode failover work ideally? Is it like this:
> >>>>>>>>>>
> >>>>>>>>>> The SecondaryNameNode runs on a separate machine from the
> >>>>>>>>>> NameNode and stores the NameNode's data (fsimage and edits)
> >>>>>>>>>> locally in fs.checkpoint.dir.
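[Archive note: pulling together the settings Tomislav lists, a secondary-side hadoop-site.xml might look like the sketch below. All values are placeholders; fs.checkpoint.period is in seconds.]

```xml
<!-- Sketch of an SNN-side hadoop-site.xml; "nn-host" and paths are placeholders. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://nn-host:9000</value>
</property>
<property>
  <name>dfs.http.address</name>
  <!-- Must point at the NN's http server; leaving this at its default is
       what caused the doCheckpoint exception above. -->
  <value>nn-host:50070</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/dfs/namesecondary</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints -->
</property>
```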
> >>>>>>>>>> When the NameNode machine crashes, we start a NameNode on the
> >>>>>>>>>> machine where the SecondaryNameNode was running, and we set
> >>>>>>>>>> dfs.name.dir to fs.checkpoint.dir. Also we need to change how DNS
> >>>>>>>>>> resolves the NameNode hostname (change from the primary to the
> >>>>>>>>>> secondary).
> >>>>>>>>>>
> >>>>>>>>>> Is this correct?
> >>>>>>>>>>
> >>>>>>>>>> Tomislav