Subject: Re: Namenode Role
From: Gerrit Jansen van Vuuren
To: hdfs-user@hadoop.apache.org
Date: Thu, 18 Nov 2010 10:30:29 +0000

Hi,

There is some development going on at both Yahoo and Facebook on making the NameNode highly available (HA), but so far nothing has been released that does this.
So, to answer your question: no. The NameNode is a single point of failure, and there is no way to switch over to another NameNode at runtime.

The only solution is to:
-> write the NameNode metadata to two locations: a local disk and an NFS mount (by listing both directories, comma-separated, in dfs.name.dir).
-> you must always run the secondary/checkpoint NameNode. If you don't, the NameNode's edits log is never merged into the on-disk image file (fsimage). The image is only brought up to date under two conditions: when the NameNode restarts, or when the secondary NameNode performs a checkpoint (it fetches the image and the edits log, merges them, and uploads the new image to the primary).
-> make backups with a cron job that requests checkpoints through the NameNode's HTTP interface, and store these backups off-rack, or even off-site.
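
The backup step in the last item could be scripted roughly as follows. This is a minimal sketch, not the script described above: it assumes a Hadoop 0.20-era NameNode whose web server (default port 50070) serves its current image at /getimage?getimage=1, the servlet the secondary NameNode itself uses to fetch the image. Check that endpoint for your version. The host names, paths and the fsimage-YYYYMMDD-HHMMSS naming scheme are hypothetical. The downloaded file only reflects edits merged at the last checkpoint, which is one more reason to keep the secondary NameNode running.

```python
"""Hypothetical cron-driven NameNode metadata backup (sketch).

Assumption: the NameNode's HTTP server serves its image at
/getimage?getimage=1 (Hadoop 0.20-era servlet); verify for your version.
"""
import os
import shutil
import time
import urllib.request


def image_url(host: str, port: int = 50070) -> str:
    """URL of the NameNode's image download servlet (assumed endpoint)."""
    return f"http://{host}:{port}/getimage?getimage=1"


def backup_filename(now: float) -> str:
    """Fixed-width UTC timestamp, so backup names sort chronologically."""
    return time.strftime("fsimage-%Y%m%d-%H%M%S", time.gmtime(now))


def fetch_image(host: str, dest_dir: str, port: int = 50070) -> str:
    """Download the current image into dest_dir and return the path written."""
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, backup_filename(time.time()))
    tmp_path = final_path + ".part"
    with urllib.request.urlopen(image_url(host, port), timeout=300) as resp, \
            open(tmp_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    # Write to a .part file and rename only after the download completes,
    # so an interrupted run never produces a file under the final backup name.
    os.replace(tmp_path, final_path)
    return final_path
```

From cron, a small wrapper that calls fetch_image("namenode-host", "/backups/namenode") every hour or so would implement the schedule; copying the resulting files to another rack or site is left to whatever tooling you already use.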

Using another NameNode when the current one fails then means restoring from one of the backups you've made, or from one of the checkpoints made by the secondary NameNode. I can't stress enough that you need to make as many backups of your metadata as possible; otherwise you face total data loss if the metadata can't be recovered.
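
When the NameNode fails, the first practical question is which backup to restore. A minimal sketch, assuming the backups sit in one directory as files named fsimage-YYYYMMDD-HHMMSS (UTC), a hypothetical naming scheme rather than anything Hadoop produces; files with any other name, such as interrupted downloads ending in .part, are ignored.

```python
"""Hypothetical helper for choosing which metadata backup to restore (sketch)."""
import os
import re
from typing import Iterable, Optional

# Fixed-width UTC timestamps make lexicographic order equal chronological order.
BACKUP_NAME = re.compile(r"fsimage-\d{8}-\d{6}")


def newest_backup_name(names: Iterable[str]) -> Optional[str]:
    """Return the most recent well-formed backup name, or None if there is none."""
    candidates = [n for n in names if BACKUP_NAME.fullmatch(n)]
    return max(candidates) if candidates else None


def newest_backup_path(backup_dir: str) -> Optional[str]:
    """Full path of the newest backup in backup_dir, or None."""
    name = newest_backup_name(os.listdir(backup_dir))
    return os.path.join(backup_dir, name) if name else None
```

Putting the chosen image back into a fresh NameNode's name directory is version-specific and not shown. To restore from the secondary NameNode's checkpoint instead, the Hadoop 0.20 HDFS User Guide describes starting the NameNode with the -importCheckpoint option, which reads the image from the directory set in fs.checkpoint.dir.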

Hope this helps.

cheers,
Gerrit

On Thu, Nov 18, 2010 at 3:20 AM, Ozcan ILIKHAN <ilikhan@cs.wisc.edu> wrote:
> Currently in my mini cluster I have one active and one backup NameNode. Whenever I need the backup NameNode to become the active/regular NameNode, I shut it down and restart it in active mode. As far as I understand from the documentation and code, there is no way to switch from the backup to the active role at runtime.
>
> Does anyone have a better idea of handling this situation?
>
> Thanks,
> Ozcan.
