From: randy <randysch@comcast.net>
Date: Tue, 15 Jan 2013 20:34:21 -0500
To: user@hadoop.apache.org
Subject: Re: hadoop namenode recovery

What happens to the NN and/or to performance if there's a problem with
the NFS server? Or with the network?

Thanks,
randy

On 01/14/2013 11:36 PM, Harsh J wrote:
> It's very rare to observe an NN crash due to a software bug in
> production. Most of the time it's a hardware fault you should worry
> about.
>
> On 1.x, or any release without the HA features, the best safeguard
> against a total loss is to configure redundant disk volumes for the
> NN metadata, one of them preferably on a dedicated remote NFS mount.
> That way the NN is recoverable after its node goes down: you retrieve
> a current copy of the metadata from another machine (i.e. via the NFS
> mount), set up a new node to replace the old NN, and continue along.
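>
> For illustration, a minimal sketch of such a setup in hdfs-site.xml;
> the paths are placeholders for your own local disk and NFS mount
> points, not values from this thread:
>
>   <!-- The NN keeps a full copy of its fsimage/edits in every
>        directory listed here (dfs.name.dir is the 1.x key). -->
>   <property>
>     <name>dfs.name.dir</name>
>     <!-- one local volume plus a dedicated remote NFS mount;
>          a hard NFS mount that becomes unreachable can block the
>          NN's writes, so soft mounts with short timeouts are the
>          commonly suggested hedge -->
>     <value>/data/1/dfs/nn,/mnt/remote-nfs/dfs/nn</value>
>   </property>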
>
> A load balancer will not work, as the NN is not a simple webserver:
> it maintains state which you cannot sync. We wrote the HA-HDFS
> features to address the very concern you have.
>
> If you want true, painless HA, branch-2 is your best bet at this
> point. The upcoming 2.0.3 release should include the QJM-based HA
> feature, which is painless to set up, very reliable to use (compared
> to the other options), and works on commodity-level hardware; a
> bare-bones sketch of the relevant settings is appended at the bottom
> of this mail. FWIW, we (my team and I) have been supporting several
> users and customers who run the 2.x-based HA in production and other
> types of environments, and it has been very stable in our experience.
> There are also some folks in the community running 2.x-based HDFS,
> with and without HA.
>
> On Tue, Jan 15, 2013 at 6:55 AM, Panshul Whisper wrote:
>
>     Hello,
>
>     Is there a standard way to prevent a NameNode crash from bringing
>     down a Hadoop cluster? Or, what is the standard or best practice
>     for overcoming the single-point-of-failure problem of Hadoop?
>
>     I am not ready to take chances on a production server with the
>     Hadoop 2.0 alpha release, which claims to have solved the
>     problem. Is there anything else I can do to either prevent the
>     failure, or recover from it in a very short time?
>
>     Thanking you,
>
>     --
>     Regards,
>     Ouch Whisper
>     010101010101
>
> --
> Harsh J
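>
> Appendix: a bare-bones sketch of the QJM-based HA settings in
> hdfs-site.xml. Every name below (the nameservice "mycluster", the
> nn1/nn2 and jn1-jn3 hosts) is a placeholder, not a value from this
> thread:
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>   <!-- two NNs; one becomes active, the other standby -->
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>nn1.example.com:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>nn2.example.com:8020</value>
>   </property>
>   <!-- the shared edit log lives on a quorum of JournalNodes
>        instead of a shared NFS filer -->
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
>   </property>
>   <!-- lets clients find whichever NN is currently active -->
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>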