hadoop-common-user mailing list archives

From randy <randy...@comcast.net>
Subject Re: hadoop namenode recovery
Date Wed, 16 Jan 2013 01:34:21 GMT
What happens to the NN and/or performance if there's a problem with the 
NFS server? Or the network?

Thanks,
randy

On 01/14/2013 11:36 PM, Harsh J wrote:
> It's very rare to observe an NN crash due to a software bug in
> production. Most of the time it's a hardware fault you should worry about.
>
> On 1.x, or any non-HA-carrying release, the best you can get to
> safeguard against a total loss is to have redundant disk volumes
> configured, one preferably over a dedicated remote NFS mount. This way
> the NN is recoverable after the node goes down, since you can retrieve a
> current copy from another machine (i.e. via the NFS mount) and set a new
> node up to replace the older NN and continue along.
>
> A load balancer will not work as the NN is not a simple webserver - it
> maintains state which you cannot sync. We wrote HA-HDFS features to
> address the very concern you have.
>
> If you want true, painless HA, branch-2 is your best bet at this point.
> An upcoming 2.0.3 release should include the QJM-based HA feature, which
> is painless to set up and very reliable to use (over other options), and
> works with commodity-level hardware. FWIW, we've (my team and I) been
> supporting several users and customers who're running the 2.x-based HA
> in production and other types of environments, and it has been very
> stable in our experience. There are also some folks in the community
> running 2.x-based HDFS, for HA or otherwise.
>
>
> On Tue, Jan 15, 2013 at 6:55 AM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>
>     Hello,
>
>     Is there a standard way to prevent a Namenode crash in a Hadoop
>     cluster? Or what is the standard or best practice for overcoming
>     Hadoop's single point of failure problem?
>
>     I am not ready to take chances on a production server with the
>     Hadoop 2.0 Alpha release, which claims to have solved the problem.
>     Is there anything else I can do to either prevent the failure or
>     recover from it in a very short time?
>
>     Thanking You,
>
>     --
>     Regards,
>     Ouch Whisper
>     010101010101
>
>
>
>
> --
> Harsh J
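
For reference, the "redundant disk volumes" setup Harsh describes for 1.x is usually expressed by listing several directories, comma-separated, in the dfs.name.dir property of hdfs-site.xml (renamed dfs.namenode.name.dir in 2.x); the NN writes its metadata to every listed directory. Below is a minimal sketch of a sanity check for that setting. The two paths and the helper names (name_dirs, has_redundant_name_dirs) are made up for illustration and are not part of Hadoop, and the check only counts entries; it cannot tell whether one of them really sits on an NFS mount.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml contents. Both paths are invented examples:
# one local volume and one directory assumed to live on an NFS mount.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>
</configuration>
"""


def name_dirs(hdfs_site_xml, key="dfs.name.dir"):
    """Return the directories listed under `key`, or [] if the key is absent."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key:
            value = prop.findtext("value") or ""
            # The value is a comma-separated list; drop empty entries.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def has_redundant_name_dirs(hdfs_site_xml, key="dfs.name.dir"):
    """True if at least two distinct metadata directories are configured."""
    return len(set(name_dirs(hdfs_site_xml, key))) >= 2


if __name__ == "__main__":
    print(name_dirs(SAMPLE_HDFS_SITE))
    print(has_redundant_name_dirs(SAMPLE_HDFS_SITE))
```

On a 2.x configuration the same check could be run with key="dfs.namenode.name.dir".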

