hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: hadoop namenode recovery
Date Wed, 16 Jan 2013 04:14:10 GMT
The NFS mount should be soft-mounted, so if the NFS server goes down, the NN
ejects that directory and continues with the local disk. If auto-restore is
configured, the NN re-adds the NFS directory once it is detected as good again.
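
A minimal sketch of the configuration side of this, assuming a 1.x cluster
where the NN metadata directories are listed in dfs.name.dir and auto-restore
is toggled with dfs.name.dir.restore (on 2.x the equivalents are
dfs.namenode.name.dir and dfs.namenode.name.dir.restore). The two directory
paths and the helper name build_name_dir_config are hypothetical placeholders.
The script only renders an hdfs-site.xml fragment; the soft mount itself is
set in the NFS mount options on the host, not in Hadoop.

```python
import xml.etree.ElementTree as ET


def build_name_dir_config(local_dir, nfs_dir, auto_restore=True):
    """Build an hdfs-site.xml fragment with two redundant NN metadata dirs.

    local_dir and nfs_dir are placeholder paths; the NN writes its
    metadata to every directory in the comma-separated list.
    """
    props = {
        # Comma-separated list: one local volume, one NFS-backed volume.
        "dfs.name.dir": f"{local_dir},{nfs_dir}",
        # Lets the NN try to re-add a previously failed directory.
        "dfs.name.dir.restore": "true" if auto_restore else "false",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    fragment = build_name_dir_config("/data/1/dfs/nn", "/mnt/nfs/dfs/nn")
    print(ET.tostring(fragment, encoding="unicode"))
```

The printed properties would be merged into the existing hdfs-site.xml on the
NN host; the default for the restore property is false, which is why it is set
explicitly here.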


On Wed, Jan 16, 2013 at 7:04 AM, randy <randysch@comcast.net> wrote:

> What happens to the NN and/or performance if there's a problem with the
> NFS server? Or the network?
>
> Thanks,
> randy
>
>
> On 01/14/2013 11:36 PM, Harsh J wrote:
>
>> It's very rare to observe an NN crash due to a software bug in
>> production. Most of the time it's a hardware fault you should worry about.
>>
>> On 1.x, or any release without HA, the best you can do to
>> safeguard against a total loss is to have redundant disk volumes
>> configured, one preferably over a dedicated remote NFS mount. This way
>> the NN is recoverable after the node goes down, since you can retrieve a
>> current copy from another machine (i.e. via the NFS mount) and set up a
>> new node to replace the old NN and continue along.
>>
>> A load balancer will not work as the NN is not a simple webserver - it
>> maintains state which you cannot sync. We wrote HA-HDFS features to
>> address the very concern you have.
>>
>> If you want true, painless HA, branch-2 is your best bet at this point.
>> An upcoming 2.0.3 release should include the QJM-based HA features, which
>> are painless to set up, very reliable to use (over other options), and
>> work with commodity-level hardware. FWIW, we've (my team and I) been
>> supporting several users and customers who're running the 2.x-based HA
>> in production and other types of environments, and it has been very
>> stable in our experience. There are also some folks in the community
>> running 2.x-based HDFS for HA and other purposes.
>>
>>
>> On Tue, Jan 15, 2013 at 6:55 AM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>>
>>     Hello,
>>
>>     Is there a standard way to prevent a Namenode crash in
>>     a Hadoop cluster?
>>     Or what is the standard or best practice for overcoming the single
>>     point of failure problem of Hadoop?
>>
>>     I am not ready to take chances on a production server with Hadoop
>>     2.0 Alpha release, which claims to have solved the problem. Are
>>     there any other things I can do to either prevent the failure or
>>     recover from the failure in a very short time?
>>
>>     Thanking You,
>>
>>     --
>>     Regards,
>>     Ouch Whisper
>>     010101010101
>>
>>
>>
>>
>> --
>> Harsh J
>>
>
>


-- 
Harsh J
