ambari-user mailing list archives

From: Alejandro Fernandez
Subject: Re: Recovering from a dead master namenode server
Date: Thu, 03 Mar 2016 19:00:46 GMT
The situation is the same regardless of whether the lost host ran master, slave, or client
components.

Basically, you have to:
1. Bring up a host with the same FQDN.
2. Install ambari-agent on it.
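
Because the replacement host has to come up under the same FQDN, it can be worth checking the name the new machine reports before installing the agent. The following is only a minimal sketch: the helper names and the example hostname are hypothetical, and it assumes that a case-insensitive comparison ignoring a trailing dot is good enough for this check.

```python
import socket
from typing import Optional


def normalize_fqdn(name: str) -> str:
    """Lower-case the name and drop surrounding whitespace and a trailing dot."""
    return name.strip().rstrip(".").lower()


def fqdn_matches(expected: str, actual: Optional[str] = None) -> bool:
    """Return True if this host's FQDN matches the FQDN of the lost host.

    `actual` defaults to what the local resolver reports via socket.getfqdn().
    """
    if actual is None:
        actual = socket.getfqdn()
    return normalize_fqdn(expected) == normalize_fqdn(actual)


if __name__ == "__main__":
    # Hypothetical FQDN of the host that was lost.
    lost_host = "namenode1.example.com"
    if fqdn_matches(lost_host):
        print("FQDN matches; the agent can be installed on this host.")
    else:
        print(f"FQDN mismatch: this host reports {socket.getfqdn()!r}")
```

Run on the replacement host, this only confirms the name; it does not install or register anything.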

At this point, any components that used to be on that host will report heartbeat lost, and
the cluster may not be fully operational if that host contained masters (especially NameNode).
You may then have to restart services on that host, which will actually end up installing
the bits again and generating configs.
The hard part is that you may have to run additional commands depending on the type of master.
Think of NameNode, or even hosts that contain databases for Hive, Oozie, etc.
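
To see which components on the rebuilt host still need to be restarted, one option is to query the Ambari REST API for that host's components and look at their states. The sketch below is an illustration under assumptions, not a tested procedure: the path and the `items` / `HostRoles` response layout follow my understanding of the Ambari v1 API, the cluster name, hostname, and sample response are made up, and the HTTP call itself (with authentication) is left out.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import quote


def host_components_path(cluster: str, host: str) -> str:
    """Build the (assumed) API path that lists a host's components with their state."""
    return (
        f"/api/v1/clusters/{quote(cluster, safe='')}"
        f"/hosts/{quote(host, safe='')}"
        "/host_components?fields=HostRoles/state"
    )


def components_not_started(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (component_name, state) pairs for components not in STARTED.

    Client components normally sit in INSTALLED rather than STARTED, so they
    will show up here too; filter them out if that is not wanted.
    """
    pending = []
    for item in response.get("items", []):
        roles = item.get("HostRoles", {})
        name = roles.get("component_name")
        state = roles.get("state", "UNKNOWN")
        if name and state != "STARTED":
            pending.append((name, state))
    return sorted(pending)


if __name__ == "__main__":
    # Made-up response for illustration only.
    sample = {
        "items": [
            {"HostRoles": {"component_name": "NAMENODE", "state": "UNKNOWN"}},
            {"HostRoles": {"component_name": "DATANODE", "state": "STARTED"}},
        ]
    }
    print(host_components_path("mycluster", "namenode1.example.com"))
    print(components_not_started(sample))
```

The same information is visible on the host page in the Ambari web UI; the sketch is only useful if the check needs to be scripted.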

Attempting to move masters may be complicated, because Ambari may need the original host to
be heartbeating, with the bits installed, in order to stop the services it knows about on
that host.


From: cs user
Date: Thursday, March 3, 2016 at 5:00 AM
Subject: Recovering from a dead master namenode server

Hi All,

I'm trying to understand how to recover from certain failures within Ambari. When launching
within a cloud environment, it's possible that a host may be completely deleted, and you won't
have the chance to decommission the node.

For example, if the server hosting the master HDFS NameNode were lost, would
it be possible to spin up another server in its place, built completely from scratch, and have
it replace the old NameNode master?

Currently when I attempt to delete a failed host, it warns me that the following components
need to be moved:

NameNode, Spark History Server

It also then tries to talk me through the process of copying data from the old namenode to
the new namenode. If the server has been deleted, this would not be possible. Would it be
possible to copy this data from the secondary namenode instead?

Many thanks in advance.

