hadoop-common-user mailing list archives

From Brahma Reddy Battula <brahmareddy.batt...@huawei.com>
Subject RE: Failed to active namenode when config HA
Date Tue, 30 Sep 2014 04:04:04 GMT
You need to start the ZKFC process, which monitors and manages the state of the NameNode.

Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and
the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination
data, notifying clients of changes in that data, and monitoring clients for failures. The
implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent
session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying
the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect
a node as active. If the current active NameNode crashes, another node may take a special
exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component: a ZooKeeper client that also
monitors and manages the state of the NameNode. Each machine that runs a NameNode
also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check
command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC
considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy
state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a
session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock"
znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the
lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that
no other node currently holds the lock znode, it will itself try to acquire the lock. If it
succeeds, then it has "won the election", and is responsible for running a failover to make
its local NameNode active. The failover process is similar to the manual failover described
above: first, the previous active is fenced if necessary, and then the local NameNode transitions
to active state.
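Operationally, on a QJM-based deployment this boils down to two commands (a sketch; the script name is the Hadoop 2.x sbin version, and it assumes dfs.ha.automatic-failover.enabled=true and ha.zookeeper.quorum are already configured):

```shell
# One-time: initialize the HA coordination znode in ZooKeeper
# (run from one of the NameNode hosts while the ZooKeeper quorum is up)
hdfs zkfc -formatZK

# On each NameNode host, start the ZKFC daemon
hadoop-daemon.sh start zkfc
```

Once both ZKFCs are running, one of them wins the election and transitions its local NameNode to active automatically, so no manual -transitionToActive call is needed.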



Please go through the following link for more details:


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards

Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those
properties in hdfs-site.xml, not core-site.xml.

There are four NameNodes because I am also using HDFS federation, so there are two nameservices
in the property
<name>dfs.nameservices</name>
and each nameservice has two NameNodes.

If I configure only HA (a single nameservice), everything is OK, and HAAdmin can determine
the NameNodes nn1, nn3.

But if I configure two nameservices and set NameNodes nn1,nn3 for nameservice1 and nn2,nn4
for nameservice2, I can start these NameNodes successfully and they are all in standby
state at the beginning. But when I try to transition one NameNode to active with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine any of the four NameNodes (nn1,nn2,nn3,nn4).
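For what it's worth, when federation and HA are combined, haadmin typically needs the target nameservice spelled out via its -ns option (a sketch, using the ns1/nn1 IDs from the configuration quoted in this thread):

```shell
# Scope the command to nameservice ns1 so the NameNode ID nn1 can be resolved
hdfs haadmin -ns ns1 -transitionToActive nn1

# Verify the state afterwards
hdfs haadmin -ns ns1 -getServiceState nn1
```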

Have you ever configured HA together with federation? Do you know what may cause this problem?

Thanks,
Lucy

------------------ Original ------------------
From: "Matt Narrell" <matt.narrell@gmail.com>
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user" <user@hadoop.apache.org>
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is limited to two name nodes (not four), designated active and
standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>
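For completeness, a client-side core-site.xml companion to the above might look like this (a sketch; the ZooKeeper hostnames zk1-zk3 are hypothetical, and the quorum property is only needed when automatic failover is enabled):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Clients address the logical nameservice; the configured failover
       proxy provider resolves it to the current active NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs-cluster</value>
  </property>
  <!-- Only required with dfs.ha.automatic-failover.enabled=true;
       zk1-zk3 are placeholder hostnames -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
</configuration>
```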

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <475053586@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (the two nameservices ns1,ns2 are for configuring federation later. In this step, I only want
> to launch ns1 on namenode1, namenode3)
>
> After configuration, I did the following steps
> firstly, I started the JournalNodes on datanode2, datanode3, datanode4
> secondly, I formatted namenode1 and started the NameNode on it
> then I ran 'hdfs namenode -bootstrapStandby' on the other NameNode and started the NameNode
> on it
>
> Everything seems fine except that no NameNode is active now, so I tried to make one active by
> running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuration?
>
> Thanks a lot!!!
>
>