ambari-dev mailing list archives

From Anandha L Ranganathan <analog.s...@gmail.com>
Subject Re: NameNode HA -Blueprints - Standby NN failed and Active NN created
Date Wed, 26 Aug 2015 06:29:15 GMT
+ dev group.


This is what I found in /var/lib/ambari-agent/data/command-#.json on
one of the master hosts.
As you can see, the active NameNode has been substituted with its FQDN, but
the standby NameNode has not. Is this a bug in this Ambari version?

I am using *Ambari 2.1* version.

  hadoop-env{

            "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
            "hadoop_root_logger": "INFO,RFA",
            "dfs_ha_initial_namenode_standby":
"%HOSTGROUP::host_group_master_2%",
            "namenode_opt_permsize": "128m"
}
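
To check a whole command file for placeholders that were left unsubstituted, a small script along these lines may help. It is only a sketch: the `configurations` wrapper key and the inline sample are assumptions modelled on the fragment above, not copied from a real command-#.json, and `find_unresolved` is a hypothetical helper name.

```python
import json
import re

# Matches a fully formed but unresolved blueprint token,
# e.g. %HOSTGROUP::host_group_master_2%
TOKEN = re.compile(r"%HOSTGROUP::[^%\s]+%")


def find_unresolved(node, path=""):
    """Recursively collect (path, value) pairs whose string value still holds a token."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_unresolved(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_unresolved(value, f"{path}[{index}]"))
    elif isinstance(node, str) and TOKEN.search(node):
        hits.append((path, node))
    return hits


# Sample mirroring the fragment above; the "configurations" key is an assumption.
sample = json.loads("""
{"configurations": {"hadoop-env": {
    "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
    "hadoop_root_logger": "INFO,RFA",
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%",
    "namenode_opt_permsize": "128m"}}}
""")

unresolved = find_unresolved(sample)
for path, value in unresolved:
    print(path, "->", value)
```

To run it against a real file, replace the inline sample with `json.load(open(path))`, where `path` points at the command file on the agent host.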


Thanks
Anand


On Tue, Aug 25, 2015 at 11:23 AM Anandha L Ranganathan <
analog.sony@gmail.com> wrote:

>
> Hi
>
> I am trying to install NameNode HA using blueprints.
> During cluster creation through scripts, it does the following, and these
> steps complete.
>
> 1) The JournalNodes start and are initialized (the JournalNodes are formatted).
> 2) The HA state is initialized in ZooKeeper, or ZKFC (on both the active and
> standby NameNode).
>
> After 96% it fails. I logged into the cluster using the UI and restarted
> the standby NameNode, but it threw an exception saying that the NameNode is
> not formatted.
> I had to manually copy the fsimage and logs using this command, "hdfs
> namenode -bootstrapStandby -force", on the standby NN server.
> After that, restarting the NameNode works fine and it goes into standby mode.
>
> Is there something I am missing in the configuration?
> My NameNode HA blueprint looks like this.
>
> hadoop-env{
>  "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
> "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
> }
>
>
> hadoop-ev{
>
>         "dfs_ha_initial_namenode_active":
> "%HOSTGROUP::host_group_master_1%"
>         "dfs_ha_initial_namenode_standby":
> "%HOSTGROUP::host_group_master_2"
> }
>
> hdfs-site{
>           "dfs.client.failover.proxy.provider.dfs-nameservices":
> "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
>           "dfs.ha.automatic-failover.enabled": "true",
>           "dfs.ha.fencing.methods": "shell(/bin/true)",
>           "dfs.ha.namenodes.dfs-nameservices": "nn1,nn2",
>           "dfs.namenode.http-address.dfs-nameservices.nn1":
> "%HOSTGROUP::host_group_master_1%:50070",
>           "dfs.namenode.http-address.dfs-nameservices.nn2":
> "%HOSTGROUP::host_group_master_2%:50070",
>           "dfs.namenode.https-address.dfs-nameservices.nn1":
> "%HOSTGROUP::host_group_master_1%:50470",
>           "dfs.namenode.https-address.dfs-nameservices.nn2":
> "%HOSTGROUP::host_group_master_2%:50470",
>           "dfs.namenode.rpc-address.dfs-nameservices.nn1":
> "%HOSTGROUP::host_group_master_1%:8020",
>           "dfs.namenode.rpc-address.dfs-nameservices.nn2":
> "%HOSTGROUP::host_group_master_2%:8020",
>           "dfs.namenode.shared.edits.dir":
> "qjournal://%HOSTGROUP::host_group_master_1%:8485;%HOSTGROUP::host_group_master_2%:8485;%HOSTGROUP::host_group_master_3%:8485/dfs-nameservices",
>           "dfs.nameservices": "dfs-nameservices"
>
> }
>
>
> core-site{
>           "fs.defaultFS": "hdfs://dfs-nameservices",
>           "ha.zookeeper.quorum":
> "%HOSTGROUP::host_group_master_1%:2181,%HOSTGROUP::host_group_master_2%:2181,%HOSTGROUP::host_group_master_3%:2181"
>
> }
>
>
>
> This is the log message of Standby Namenode server.
>
> 2015-08-25 08:26:26,373 INFO  zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/usr/hdp/2.2.6.0-2800/hadoop
> 2015-08-25 08:26:26,380 INFO  zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(438)) - Initiating client connection,
> connectString=usw2ha2dpma01.local:2181,usw2ha2dpma02.local:2181,usw2ha2dpma03.local:2181
> sessionTimeout=5000
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5b7a5baa
> 2015-08-25 08:26:26,399 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to
> server usw2ha2dpma02.local/172.17.213.51:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2015-08-25 08:26:26,405 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:primeConnection(852)) - Socket connection established to
> usw2ha2dpma02.local/172.17.213.51:2181, initiating session
> 2015-08-25 08:26:26,413 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:onConnected(1235)) - Session establishment complete on
> server usw2ha2dpma02.local/172.17.213.51:2181, sessionid =
> 0x24f63f6f3050001, negotiated timeout = 5000
> 2015-08-25 08:26:26,416 INFO  ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(547)) - Session connected.
> 2015-08-25 08:26:26,441 INFO  ipc.CallQueueManager
> (CallQueueManager.java:<init>(53)) - Using callQueue class
> java.util.concurrent.LinkedBlockingQueue
> 2015-08-25 08:26:26,472 INFO  ipc.Server (Server.java:run(605)) - Starting
> Socket Reader #1 for port 8019
> 2015-08-25 08:26:26,520 INFO  ipc.Server (Server.java:run(827)) - IPC
> Server Responder: starting
> 2015-08-25 08:26:26,526 INFO  ipc.Server (Server.java:run(674)) - IPC
> Server listener on 8019: starting
> 2015-08-25 08:26:27,596 INFO  ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:27,615 WARN  ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020:
> Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:27,616 INFO  ha.HealthMonitor
> (HealthMonitor.java:enterState(238)) - Entering state SERVICE_NOT_RESPONDING
> 2015-08-25 08:26:27,616 INFO  ha.ZKFailoverController
> (ZKFailoverController.java:setLastHealthState(850)) - Local service
> NameNode at usw2ha2dpma02.local/172.17.213.51:8020 entered state:
> SERVICE_NOT_RESPONDING
> 2015-08-25 08:26:27,616 INFO  ha.ZKFailoverController
> (ZKFailoverController.java:recheckElectability(766)) - Quitting master
> election for NameNode at usw2ha2dpma02.local/172.17.213.51:8020 and
> marking that fencing is necessary
> 2015-08-25 08:26:27,617 INFO  ha.ActiveStandbyElector
> (ActiveStandbyElector.java:quitElection(354)) - Yielding from election
> 2015-08-25 08:26:27,621 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-08-25 08:26:27,621 INFO  zookeeper.ZooKeeper
> (ZooKeeper.java:close(684)) - Session: 0x24f63f6f3050001 closed
> 2015-08-25 08:26:29,623 INFO  ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:29,624 WARN  ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020:
> Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:31,626 INFO  ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:31,627 WARN  ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020:
> Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:33,629 INFO  ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:33,630 WARN  ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to
>
>
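
One thing that may be worth ruling out: in the blueprint as quoted above, the `dfs_ha_initial_namenode_standby` value appears without a trailing `%`, while every other `%HOSTGROUP::...%` token is closed. This may just be a slip in the email, since the command-#.json fragment shows the token closed. A quick check for unclosed tokens could be sketched as follows. The character class used for group names and the `malformed_tokens` helper name are assumptions for illustration, not something Ambari defines.

```python
import re

# A candidate token: %HOSTGROUP:: followed by a group name and an optional closing %.
# The allowed characters in a group name are an assumption made for this sketch.
CANDIDATE = re.compile(r"%HOSTGROUP::[A-Za-z0-9_\-]+%?")


def malformed_tokens(value):
    """Return the tokens in a config value that lack the closing '%'."""
    return [m.group(0) for m in CANDIDATE.finditer(value)
            if not m.group(0).endswith("%")]


# Values as they appear in the quoted blueprint.
blueprint_values = {
    "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%",
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2",
    "dfs.namenode.rpc-address.dfs-nameservices.nn2":
        "%HOSTGROUP::host_group_master_2%:8020",
}

problems = {key: bad for key, value in blueprint_values.items()
            if (bad := malformed_tokens(value))}
for key, bad in problems.items():
    print(key, "has unclosed token(s):", bad)
```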
