Date: Tue, 25 Aug 2015 11:23:20 -0700
Subject: NameNode HA -Blueprints - Standby NN failed and Active NN created
From: Anandha L Ranganathan
To: user@ambari.apache.org

Hi,

I am trying to install NameNode HA using blueprints. During cluster creation through scripts, the following steps run and complete:

1) The JournalNodes start and are initialized (the JournalNodes are formatted).
2) The HA state is initialized in ZooKeeper (ZKFC) on both the active and the standby NameNode.

After 96% it fails. I logged into the cluster using the UI and restarted the standby NameNode, but it threw an exception saying that the NameNode is not formatted. I have to manually copy the fsimage logs by running "hdfs namenode -bootstrapStandby -force" on the standby NN server. After that, restarting the NameNode works fine and it goes into standby mode.

Is there something I am missing in the configuration? My NameNode HA blueprint looks like this.

hadoop-env{
    "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
}

hadoop-ev{
    "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
}

hdfs-site{
    "dfs.client.failover.proxy.provider.dfs-nameservices": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "dfs.ha.automatic-failover.enabled": "true",
    "dfs.ha.fencing.methods": "shell(/bin/true)",
    "dfs.ha.namenodes.dfs-nameservices": "nn1,nn2",
    "dfs.namenode.http-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50070",
    "dfs.namenode.http-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50070",
    "dfs.namenode.https-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50470",
    "dfs.namenode.https-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50470",
    "dfs.namenode.rpc-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:8020",
    "dfs.namenode.rpc-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:8020",
    "dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::host_group_master_1%:8485;%HOSTGROUP::host_group_master_2%:8485;%HOSTGROUP::host_group_master_3%:8485/dfs-nameservices",
    "dfs.nameservices": "dfs-nameservices"
}

core-site{
    "fs.defaultFS": "hdfs://dfs-nameservices",
    "ha.zookeeper.quorum": "%HOSTGROUP::host_group_master_1%:2181,%HOSTGROUP::host_group_master_2%:2181,%HOSTGROUP::host_group_master_3%:2181"
}

This is the log from the standby NameNode server.

2015-08-25 08:26:26,373 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.2.6.0-2800/hadoop
2015-08-25 08:26:26,380 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=usw2ha2dpma01.local:2181,usw2ha2dpma02.local:2181,usw2ha2dpma03.local:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5b7a5baa
2015-08-25 08:26:26,399 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server usw2ha2dpma02.local/172.17.213.51:2181. Will not attempt to authenticate using SASL (unknown error)
2015-08-25 08:26:26,405 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to usw2ha2dpma02.local/172.17.213.51:2181, initiating session
2015-08-25 08:26:26,413 INFO  zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session establishment complete on server usw2ha2dpma02.local/172.17.213.51:2181, sessionid = 0x24f63f6f3050001, negotiated timeout = 5000
2015-08-25 08:26:26,416 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(547)) - Session connected.
2015-08-25 08:26:26,441 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-08-25 08:26:26,472 INFO  ipc.Server (Server.java:run(605)) - Starting Socket Reader #1 for port 8019
2015-08-25 08:26:26,520 INFO  ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2015-08-25 08:26:26,526 INFO  ipc.Server (Server.java:run(674)) - IPC Server listener on 8019: starting
2015-08-25 08:26:27,596 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2015-08-25 08:26:27,615 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2015-08-25 08:26:27,616 INFO  ha.HealthMonitor (HealthMonitor.java:enterState(238)) - Entering state SERVICE_NOT_RESPONDING
2015-08-25 08:26:27,616 INFO  ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(850)) - Local service NameNode at usw2ha2dpma02.local/172.17.213.51:8020 entered state: SERVICE_NOT_RESPONDING
2015-08-25 08:26:27,616 INFO  ha.ZKFailoverController (ZKFailoverController.java:recheckElectability(766)) - Quitting master election for NameNode at usw2ha2dpma02.local/172.17.213.51:8020 and marking that fencing is necessary
2015-08-25 08:26:27,617 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(354)) - Yielding from election
2015-08-25 08:26:27,621 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2015-08-25 08:26:27,621 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x24f63f6f3050001 closed
2015-08-25 08:26:29,623 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2015-08-25 08:26:29,624 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2015-08-25 08:26:31,626 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2015-08-25 08:26:31,627 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2015-08-25 08:26:33,629 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2015-08-25 08:26:33,630 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to
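
The log shows the ZKFC starting its own IPC listener on port 8019 and then getting "Connection refused" from the NameNode RPC port 8020 on the same host, which is consistent with the standby NameNode process not running at that point. A minimal sketch of how that could be confirmed from the standby host is below. The helper name is_port_open is hypothetical and is not part of Hadoop or Ambari; the host and ports to probe would be the ones from the log.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A False result covers
    "connection refused", timeouts, and name-resolution failures alike.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_port_open("usw2ha2dpma02.local", 8019) and is_port_open("usw2ha2dpma02.local", 8020) would probe the ZKFC and NameNode RPC ports seen in the log. A closed 8020 only says that nothing is listening there; the NameNode's own log is still needed to see why it did not start.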

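
One detail in the blueprint as pasted can be checked mechanically: the dfs_ha_initial_namenode_standby value is written as "%HOSTGROUP::host_group_master_2", without the closing "%" that every other host group token in the message has. Whether that is only a slip in the email or is also present in the real blueprint is not established here. The sketch below flags such unterminated tokens. The function names are hypothetical, and it assumes host group names consist only of letters, digits, and underscores.

```python
import re
from typing import Dict, List

# Assumption: a host group name is made of word characters only.
# Group 1 is the name, group 2 is the closing '%' if present.
TOKEN = re.compile(r"%HOSTGROUP::(\w+)(%?)")


def unterminated_tokens(value: str) -> List[str]:
    """Return the HOSTGROUP tokens in value that lack the closing '%'."""
    return [m.group(0) for m in TOKEN.finditer(value) if not m.group(2)]


def check_properties(props: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each property name to its unterminated tokens, skipping clean ones."""
    problems: Dict[str, List[str]] = {}
    for name, value in props.items():
        bad = unterminated_tokens(value)
        if bad:
            problems[name] = bad
    return problems
```

Running check_properties over the hadoop-env, hdfs-site, and core-site properties before submitting the blueprint would report dfs_ha_initial_namenode_standby for the values shown in this message and nothing else.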