Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
From: Colin Kincaid Williams <discord@uw.edu>
To: user@hadoop.apache.org
Date: Thu, 31 Jul 2014 21:35:00 -0700

However, continuing with the process, my QJM eventually errored out and my active NameNode went down. (The repeated "IPC's epoch 5 is not the current writer epoch 0" rejection below means the JournalNodes no longer accepted this NameNode as the current writer, so it could not reach a write quorum.)

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
    at org.apache.hadoop.ipc.Client.call(Client.java:1224)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy9.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/10.120.5.203:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
    at org.apache.hadoop.ipc.Client.call(Client.java:1224)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy9.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/10.120.5.25:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
    at org.apache.hadoop.ipc.Client.call(Client.java:1224)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy9.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485], stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3.
3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch 0
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch 0
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch 0
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
    at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
    at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020] client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode (StringUtils.java:run(615)) - SHUTDOWN_MSG:

On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams wrote:

> I tried a third time and it just worked?
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:<init>(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13 GMT
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle Corporation
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:<init>(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>
>
> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman wrote:
>
>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>
>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams wrote:
>> > Hi, this is drocsid / discord from #hbase. Thanks for the help earlier
>> > today. Just thought I'd forward this info regarding swapping out the
>> > NameNode in a QJM / HA configuration. See you around on #hbase. If you
>> > visit Seattle, feel free to give me a shout out.
>> >
>> > ---------- Forwarded message ----------
>> > From: Colin Kincaid Williams
>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
>> > To: user@hadoop.apache.org
>> >
>> >
>> > Hi Jing,
>> >
>> > Thanks for the response. I will try this out, and file an Apache jira.
>> >
>> > Best,
>> >
>> > Colin Williams
>> >
>> >
>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao wrote:
>> >>
>> >> Hi Colin,
>> >>
>> >>     I guess currently we may have to restart almost all the
>> >> daemons/services in order to swap out a standby NameNode (SBN):
>> >>
>> >> 1. The current active NameNode (ANN) needs to know about the new SBN,
>> >> since in the current implementation the SBN periodically sends a
>> >> rollEditLog RPC request to the ANN (thus, if a NN failover happens
>> >> later, the original ANN needs to send this RPC to the correct NN).
>> >> 2. Looks like the DataNode currently cannot really refresh its list of
>> >> NNs. Look at the code in BPOfferService:
>> >>
>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>> >>     for (BPServiceActor actor : bpServices) {
>> >>       oldAddrs.add(actor.getNNSocketAddress());
>> >>     }
>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>> >>
>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>> >>       // Keep things simple for now -- we can implement this at a later date.
>> >>       throw new IOException(
>> >>           "HA does not currently support adding a new standby to a running DN. " +
>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>> >>     }
>> >>   }
>> >>
>> >> 3. If you're using automatic failover, you also need to update the
>> >> configuration of the ZKFC on the current ANN machine, since the ZKFC
>> >> does graceful fencing by sending an RPC to the other NN.
>> >> 4. Looks like we do not need to restart the JournalNodes for the new
>> >> SBN, but I have not tried this before.
>> >>
>> >>     Thus in general we may still have to restart all the services
>> >> (except the JNs) and update their configurations. But I guess this can
>> >> be a rolling restart process:
>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
>> >> restart of all the DNs to update their configurations.
>> >> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
>> >> their configuration. The new SBN should then become active. (A
>> >> command-level sketch of these steps is appended at the end of this
>> >> message.)
>> >>
>> >>     I have not tried the above steps, so please let me know whether
>> >> this works. And I think we should also document the correct steps in
>> >> Apache. Could you please file an Apache jira?
>> >>
>> >> Thanks,
>> >> -Jing
>> >>
>> >>
>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>> >>> I believe the steps to achieve this would be something similar to:
>> >>>
>> >>> Use the bootstrapStandby command to prep the replacement standby, or
>> >>> rsync if the command fails.
>> >>>
>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>> >>> the new standby.
>> >>>
>> >>> Update the xml configuration on all nodes to reflect the replacement
>> >>> standby.
>> >>>
>> >>> Start the replacement standby.
>> >>>
>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>> >>> configuration.
>> >>>
>> >>> I am not sure how to deal with the journal switch, or if I am going
>> >>> about this the right way. Can anybody give me some suggestions here?
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Colin Williams
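For reference, here is a rough command-level sketch of the rolling swap Jing outlines above. It is untested (per Jing's own caveat), and the specifics are assumptions: it presumes a CDH4-style packaged install, so the service names (hadoop-hdfs-namenode, hadoop-hdfs-datanode, hadoop-hdfs-zkfc), the host prefixes (old-sbn, new-sbn, dn, ann), and the NameNode IDs nn1/nn2 are placeholders to adapt to your own hdfs-site.xml and dfs.ha.namenodes.* settings.

    # 1. Shut down the old standby, then bootstrap and start the
    #    replacement standby (after adding it to hdfs-site.xml on its host).
    old-sbn$ sudo service hadoop-hdfs-namenode stop
    new-sbn$ sudo -u hdfs hdfs namenode -bootstrapStandby
    new-sbn$ sudo service hadoop-hdfs-namenode start

    # 2. Leave the ANN and its ZKFC running. Push the updated hdfs-site.xml
    #    to every DataNode, then restart the DataNodes one at a time; a
    #    live refresh is refused by BPOfferService.refreshNNList, as the
    #    snippet above shows.
    dn$ sudo service hadoop-hdfs-datanode restart

    # 3. Update the configuration of the ANN and its ZKFC last, then
    #    restart them; the new standby should take over as active.
    ann$ sudo service hadoop-hdfs-zkfc stop
    ann$ sudo service hadoop-hdfs-namenode restart
    ann$ sudo service hadoop-hdfs-zkfc start

    # Verify the roles afterwards.
    any$ sudo -u hdfs hdfs haadmin -getServiceState nn1
    any$ sudo -u hdfs hdfs haadmin -getServiceState nn2

Note that "hdfs zkfc -formatZK" itself warns that all HDFS services and failover controllers should be stopped first; re-running it against a live cluster can trigger an unexpected failover, after which the JournalNodes fence off the previous writer, which would be consistent with the epoch errors at the top of this thread. If the active/standby roles end up wrong after the swap, a deliberate "hdfs haadmin -failover" is the safer tool.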
However continuing with the process my QJM eventually erro= r'd out and my Active NameNode went down.

2014-= 07-31 20:59:33,944 WARN =C2=A0[Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (= IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try = to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's = epoch 5 is not the current writer epoch =C2=A00
at org.apache.hadoop.hdfs.qjournal.server= .Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoo= p.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at org.apache.hadoop.hdfs.qjourn= al.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at org.apache.hadoo= p.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(Q= JournalProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoop.hdfs.qjournal.protoc= ol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJou= rnalProtocolProtos.java:14018)
at org.apache.hadoo= p.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.ja= va:453)
at org= .apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.ru= n(Server.java:1689)
at java.security.Ac= cessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.ja= va:415)
at org.apache.hadoo= p.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
<= div> at org.apache.hadoop= .ipc.Server$Handler.run(Server.java:1687)

at o= rg.apache.hadoop.ipc.Client.call(Client.java:1224)
at org.apache.hadoop.ipc.ProtobufRpcEn= gine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$P= roxy9.journal(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProto= colTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at org.apache.hadoo= p.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at org.apache.h= adoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:34= 7)
at java.util.concur= rent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.run= Worker(ThreadPoolExecutor.java:1145)
at java.util.concur= rent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
<= span class=3D"" style=3D"white-space:pre"> at java.lang.Thread.run(T= hread.java:745)
2014-07-31 20:59:33,954 WARN =C2=A0[Logger channel to rhel1.local/10.120.5.203:8485] client.QuorumJourna= lManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed to write txns 9635-9635.= Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's = epoch 5 is not the current writer epoch =C2=A00
at org.apache.hadoop.hdfs.qjournal.server= .Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoo= p.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at org.apache.hadoop.hdfs.qjourn= al.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at org.apache.hadoo= p.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(Q= JournalProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoop.hdfs.qjournal.protoc= ol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJou= rnalProtocolProtos.java:14018)
at org.apache.hadoo= p.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.ja= va:453)
at org= .apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.ru= n(Server.java:1689)
at java.security.Ac= cessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.ja= va:415)
at org.apache.hadoo= p.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
<= div> at org.apache.hadoop= .ipc.Server$Handler.run(Server.java:1687)

at o= rg.apache.hadoop.ipc.Client.call(Client.java:1224)
at org.apache.hadoop.ipc.ProtobufRpcEn= gine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$P= roxy9.journal(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProto= colTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at org.apache.hadoo= p.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at org.apache.h= adoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:34= 7)
at java.util.concur= rent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.run= Worker(ThreadPoolExecutor.java:1145)
at java.util.concur= rent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
<= span class=3D"" style=3D"white-space:pre"> at java.lang.Thread.run(T= hread.java:745)
2014-07-31 20:59:33,975 WARN =C2=A0[Logger channel to rhel2.local/10.120.5.25:8485] client.QuorumJournalM= anager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed to write txns 9635-9635. Wil= l try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's = epoch 5 is not the current writer epoch =C2=A00
at org.apache.hadoop.hdfs.qjournal.server= .Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoo= p.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at org.apache.hadoop.hdfs.qjourn= al.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at org.apache.hadoo= p.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(Q= JournalProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoop.hdfs.qjournal.protoc= ol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJou= rnalProtocolProtos.java:14018)
at org.apache.hadoo= p.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.ja= va:453)
at org= .apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.ru= n(Server.java:1689)
at java.security.Ac= cessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.ja= va:415)
at org.apache.hadoo= p.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
<= div> at org.apache.hadoop= .ipc.Server$Handler.run(Server.java:1687)

at o= rg.apache.hadoop.ipc.Client.call(Client.java:1224)
at org.apache.hadoop.ipc.ProtobufRpcEn= gine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$P= roxy9.journal(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProto= colTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at org.apache.hadoo= p.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at org.apache.h= adoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:34= 7)
at java.util.concur= rent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.run= Worker(ThreadPoolExecutor.java:1145)
at java.util.concur= rent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
<= span class=3D"" style=3D"white-space:pre"> at java.lang.Thread.run(T= hread.java:745)
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.= FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush = failed for required journal (JournalAndStream(mgr=3DQJM to [10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.= 25:8485], stream=3DQuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many e= xceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is no= t the current writer epoch =C2=A00
at org.apache.hadoo= p.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.h= dfs.qjournal.server.Journal.journal(Journal.java:331)
at org.apache.hadoo= p.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.ja= va:142)
at org= .apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslato= rPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoo= p.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.c= allBlockingMethod(QJournalProtocolProtos.java:14018)
at org.apache.hadoop.ipc.ProtobufRp= cEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoo= p.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.= java:1693)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileg= ed(Native Method)
at javax.security.a= uth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doA= s(UserGroupInformation.java:1332)
at org.apache.hadoo= p.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is n= ot the current writer epoch =C2=A00
at org.apache.hadoo= p.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.h= dfs.qjournal.server.Journal.journal(Journal.java:331)
at org.apache.hadoo= p.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.ja= va:142)
at org= .apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslato= rPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoo= p.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.c= allBlockingMethod(QJournalProtocolProtos.java:14018)
at org.apache.hadoop.ipc.ProtobufRp= cEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoo= p.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.= java:1693)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileg= ed(Native Method)
at javax.security.a= uth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doA= s(UserGroupInformation.java:1332)
at org.apache.hadoo= p.ipc.Server$Handler.run(Server.java:1687)

10.120.5.247:8485: IPC's epoch 5 is n= ot the current writer epoch =C2=A00
at org.apache.hadoo= p.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.h= dfs.qjournal.server.Journal.journal(Journal.java:331)
at org.apache.hadoo= p.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.ja= va:142)
at org= .apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslato= rPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoo= p.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.c= allBlockingMethod(QJournalProtocolProtos.java:14018)
at org.apache.hadoop.ipc.ProtobufRp= cEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoo= p.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.= java:1693)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileg= ed(Native Method)
at javax.security.a= uth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doA= s(UserGroupInformation.java:1332)
at org.apache.hadoo= p.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.hdfs.qjour= nal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoo= p.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
at org.apache.had= oop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.j= ava:142)
at org.apache.hadoo= p.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.j= ava:107)
at or= g.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutpu= tStream.java:113)
at org.apache.hadoo= p.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:1= 07)
at org.apa= che.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(J= ournalSet.java:490)
at org.apache.hadoo= p.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.jav= a:350)
at org.= apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55= )
at org.apache.hadoo= p.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.j= ava:486)
at or= g.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)<= /div>
at org.apache.hadoo= p.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
at org.apache.h= adoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
at org.apache.hadoo= p.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
at org.apache.hadoop.hdfs.s= erver.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
at org.apache.hadoo= p.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java= :734)
at org.a= pache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEdi= tLog(NamenodeProtocolServerSideTranslatorPB.java:129)
at org.apache.hadoo= p.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.call= BlockingMethod(NamenodeProtocolProtos.java:8762)
at org.apache.hadoop.ipc.ProtobufRpcEngi= ne$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoo= p.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.= java:1693)
at org.apache.hadoo= p.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileg= ed(Native Method)
at javax.security.a= uth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doA= s(UserGroupInformation.java:1332)
at org.apache.hadoo= p.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,97= 6 WARN =C2=A0[IPC Server handler 5 on 8020] client.QuorumJournalManager (Qu= orumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at = txid 9634
2014-07-31 20:59:33,978 INFO =C2=A0[IPC Server handler 5 on 8020] util= .ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
2= 014-07-31 20:59:33,982 INFO =C2=A0[Thread-0] namenode.NameNode (StringUtils= .java:run(615)) - SHUTDOWN_MSG:=C2=A0



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discor= d@uw.edu> wrote:
I tried a third time and it= just worked?

sudo hdfs zkfc -formatZK
20= 14-07-31 18:07:51,595 INFO =C2=A0[main] tools.DFSZKFailoverController (DFSZ= KFailoverController.java:<init>(140)) - Failover controller configure= d for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 18:07:51,791 INFO =C2=A0[main] zookeeper.ZooKeeper (Environ= ment.java:logEnv(100)) - Client environment:zookeeper.version=3D3.4.3-cdh4.= 1.3--1, built on 01/27/2013 00:13 GMT
2014-07-31 18:07:51,791 INF= O =C2=A0[main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client = environment:host.name=3D= rhel1.local
2014-07-31 18:07:51,792 INFO =C2=A0[main] zookeeper.ZooKeeper (Environ= ment.java:logEnv(100)) - Client environment:java.version=3D1.7.0_60
2014-07-31 18:07:51,792 INFO =C2=A0[main] zookeeper.ZooKeeper (Environme= nt.java:logEnv(100)) - Client environment:java.vendor=3DOracle Corporation<= /div>
2014-07-31 18:07:51,792 INFO =C2=A0[main] zookeeper.ZooKeeper (Environ= ment.java:logEnv(100)) - Client environment:java.home=3D/usr/java/jdk1.7.0_= 60/jre
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:<init>(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
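
For reference, the session above is what re-initializing the HA failover
state in ZooKeeper looks like; presumably it was produced by something like
the following standard command, run on one of the NameNode hosts:

    # Clears and recreates /hadoop-ha/<nameservice> in ZooKeeper. Only run
    # this while all NameNodes and failover controllers are stopped, as the
    # prompt above warns.
    $ hdfs zkfc -formatZK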



On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com> wrote:
Cheers. That's rough. We don't have that problem h= ere at WanDISCO.

On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
> Hi this is drocsid / discord from #hbase. Thanks for the help earlier today.
> Just thought I'd forward this info regarding swapping out the NameNode in a
> QJM / HA configuration. See you around on #hbase. If you visit Seattle, feel
> free to give me a shout out.
>
> ---------- Forwarded message ----------
> From: Colin Kincaid Williams <discord@uw.edu>
> Date: Thu, Jul 31, 2014 at 12:35 PM
> Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
> configuration
> To: user@hadoop.apache.org
>
>
> Hi Jing,
>
> Thanks for the response. I will try this out, and file an Apache jira.
>
> Best,
>
> Colin Williams
>
>
> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>
>> Hi Colin,
>>
>>     I guess currently we may have to restart almost all of the
>> daemons/services in order to swap out a standby NameNode (SBN):
>>
>> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>> the current implementation the SBN periodically sends a rollEditLog RPC
>> request to the ANN (thus if an NN failover happens later, the original ANN
>> needs to send this RPC to the correct NN).
>> 2. It looks like the DataNode currently cannot really refresh its list of
>> NNs. Look at the code in BPOfferService:
>>
>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>> IOException {
>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>     for (BPServiceActor actor : bpServices) {
>>       oldAddrs.add(actor.getNNSocketAddress());
>>     }
>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>
>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>       // Keep things simple for now -- we can implement this at a later date.
>>       throw new IOException(
>>           "HA does not currently support adding a new standby to a running
>> DN. " +
>>           "Please do a rolling restart of DNs to reconfigure the list of
>> NNs.");
>>     }
>>   }
>>
>> 3. If you're using automatic failover, you also need to update the
>> configuration of the ZKFC on the current ANN machine, since the ZKFC does
>> graceful fencing by sending an RPC to the other NN (the configuration keys
>> involved are sketched just below).
>> 4. It looks like we do not need to restart the JournalNodes for the new
>> SBN, but I have not tried this before.
>>
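
As a quick sanity check on which NameNode pair a given host currently sees,
something like the following should work (using the golden-apple nameservice
from the logs above; the nn1 id and both output lines are hypothetical):

    # Print the configured NameNode ids for the nameservice...
    $ hdfs getconf -confKey dfs.ha.namenodes.golden-apple
    nn1,nn2
    # ...and the RPC address configured for one of them (output assumed).
    $ hdfs getconf -confKey dfs.namenode.rpc-address.golden-apple.nn1
    rhel1.local:8020

Swapping the standby means changing dfs.ha.namenodes.<nameservice> and the
per-NN dfs.namenode.rpc-address / dfs.namenode.http-address keys in
hdfs-site.xml on every node, which is why the restarts discussed here are
needed.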
>>     Thus in general we may still have to restart all of the services
>> (except the JNs) and update their configurations. But I guess this can be
>> a rolling restart process, roughly as sketched below:
>> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
>> restart of all the DNs to update their configurations.
>> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
>> their configuration. The new SBN should become active.
>>
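
With the same caveat that this is untested, a rough shell sketch of that
sequence might look like the following (hadoop-daemon.sh is the standard
daemon launcher from Hadoop's sbin directory; which host plays which role is
an assumption, not something from this thread):

    # 1. Shut down the old SBN, then bootstrap and start the new SBN:
    $ hadoop-daemon.sh stop namenode     # on the old standby
    $ hdfs namenode -bootstrapStandby    # on the new standby, configs in place
    $ hadoop-daemon.sh start namenode    # on the new standby
    $ hadoop-daemon.sh start zkfc        # if automatic failover is used
    # 2. Rolling restart of the DataNodes with updated configs, one at a time:
    $ hadoop-daemon.sh stop datanode && hadoop-daemon.sh start datanode
    # 3. Stop the ANN and its ZKFC and update their configuration; the new
    #    SBN should then become active:
    $ hadoop-daemon.sh stop zkfc
    $ hadoop-daemon.sh stop namenode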
>>     I have not tried the above steps, so please let me know whether this
>> works or not. I also think we should document the correct steps in
>> Apache. Could you please file an Apache jira?
>>
>> Thanks,
>> -Jing
>>
>>
>>
>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu>
>> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to swap out a standby NameNode in a QJM / HA co= nfiguration. I
>>> believe the steps to achieve this would be something similar t= o:
>>>
>>> Use the bootstrap standby command to prep the replacement standby, or
>>> rsync if the command fails.
>>>
>>> Somehow update the datanodes, so they push the heartbeat / journal to the
>>> new standby.
>>>
>>> Update the xml configuration on all nodes to reflect the replacement
>>> standby.
>>>
>>> Start the replacement standby.
>>>
>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> configuration (see the sketch just below).
>>>
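
The "hadoop command" in that last step is presumably hdfs dfsadmin
-refreshNamenodes, sketched here with a hypothetical DataNode host. Note
that, per the BPOfferService code quoted earlier in the thread, under HA it
currently throws an IOException rather than accept a changed NameNode set on
a live DataNode:

    # Ask one DataNode to re-read its NameNode list from its configuration;
    # 50020 is the default DataNode IPC port, and the host is hypothetical.
    $ hdfs dfsadmin -refreshNamenodes rhel2.local:50020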
>>> I am not sure how to deal with the Journal switch, or if I am going about
>>> this the right way. Can anybody give me some suggestions here?
>>>
>>>
>>> Regards,
>>>
>>> Colin Williams
>>>
>>
>>
>
>
>

