Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
From: Bryan Beaudreault <bbeaudreault@hubspot.com>
To: user@hadoop.apache.org
Date: Fri, 1 Aug 2014 09:46:33 -0400

Also, you shouldn't format the new standby. You only format a namenode for a
brand new cluster. Once a cluster is live you should just use the bootstrap on
the new namenodes and never format again. Bootstrap is basically a special
format that just creates the dirs and copies an active fsimage to the host. If
format fails (it's buggy imo), just rsync from the active namenode. It will
catch up by replaying the edits from the QJM when it is started.

On Friday, August 1, 2014, Bryan Beaudreault wrote:

> You should first replace the namenode, then when that is completely
> finished move on to replacing any journal nodes.
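The bootstrap-instead-of-format advice from the top message maps onto roughly the following commands. This is a dry-run sketch only: each step is echoed rather than executed, and the hostnames (nn1/nn2.example.com), the /data/dfs/name path, and the service name are hypothetical placeholders that will differ per cluster.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of executing it, so the
# sequence can be reviewed before running it for real.
run() { echo "+ $*"; }

# On the replacement standby (hypothetical host nn2.example.com):
# bootstrap creates the name dirs and copies the active fsimage --
# never run "hdfs namenode -format" against a live cluster.
run hdfs namenode -bootstrapStandby

# Fallback if bootstrap fails: copy the name dir from the active
# namenode (hypothetical path /data/dfs/name), then start the NN;
# it replays any outstanding edits from the QJM on startup.
run rsync -a nn1.example.com:/data/dfs/name/ /data/dfs/name/
run service hadoop-hdfs-namenode start
```
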
That part is easy:
>
> 1) bootstrap the new JN (rsync from an existing one)
> 2) start the new JN
> 3) push hdfs-site.xml to both namenodes
> 4) restart the standby namenode
> 5) verify the logs and admin UI show the new JN
> 6) restart the active namenode
> 7) verify both namenodes (failover should have happened and the old standby
> should be writing to the new JN)
>
> You can remove an existing JN at the same time if you want; just be
> careful to preserve the majority of the quorum during the whole operation
> (i.e. only replace 1 at a time).
>
> Also, I think it is best to do hdfs dfsadmin -rollEdits after each replaced
> journalnode. IIRC there is a JIRA open about rolling restarts of journal
> nodes not being safe unless you roll edits, so that would go for replacing
> too.
>
> On Friday, August 1, 2014, Colin Kincaid Williams wrote:
>
>> I will run through the procedure again tomorrow. It was late in the day
>> before I had a chance to test the procedure.
>>
>> If I recall correctly, I had an issue formatting the new standby before
>> bootstrapping. I think either at that point, or during the ZooKeeper
>> format command, I was queried to format the journal to the 3 hosts in the
>> quorum. I was unable to proceed without exception unless choosing this
>> option.
>>
>> Are there any concerns adding another journal node to the new standby?
>>
>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" wrote:
>>
>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>> unaware of the zkfc and active/standby state. Did you do something else
>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>> or something else)
>>>
>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>
>>> The WARN is fine. It's true that you could get in a weird state if you
>>> had multiple namenodes up. But with just 1 namenode up, you should be
>>> safe.
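The seven journal-node replacement steps quoted above, plus the rollEdits suggestion, can be sketched as a dry-run shell session. Everything here is hypothetical placeholder naming (jn*/nn*.example.com hosts, the /data/dfs/jn path, the service names); only the hdfs commands themselves come from the thread.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

NEW_JN=jn4.example.com   # hypothetical replacement journal node
SRC_JN=jn2.example.com   # hypothetical existing JN to bootstrap from

# 1-2) Bootstrap the new JN from an existing one's edits dir, then start it.
run rsync -a "$SRC_JN:/data/dfs/jn/" "$NEW_JN:/data/dfs/jn/"
run ssh "$NEW_JN" service hadoop-hdfs-journalnode start

# 3) Push the updated qjournal:// edits URI in hdfs-site.xml to both namenodes.
run scp hdfs-site.xml nn1.example.com:/etc/hadoop/conf/
run scp hdfs-site.xml nn2.example.com:/etc/hadoop/conf/

# 4-7) Restart the standby first and verify it sees the new JN, then restart
# the active; failover occurs and the new active should write to the new JN.
run ssh nn2.example.com service hadoop-hdfs-namenode restart
run ssh nn1.example.com service hadoop-hdfs-namenode restart

# Roll edits after each replaced journalnode, per the JIRA caveat above.
run hdfs dfsadmin -rollEdits
```

Replacing one JN at a time, as the email stresses, keeps a majority of the quorum alive throughout.
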
>>> What you are trying to avoid is a split brain or standby/standby
>>> state, but that is impossible with just 1 namenode alive. Similarly, the
>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>> a running cluster. I should have mentioned you need to use the -force
>>> argument to get around that.
>>>
>>>
>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams wrote:
>>>
>>>> However, continuing with the process, my QJM eventually errored out and
>>>> my active NameNode went down.
>>>>
>>>> All three journal nodes logged the same failure; it is shown once below
>>>> for rhel6.local, and the traces logged at 20:59:33,954
>>>> (rhel1.local/10.120.5.203:8485) and 20:59:33,975
>>>> (rhel2.local/10.120.5.25:8485) were identical:
>>>>
>>>> 2014-07-31 20:59:33,944 WARN [Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0
>>>> at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>> at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>> at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> The namenode then aborted:
>>>>
>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485], stream=QuorumOutputStream starting at txid 9634))
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch 0
>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch 0
>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch 0
>>>> (the remote stack trace for each was identical to the one shown above)
>>>> at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>> at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>> at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>> at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>> at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>> at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>> at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>> at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>> at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>> 2014-07-31 20:59:33,976 WARN [IPC Server handler 5 on 8020] client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 9634
>>>> 2014-07-31 20:59:33,978 INFO [IPC Server handler 5 on 8020] util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>> 2014-07-31 20:59:33,982 INFO [Thread-0] namenode.NameNode (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams wrote:
>>>>
>>>>> I tried a third time and it just worked?
>>>>>
>>>>> sudo hdfs zkfc -formatZK
>>>>> 2014-07-31 18:07:51,595 INFO [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:<init>(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>> 2014-07-31 18:07:51,791 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13 GMT
>>>>> 2014-07-31 18:07:51,791 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
>>>>> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle Corporation
>>>>> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.class.path=/etc/hadoop/conf:... [long Hadoop/HDFS/YARN/MapReduce classpath elided]
>>>>> 2014-07-31 18:07:51,793 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>> 2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>> 2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>> 2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-358.el6.x86_64
>>>>> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/etc/hbase/conf.golden_apple
>>>>> 2014-07-31 18:07:51,813 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:<init>(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
>>>>> 2014-07-31 18:07:51,833 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
>>>>> 2014-07-31 18:07:51,844 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
>>>>> 2014-07-31 18:07:51,852 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>> ===============================================
>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>> Are you sure you want to clear all failover information from
>>>>> ZooKeeper?
>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>> failover controllers are stopped!
>>>>> ===============================================
>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>> Y
>>>>> 2014-07-31 18:08:00,439 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
>>>>> 2014-07-31 18:08:00,506 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
>>>>> 2014-07-31 18:08:00,541 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman wrote:
>>>>>
>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>>> > Hi, this is drocsid / discord from #hbase. Thanks for the help earlier today. Just thought I'd forward this info regarding swapping out the NameNode in a QJM / HA configuration. See you around on #hbase. If you visit Seattle, feel free to give me a shout out.
>>>>>> >
>>>>>> > ---------- Forwarded message ----------
>>>>>> > From: Colin Kincaid Williams
>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
>>>>>> > To: user@hadoop.apache.org
>>>>>> >
>>>>>> > Hi Jing,
>>>>>> >
>>>>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Colin Williams
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao wrote:
>>>>>> >>
>>>>>> >> Hi Colin,
>>>>>> >>
>>>>>> >> I guess currently we may have to restart almost all the daemons/services in order to swap out a standby NameNode (SBN):
>>>>>> >>
>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN, since in the current implementation the SBN tries to send a rollEditLog RPC request to the ANN periodically (thus if a NN failover happens later, the original ANN needs to send this RPC to the correct NN).
>>>>>> >> 2. 
Looks like the DataNode currently cannot do real refreshment for NN. Look at the code in BPOfferService:
>>>>>> >>
>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>> >>     }
>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>> >>
>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>>> >>       throw new IOException(
>>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>> >>     }
>>>>>> >>   }
>>>>>> >>
>>>>>> >> 3. If you're using automatic failover, you also need to update the configuration of the ZKFC on the current ANN machine, since the ZKFC does graceful fencing by sending an RPC to the other NN.
>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN, but I have not tried this before.
>>>>>> >>
>>>>>> >> Thus in general we may still have to restart all the services (except JNs) and update their configurations. But this may be a rolling restart process, I guess:
>>>>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling restart of all the DNs to update their configurations.
>>>>>> >> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update their configuration. The new SBN should become active.
>>>>>> >>
>>>>>> >> I have not tried the above steps, thus please let me know if this works or not. 
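The check quoted above can be illustrated with a small, dependency-free sketch (no Guava or Hadoop classes; the class and method names here are invented for illustration). An empty symmetric difference between two sets is the same as set equality, which is why any change to the NameNode list makes the DataNode throw, forcing a rolling restart:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the BPOfferService check: a refresh is only a no-op
// when the old and new NameNode address lists are identical as sets.
public class NNListCheck {
    // Mirrors Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()
    // using plain java.util: empty symmetric difference <=> equal sets.
    public static boolean canRefresh(List<String> oldAddrs, List<String> newAddrs) {
        Set<String> a = new HashSet<>(oldAddrs);
        Set<String> b = new HashSet<>(newAddrs);
        return a.equals(b);
    }
}
```

So swapping in a new standby NN address always trips the check, regardless of ordering, and the DN refuses the refresh.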
And I think we should also document the correct steps in Apache. Could you please file an Apache jira?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> -Jing
>>>>>> >>
>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I believe the steps to achieve this would be something similar to:
>>>>>> >>>
>>>>>> >>> Use the bootstrap standby command to prep the replacement standby, or rsync if the command fails.
>>>>>> >>>
>>>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal to the new standby.
>>>>>> >>>
>>>>>> >>> Update the xml configuration on all nodes to reflect the replacement standby.
>>>>>> >>>
>>>>>> >>> Start the replacement standby.
>>>>>> >>>
>>>>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode configuration.
>>>>>> >>>
>>>>>> >>> I am not sure how to deal with the journal switch, or if I am going about this the right way. Can anybody give me some suggestions here?
>>>>>> >>>
>>>>>> >>> Regards,
>>>>>> >>>
>>>>>> >>> Colin Williams

On Friday, August 1, 2014, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
You should first replace the namenode, then when that is completely finished move on to replacing any journal nodes. That part is easy:

1) bootstrap new JN (rsync from an existing one)
2) start new JN
3) push hdfs-site.xml to both namenodes
4) restart standby namenode
5) verify logs and admin ui show the new JN
6) restart active namenode
7) verify both namenodes (failover should have happened and the old standby should be writing to the new JN)

You can remove an existing JN at the same time if you want, just be careful to preserve the majority of the quorum during the whole operation (i.e. only replace one at a time).

Also, I think it is best to do hdfs dfsadmin -rollEdits after each replaced journalnode. IIRC there is a JIRA open about rolling restarts of journal nodes not being safe unless you roll edits. So that would go for replacing too.
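The "preserve the majority" constraint above can be sanity-checked with a tiny sketch (illustrative only; the class name is invented). Edit-log writes need a strict majority of the journal quorum, so with 3 JNs you can have at most one down at any moment, which is why JNs are replaced one at a time:

```java
// Toy model of the QJM write requirement: a quorum of JournalNodes must
// be reachable, where "quorum" means a strict majority of the configured set.
public class QuorumCheck {
    // With total JNs configured, writes succeed only if alive > total/2.
    public static boolean hasQuorum(int total, int alive) {
        return alive > total / 2;
    }
}
```

With 3 configured JNs, taking one down for replacement still leaves 2/3 (a majority); taking two down loses quorum and the active NN will abort, as happens later in this thread.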

On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu> wrote:

I will run through the procedure again tomorrow. It was late in the day before I had a chance to test the procedure.

If I recall correctly, I had an issue formatting the new standby before bootstrapping. I think either at that point, or during the ZooKeeper format command, I was queried to format the journal to the 3 hosts in the quorum. I was unable to proceed without exception unless choosing this option.

Are there any concerns adding another journal node to the new standby?

On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
This shouldn't have affected the journalnodes at all -- they are mostly unaware of the zkfc and active/standby state. Did you do something else that may have impacted the journalnodes? (i.e. shut down 1 or more of them, or something else)

For your previous 2 emails, reporting errors/warns when doing -formatZK:

The WARN is fine. It's true that you could get in a weird state if you had multiple namenodes up. But with just 1 namenode up, you should be safe. What you are trying to avoid is a split-brain or standby/standby state, but that is impossible with just 1 namenode alive. Similarly, the ERROR is a sanity check to make sure you don't screw yourself by formatting a running cluster. I should have mentioned you need to use the -force argument to get around that.


On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
However, continuing with the process, my QJM eventually errored out and my active NameNode went down.

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

	at org.apache.hadoop.ipc.Client.call(Client.java:1224)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at com.sun.proxy.$Proxy9.journal(Unknown Source)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
	at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
	at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/10.120.5.203:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0
	[stack trace identical to the one above]
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/10.120.5.25:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0
	[stack trace identical to the one above]
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485], stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
	[server-side stack trace identical to the ones above]
10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
	[server-side stack trace identical to the ones above]
10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
	[server-side stack trace identical to the ones above]
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
	at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020] client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode (StringUtils.java:run(615)) - SHUTDOWN_MSG:
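The "IPC's epoch 5 is not the current writer epoch" errors above come from QJM's epoch-based fencing. A toy model (not Hadoop's actual Journal code; names here are invented for illustration): each time a NameNode becomes the writer it claims a strictly higher epoch from the JournalNodes, and a JN rejects writes tagged with any other epoch, so a stale or fenced writer fails its quorum and aborts, exactly as in the log:

```java
// Toy model of a JournalNode's writer-epoch fencing.
public class JournalEpochModel {
    private long writerEpoch = 0;

    // A new writer must claim a strictly higher epoch; this fences
    // any previous writer still holding an older epoch.
    public boolean startWriter(long epoch) {
        if (epoch <= writerEpoch) {
            return false; // stale claim rejected
        }
        writerEpoch = epoch;
        return true;
    }

    // Journal writes are accepted only from the current writer epoch,
    // analogous to Journal.checkWriteRequest in the traces above.
    public boolean write(long epoch) {
        return epoch == writerEpoch;
    }
}
```

In this model, once a new writer claims epoch 6, writes from the old epoch-5 writer are rejected by every JN, which is the quorum-wide failure the FATAL log shows.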



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
I tried a third time and it just worked?

sudo hdfs zkfc -formatZK
2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:<init>(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13 GMT
2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle Corporation
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.class.path=/etc/hadoop/conf:... [long classpath listing elided]
educe/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/= .//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/./= /hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoo= p-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-exa= mples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools= .jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/= usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 18:07:51,793 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 18:07:51,813 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:<init>(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
2014-07-31 18:07:51,833 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
2014-07-31 18:07:51,844 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
2014-07-31 18:07:51,852 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 18:08:00,439 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
2014-07-31 18:08:00,506 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
2014-07-31 18:08:00,541 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
2014-07-31 18:08:00,545 INFO [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 18:08:00,545 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
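For context, output like the session above is what the ZKFC format command prints when it re-initializes the failover znode. A minimal sketch of the step (the nameservice name comes from the log; everything else here is an assumption, not a transcript of what was actually run):

```shell
# Hedged sketch: re-initialize the failover state for nameservice
# "golden-apple". Stop all HDFS services and failover controllers first,
# as the interactive prompt above warns.
hdfs zkfc -formatZK   # recreates /hadoop-ha/golden-apple in ZooKeeper
```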



On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com> wrote:
Cheers. That's rough. We don't have that problem here at WanDISCO.

On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
> Hi this is drocsid / discord from #hbase. Thanks for the help earlier today.
> Just thought I'd forward this info regarding swapping out the NameNode in a
> QJM / HA configuration. See you around on #hbase. If you visit Seattle, feel
> free to give me a shout out.
>
> ---------- Forwarded message ----------
> From: Colin Kincaid Williams <discord@uw.edu>
> Date: Thu, Jul 31, 2014 at 12:35 PM
> Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
> configuration
> To: user@hadoop.apache.org
>
>
> Hi Jing,
>
> Thanks for the response. I will try this out, and file an Apache jira.
>
> Best,
>
> Colin Williams
>
>
> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>
>> Hi Colin,
>>
>>     I guess currently we may have to restart almost all of the
>> daemons/services in order to swap out a standby NameNode (SBN):
>>
>> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>> the current implementation the SBN tries to send rollEditLog RPC requests to
>> the ANN periodically (thus if a NN failover happens later, the original ANN
>> needs to send this RPC to the correct NN).
>> 2. Looks like the DataNode currently cannot do a real refresh of its NN
>> list. Look at the code in BPOfferService:
>>
>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>     for (BPServiceActor actor : bpServices) {
>>       oldAddrs.add(actor.getNNSocketAddress());
>>     }
>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>
>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>       // Keep things simple for now -- we can implement this at a later date.
>>       throw new IOException(
>>           "HA does not currently support adding a new standby to a running DN. " +
>>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>     }
>>   }
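The guard above boils down to a set comparison: if the configured NameNode address set differs at all from what the DataNode's BPServiceActors already point to, it refuses to refresh in place and demands a rolling restart. A self-contained sketch of that check (plain java.util sets stand in for Guava's Sets.symmetricDifference; the host:port values are made up):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the check refreshNNList performs: any change to the
// configured NN address set makes the DN bail out instead of rewiring.
public class NNListCheck {
    // Elements that appear in exactly one of the two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> common = new HashSet<>(a);
        common.retainAll(b);
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        result.removeAll(common);
        return result;
    }

    static boolean needsRollingRestart(Set<String> oldNNs, Set<String> newNNs) {
        return !symmetricDifference(oldNNs, newNNs).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> oldNNs = new HashSet<>();
        oldNNs.add("nn1.example.com:8020");
        oldNNs.add("nn2.example.com:8020");
        Set<String> newNNs = new HashSet<>();
        newNNs.add("nn1.example.com:8020");
        newNNs.add("nn3.example.com:8020"); // hypothetical replacement standby
        System.out.println(needsRollingRestart(oldNNs, newNNs)); // prints "true"
    }
}
```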
>>
>> 3. If you're using automatic failover, you also need to update the
>> configuration of the ZKFC on the current ANN machine, since the ZKFC
>> will do graceful fencing by sending an RPC to the other NN.
>> 4. Looks like we do not need to restart JournalNodes for the new SBN,
>> but I have not tried this before.
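The configuration being updated in steps like these is the HA block of hdfs-site.xml. A hedged fragment, for illustration only: the nameservice name is taken from the log earlier in the thread, while the NN IDs and the replacement host name are assumptions, not from the original cluster:

```xml
<!-- Assumed nameservice/NN IDs; only the entries for the swapped standby change. -->
<property>
  <name>dfs.ha.namenodes.golden-apple</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- nn2 now points at the hypothetical replacement standby host -->
  <name>dfs.namenode.rpc-address.golden-apple.nn2</name>
  <value>new-standby.local:8020</value>
</property>
```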
>>
>>     Thus in general we may still have to restart all of the services (except
>> JNs) and update their configurations. But this can be a rolling restart
>> process, I guess:
>> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>> 2. Keep the ANN and its corresponding ZKFC running; do a rolling restart
>> of all the DNs to update their configurations.
>> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
>> their configuration. The new SBN should become active.
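That rolling sequence can be sketched as cluster commands. This is untested (matching Jing's own caveat) and the service names assume CDH4-style packaging; host assignments are implied by where each command runs:

```shell
# Hedged, untested sketch of the rolling swap; adjust service names to
# your distribution.

# 1. On the new standby host: bootstrap from the ANN, then start it.
hdfs namenode -bootstrapStandby        # or rsync the name dir from the ANN if this fails
service hadoop-hdfs-namenode start

# 2. On each DataNode in turn, with the updated hdfs-site.xml in place:
service hadoop-hdfs-datanode restart

# 3. On the current ANN host: stop its ZKFC and NN so the new standby can
#    take over, update their configuration, then bring them back.
service hadoop-hdfs-zkfc stop
service hadoop-hdfs-namenode restart
service hadoop-hdfs-zkfc start
```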
>>
>>     I have not tried the above steps, so please let me know if this
>> works or not. And I think we should also document the correct steps in
>> Apache. Could you please file an Apache jira?
>>
>> Thanks,
>> -Jing
>>
>>
>>
>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu>
>> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>>> believe the steps to achieve this would be something similar to:
>>>
>>> Use the bootstrap standby command to prep the replacement standby, or
>>> rsync if the command fails.
>>>
>>> Somehow update the datanodes, so they push the heartbeat / journal to the
>>> new standby.
>>>
>>> Update the xml configuration on all nodes to reflect the replacement
>>> standby.
>>>
>>> Start the replacement standby.
>>>
>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> configuration.
>>>
>>> I am not sure how to deal with the Journal switch, or if I am going about
>>> this the right way. Can anybody give me some suggestions here?
>>>
>>>
>>> Regards,
>>>
>>> Colin Williams
>>>
>>
>>
>
>
>


