Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
From: Bryan Beaudreault <bbeaudreault@hubspot.com>
Date: Fri, 1 Aug 2014 10:26:26 -0400
To: user@hadoop.apache.org

No worries! Glad you had a test environment to play with this in. Also, above I meant "If bootstrap fails...", not format, of course :)

On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams wrote:

> I realize that this was a foolish error made late in the day. I am no
> Hadoop expert and have much to learn. This is why I set up a test
> environment.
>
> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" wrote:
>
>> Also, you shouldn't format the new standby. You only format a namenode for
>> a brand new cluster. Once a cluster is live, you should just use the
>> bootstrap on the new namenodes and never format again.
>> Bootstrap is basically a special format that just creates the dirs and
>> copies an active fsimage to the host.
>>
>> If format fails (it's buggy, imo), just rsync from the active namenode. It
>> will catch up by replaying the edits from the QJM when it is started.
>>
>> On Friday, August 1, 2014, Bryan Beaudreault wrote:
>>
>>> You should first replace the namenode, then when that is completely
>>> finished move on to replacing any journal nodes. That part is easy:
>>>
>>> 1) Bootstrap the new JN (rsync from an existing one)
>>> 2) Start the new JN
>>> 3) Push hdfs-site.xml to both namenodes
>>> 4) Restart the standby namenode
>>> 5) Verify that the logs and admin UI show the new JN
>>> 6) Restart the active namenode
>>> 7) Verify both namenodes (a failover should have happened, and the old
>>> standby should be writing to the new JN)
>>>
>>> You can remove an existing JN at the same time if you want; just be
>>> careful to preserve the majority of the quorum during the whole operation
>>> (i.e., only replace one at a time).
>>>
>>> Also, I think it is best to do hdfs dfsadmin -rollEdits after each
>>> replaced journalnode. IIRC there is a JIRA open about rolling restarts of
>>> journal nodes not being safe unless you roll edits, so that would go for
>>> replacing too.
>>>
>>> On Friday, August 1, 2014, Colin Kincaid Williams wrote:
>>>
>>>> I will run through the procedure again tomorrow. It was late in the day
>>>> before I had a chance to test the procedure.
>>>>
>>>> If I recall correctly, I had an issue formatting the new standby before
>>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>>> format command, I was queried to format the journal to the 3 hosts in the
>>>> quorum. I was unable to proceed without exception unless choosing this
>>>> option.
>>>>
>>>> Are there any concerns adding another journal node to the new standby?
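[Editor's note: the seven journalnode replacement steps quoted above can be sketched as a shell sequence. This is a minimal sketch, not a tested runbook: the hostnames (old-jn, new-jn, nn-active, nn-standby), the journal directory /data/jn, and the init-script names are hypothetical, and the journal dir must match dfs.journalnode.edits.dir in your deployment.]

```shell
# 1) Bootstrap the new JN: copy the journal dir from an existing JN.
rsync -a old-jn:/data/jn/ new-jn:/data/jn/

# 2) Start the new journalnode (service name varies by distro).
ssh new-jn 'service hadoop-hdfs-journalnode start'

# 3) Add the new JN to dfs.namenode.shared.edits.dir in hdfs-site.xml,
#    then push the config to both namenodes.
scp hdfs-site.xml nn-active:/etc/hadoop/conf/
scp hdfs-site.xml nn-standby:/etc/hadoop/conf/

# 4-5) Restart the standby NN, then check its logs / admin UI for the
#      new JN before touching the active.
ssh nn-standby 'service hadoop-hdfs-namenode restart'

# 6-7) Restart the active NN (a failover to the old standby is
#      expected here), then verify both namenodes.
ssh nn-active 'service hadoop-hdfs-namenode restart'

# As suggested above, roll edits after each replaced journalnode.
hdfs dfsadmin -rollEdits
```

Replace only one journalnode at a time, so a majority of the quorum stays up throughout the operation.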
>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" wrote:
>>>>
>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>> mostly unaware of the zkfc and active/standby state. Did you do something
>>>>> else that may have impacted the journalnodes? (i.e. shut down one or more
>>>>> of them, or something else)
>>>>>
>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>> -formatZK:
>>>>>
>>>>> The WARN is fine. It's true that you could get into a weird state if
>>>>> you had multiple namenodes up. But with just one namenode up, you should be
>>>>> safe. What you are trying to avoid is a split-brain or standby/standby
>>>>> state, but that is impossible with just one namenode alive. Similarly, the
>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>> a running cluster. I should have mentioned you need to use the -force
>>>>> argument to get around that.
>>>>>
>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>>
>>>>>> However, continuing with the process, my QJM eventually errored out and
>>>>>> my Active NameNode went down.
>>>>>>
>>>>>> 2014-07-31 20:59:33,944 WARN [Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0
>>>>>> at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>> at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>> at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,954 WARN [Logger channel to rhel1.local/10.120.5.203:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
>>>>>> [same RemoteException and stack trace as above]
>>>>>> 2014-07-31 20:59:33,975 WARN [Logger channel to rhel2.local/10.120.5.25:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
>>>>>> [same RemoteException and stack trace as above]
>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485], stream=QuorumOutputStream starting at txid 9634))
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch 0
>>>>>> [same server-side stack trace as above]
>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch 0
>>>>>> [same server-side stack trace as above]
>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch 0
>>>>>> [same server-side stack trace as above]
>>>>>>
>>>>>> at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>> at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>> at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>> at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>> at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>> at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>> 2014-07-31 20:59:33,976 WARN [IPC Server handler 5 on 8020] client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 9634
>>>>>> 2014-07-31 20:59:33,978 INFO [IPC Server handler 5 on 8020] util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>> 2014-07-31 20:59:33,982 INFO [Thread-0] namenode.NameNode
(StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>>>
>>>>>>> I tried a third time and it just worked?
>>>>>>>
>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>> 2014-07-31 18:07:51,595 INFO [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>> [Client environment lines from zookeeper.ZooKeeper: zookeeper.version=3.4.3-cdh4.1.3--1 (built 01/27/2013 00:13 GMT), host.name=rhel1.local, java.version=1.7.0_60, java.vendor=Oracle Corporation, java.home=/usr/java/jdk1.7.0_60/jre, java.class.path=<long classpath elided>, java.library.path=//usr/lib/hadoop/lib/native, java.io.tmpdir=/tmp, java.compiler=, os.name=Linux, os.arch=amd64, os.version=2.6.32-358.el6.x86_64, user.name=root, user.home=/root, user.dir=/etc/hbase/conf.golden_apple]
>>>>>>> 2014-07-31 18:07:51,813 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
>>>>>>> 2014-07-31 18:07:51,833 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
>>>>>>> 2014-07-31 18:07:51,844 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
>>>>>>> 2014-07-31 18:07:51,852 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>> ===============================================
>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>> Are you sure you want to clear all failover information from
>>>>>>> ZooKeeper?
>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>> failover controllers are stopped!
>>>>>>> ===============================================
>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>> Y
>>>>>>> 2014-07-31 18:08:00,439 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
>>>>>>> 2014-07-31 18:08:00,506 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
>>>>>>> 2014-07-31 18:08:00,541 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
>>>>>>> 2014-07-31 18:08:00,545 INFO [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>> 2014-07-31 18:08:00,545 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman wrote:
>>>>>>>
>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>>>>> > Hi, this is drocsid / discord from #hbase. Thanks for the help earlier today.
>>>>>>>> > Just thought I'd forward this info regarding swapping out the NameNode in a
>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit Seattle, feel
>>>>>>>> > free to give me a shout out.
>>>>>>>> >
>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>> > From: Colin Kincaid Williams
>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>> >
>>>>>>>> > Hi Jing,
>>>>>>>> >
>>>>>>>> > Thanks for the response. I will try this out and file an Apache jira.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> >
>>>>>>>> > Colin Williams
>>>>>>>> >
>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao wrote:
>>>>>>>> >>
>>>>>>>> >> Hi Colin,
>>>>>>>> >>
>>>>>>>> >> I guess currently we may have to restart almost all the
>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>> >>
>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>>>>>>>> >> the current implementation the SBN periodically sends a rollEditLog RPC
>>>>>>>> >> request to the ANN (thus if an NN failover happens later, the original ANN
>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>> >> 2. It looks like the DataNode currently cannot do a real refresh of its NN list.
>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>> >>
>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>> >>     }
>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>> >>
>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>>>>> >>       throw new IOException(
>>>>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>>>> >>     }
>>>>>>>> >>   }
>>>>>>>> >>
>>>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since the ZKFC does
>>>>>>>> >> graceful fencing by sending an RPC to the other NN.
>>>>>>>> >> 4. It looks like we do not need to restart the JournalNodes for the new
>>>>>>>> >> SBN, but I have not tried this before.
>>>>>>>> >>
>>>>>>>> >> Thus in general we may still have to restart all the services (except
>>>>>>>> >> the JNs) and update their configurations. But I guess this can be a
>>>>>>>> >> rolling restart process:
>>>>>>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
>>>>>>>> >> restart of all the DNs to update their configurations.
>>>>>>>> >> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
>>>>>>>> >> their configuration. The new SBN should become active.
>>>>>>>> >>
>>>>>>>> >> I have not tried the above steps, so please let me know whether this
>>>>>>>> >> works or not. And I think we should also document the correct steps in
>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> -Jing
>>>>>>>> >>
>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello,
>>>>>>>> >>>
>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>>> >>>
>>>>>>>> >>> Use the bootstrap standby command to prep the replacement standby, or
>>>>>>>> >>> rsync if the command fails.
>>>>>>>> >>>
>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>>>>> >>> the new standby.
>>>>>>>> >>>
>>>>>>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>>>>>>> >>> standby.
>>>>>>>> >>>
>>>>>>>> >>> Start the replacement standby.
>>>>>>>> >>>
>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>>>>> >>> configuration.
>>>>>>>> >>>
>>>>>>>> >>> I am not sure how to deal with the journal switch, or if I am going
>>>>>>>> >>> about this the right way. Can anybody give me some suggestions here?
>>>>>>>> >>>
>>>>>>>> >>> Regards,
>>>>>>>> >>>
>>>>>>>> >>> Colin Williams
>>>>>>>> >>
>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>> >> NOTICE: This message is intended for the use of the individual or entity
>>>>>>>> >> to which it is addressed and may contain information that is confidential,
>>>>>>>> >> privileged and exempt from disclosure under applicable law. If the reader of
>>>>>>>> >> this message is not the intended recipient, you are hereby notified that any
>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or forwarding of
>>>>>>>> >> this communication is strictly prohibited. If you have received this
>>>>>>>> >> communication in error, please contact the sender immediately and delete it
>>>>>>>> >> from your system. Thank You.
No worries! Glad you had a test environment to play with this in. Also, above I meant "If bootstrap fails...", not format of course :)


On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <discord@uw.edu> wrote:

I realize that this was a foolish error made late in the day. I am no hadoop expert, and have much to learn. This is why I set up a test environment.

On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
Also you shouldn't format the new standby. You only format a namenode for a brand new cluster. Once a cluster is live you should just use the bootstrap on the new namenodes and never format again. Bootstrap is basically a special format that just creates the dirs and copies an active fsimage to the host.

If format fails (it's buggy imo) just rsync from the active namenode. It will catch up by replaying the edits from the QJM when it is started.
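The rsync fallback described above could be sketched as follows. The name-directory path and host are assumptions (check `dfs.namenode.name.dir` in hdfs-site.xml for the real path), and the `run` helper only echoes so the sketch is runnable anywhere.

```shell
# Dry-run sketch of the rsync fallback when bootstrapping the standby fails.
run() { echo "+ $*"; }  # echoes instead of executing

# With the new standby's NameNode stopped, copy the name directory from the
# active NN; on start, the standby replays newer edits from the QJM to catch up.
run rsync -a --delete active-nn:/data/dfs/name/ /data/dfs/name/
run service hadoop-hdfs-namenode start
```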

On Friday, August 1, 2014, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
You should first replace the namenode, then when that is completely finished move on to replacing any journal nodes. That part is easy:

1) bootstrap new JN (rsync from an existing)
2) Start new JN
3) push hdfs-site.xml to both namenodes
4) restart standby namenode
5) verify logs and admin ui show new JN
6) restart active namenode
7) verify both namenodes (failover should have happened and old standby should be writing to the new JN)

You can remove an existing JN at the same time if you want, just be careful to preserve the majority of the quorum during the whole operation (i.e. only replace 1 at a time).

Also I think it is best to do hdfs dfsadmin -rollEdits after each replaced journalnode. IIRC there is a JIRA open about rolling restarting journal nodes not being safe unless you roll edits. So that would go for replacing too.
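The seven JN-swap steps plus the roll-edits advice can be sketched as a dry run. Host names, paths, and service names are assumptions; only `hdfs dfsadmin -rollEdits` is a command named in the thread, and the `run` helper just echoes.

```shell
# Dry-run sketch: replace one JournalNode at a time, keeping a quorum majority up.
run() { echo "+ $*"; }  # echoes instead of executing

run rsync -a existing-jn:/data/dfs/jn/ /data/dfs/jn/       # 1) bootstrap new JN
run service hadoop-hdfs-journalnode start                  # 2) start new JN
run scp hdfs-site.xml nn1:/etc/hadoop/conf/                # 3) push config to both NNs
run scp hdfs-site.xml nn2:/etc/hadoop/conf/
run ssh nn-standby 'service hadoop-hdfs-namenode restart'  # 4) restart standby NN
# 5) check NN logs / admin UI for the new JN
run ssh nn-active 'service hadoop-hdfs-namenode restart'   # 6) restart active NN (expect failover)
# 7) verify both NNs, then roll edits before touching the next JN
run hdfs dfsadmin -rollEdits
```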

On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu> wrote:

I will run through the procedure again tomorrow. It was late in the day before I had a chance to test the procedure.

If I recall correctly, I had an issue formatting the new standby before bootstrapping. I think either at that point, or during the ZooKeeper format command, I was queried to format the journal on the 3 hosts in the quorum. I was unable to proceed without exception unless choosing this option.

Are there any concerns adding another journal node to the new standby?

On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
This shouldn't have affected the journalnodes at all -- they are mostly unaware of the zkfc and active/standby state. Did you do something else that may have impacted the journalnodes? (i.e. shut down 1 or more of them, or something else)

For your previous 2 emails, reporting errors/warns when doing -formatZK:

The WARN is fine. It's true that you could get in a weird state if you had multiple namenodes up. But with just 1 namenode up, you should be safe. What you are trying to avoid is a split brain or standby/standby state, but that is impossible with just 1 namenode alive. Similarly, the ERROR is a sanity check to make sure you don't screw yourself by formatting a running cluster. I should have mentioned you need to use the -force argument to get around that.
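Passing the -force argument mentioned above would look like this (dry-run sketch; the `run` helper only echoes, and whether -force fits your zkfc version should be checked against its usage output):

```shell
# Dry-run sketch: rerun the ZKFC format, bypassing the "parent znode exists" prompt.
run() { echo "+ $*"; }  # echoes instead of executing
run sudo hdfs zkfc -formatZK -force
```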


On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
However, continuing with the process, my QJM eventually errored out and my active NameNode went down.

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

	at org.apache.hadoop.ipc.Client.call(Client.java:1224)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at com.sun.proxy.$Proxy9.journal(Unknown Source)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
	at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
	at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/10.120.5.203:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0
	[identical stack trace elided]
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/10.120.5.25:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0
	[identical stack trace elided]
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485], stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
	[identical stack trace elided]
10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
	[identical stack trace elided]
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
	at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020] client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode (StringUtils.java:run(615)) - SHUTDOWN_MSG:



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
I tried a third time and it just worked?

sudo hdfs zkfc -formatZK
2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:<init>(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13 GMT
2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle Corporation
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.class.path=[long classpath elided]
-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:= /usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop= -0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/= commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons= -httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/l= ib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapred= uce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib= /commons-beanutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lan= g-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2= .jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/ha= doop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/= stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:= /usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.= 20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-a= pi-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/li= b/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduc= e/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-js= on-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/li= b/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/us= r/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/= lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/had= oop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.2= 0-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.= 0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/= usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapr= 
educe/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/= .//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/./= /hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoo= p-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-exa= mples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools= .jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/= usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:<init>(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
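For anyone reading along: the session above is what re-initializing the failover znode looks like. A minimal sketch of the commands involved (assuming CDH4-era packaging and init scripts; only safe while every NameNode and failover controller is stopped):

```shell
# Re-create the failover znode for this nameservice in ZooKeeper.
# WARNING: run only while all NameNodes and ZKFCs are stopped,
# exactly as the interactive prompt above warns.
hdfs zkfc -formatZK

# Afterwards, start the failover controllers again so one
# NameNode gets elected active.
service hadoop-hdfs-zkfc start
```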
On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com> wrote:

Cheers. That's rough. We don't have that problem here at WanDISCO.
On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
> Hi this is drocsid / discord from #hbase. Thanks for the help earlier today.
> Just thought I'd forward this info regarding swapping out the NameNode in a
> QJM / HA configuration. See you around on #hbase. If you visit Seattle, feel
> free to give me a shout out.
>
> ---------- Forwarded message ----------
> From: Colin Kincaid Williams <discord@uw.edu>
> Date: Thu, Jul 31, 2014 at 12:35 PM
> Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
> configuration
> To: user@hadoop.apache.org
>
>
> Hi Jing,
>
> Thanks for the response. I will try this out, and file an Apache jira.
>
> Best,
>
> Colin Williams
>
>
> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>
>> Hi Colin,
>>
>>     I guess currently we may have to restart almost all the
>> daemons/services in order to swap out a standby NameNode (SBN):
>>
>> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>> the current implementation the SBN periodically sends a rollEditLog RPC
>> request to the ANN (thus if a NN failover happens later, the original ANN
>> needs to send this RPC to the correct NN).
>> 2. Looks like the DataNode currently cannot do a real refresh for a new NN.
>> Look at the code in BPOfferService:
>>
>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>     for (BPServiceActor actor : bpServices) {
>>       oldAddrs.add(actor.getNNSocketAddress());
>>     }
>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>
>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>       // Keep things simple for now -- we can implement this at a later date.
>>       throw new IOException(
>>           "HA does not currently support adding a new standby to a running DN. " +
>>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>     }
>>   }
>>
>> 3. If you're using automatic failover, you also need to update the
>> configuration of the ZKFC on the current ANN machine, since the ZKFC does
>> graceful fencing by sending an RPC to the other NN.
>> 4. Looks like we do not need to restart the JournalNodes for the new SBN,
>> but I have not tried this before.
>>
>>     Thus in general we may still have to restart all the services (except
>> the JNs) and update their configurations, but I guess this can be a rolling
>> restart process:
>> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>> 2. Keep the ANN and its corresponding ZKFC running; do a rolling restart
>> of all the DNs to update their configurations.
>> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
>> their configuration. The new SBN should become active.
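A command-level sketch of those three steps (untested, as Jing says; assuming CDH4-style init scripts, and that the updated hdfs-site.xml has already been pushed to each host before its restart):

```shell
# Step 1: on the old standby, stop the NameNode; on the new standby,
# bootstrap from the running active and start.
service hadoop-hdfs-namenode stop        # on the old SBN
hdfs namenode -bootstrapStandby          # on the new SBN
service hadoop-hdfs-namenode start       # on the new SBN

# Step 2: with the ANN and its ZKFC still running, restart the
# DataNodes one at a time so each picks up the new NN list.
service hadoop-hdfs-datanode restart     # on each DN, one at a time

# Step 3: stop the ANN and its ZKFC, update their configs, and start
# them again; the new SBN should take over as active.
service hadoop-hdfs-zkfc stop            # on the ANN host
service hadoop-hdfs-namenode stop        # on the ANN host
# ...update configuration, then start both daemons again...
```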
>>
>>     I have not tried the steps above, so please let me know whether this
>> works or not. And I think we should also document the correct steps in
>> Apache. Could you please file an Apache jira?
>>
>> Thanks,
>> -Jing
>>
>>
>>
>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu>
>> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>>> believe the steps to achieve this would be something similar to:
>>>
>>> Use the bootstrap standby command to prep the replacement standby, or
>>> rsync if the command fails.
>>>
>>> Somehow update the datanodes, so they push the heartbeat / journal to the
>>> new standby.
>>>
>>> Update the xml configuration on all nodes to reflect the replacement
>>> standby.
>>>
>>> Start the replacement standby.
>>>
>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> configuration.
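The bootstrap step above can be sketched as follows (hedged: this assumes the replacement host already has an hdfs-site.xml pointing at the existing nameservice and journal quorum):

```shell
# On the replacement standby, pull the namespace from the running
# active NameNode instead of formatting.
hdfs namenode -bootstrapStandby
```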
>>>
>>> I am not sure how to deal with the Journal switch, or if I am going about
>>> this the right way. Can anybody give me some suggestions here?
>>>
>>>
>>> Regards,
>>>
>>> Colin Williams
>>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader of
>> this message is not the intended recipient, you are hereby notified that any
>> printing, copying, dissemination, distribution, disclosure or forwarding of
>> this communication is strictly prohibited. If you have received this
>> communication in error, please contact the sender immediately and delete it
>> from your system. Thank You.
>
>
>