hadoop-hdfs-user mailing list archives

From Colin Kincaid Williams <disc...@uw.edu>
Subject Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration
Date Fri, 01 Aug 2014 15:53:59 GMT
The test environment is a 6-node VirtualBox cluster running on 2 desktops :] 7
nodes with the extra namenode.


On Fri, Aug 1, 2014 at 7:26 AM, Bryan Beaudreault <bbeaudreault@hubspot.com>
wrote:

> No worries!  Glad you had a test environment to play with this in.  Also,
> above I meant "If bootstrap fails...", not format of course :)
>
>
> On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <discord@uw.edu>
> wrote:
>
>> I realize that this was a foolish error made late in the day. I am no
>> Hadoop expert, and have much to learn. This is why I set up a test
>> environment.
>> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bbeaudreault@hubspot.com>
>> wrote:
>>
>>> Also you shouldn't format the new standby. You only format a namenode
>>> for a brand new cluster. Once a cluster is live you should just use the
>>> bootstrap on the new namenodes and never format again. Bootstrap is
>>> basically a special format that just creates the dirs and copies an active
>>> fsimage to the host.
>>>
>>> If format fails (it's buggy imo) just rsync from the active namenode. It
>>> will catch up by replaying the edits from the QJM when it is started.
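
A rough sketch of the advice above, for anyone following along: bootstrap the new standby instead of formatting it, and fall back to rsync if the bootstrap fails (per the later correction in this thread). `hdfs namenode -bootstrapStandby` is the stock HDFS command; the host name and name directory passed in are placeholders, not values from this cluster, and the real directory is whatever `dfs.namenode.name.dir` points at. By default the helper only prints what it would run.

```shell
#!/bin/sh
# Sketch only: prints commands by default; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the NEW standby host. Do not use 'hdfs namenode -format' here:
# format is only for a brand new cluster.
bootstrap_standby() {
    active_host="$1"   # hypothetical host name of the active namenode
    name_dir="$2"      # hypothetical path; must match dfs.namenode.name.dir

    # Preferred path: create the dirs and copy an fsimage from the active.
    if run hdfs namenode -bootstrapStandby; then
        return 0
    fi

    # Fallback from the thread: rsync the name directory from the active.
    # The standby catches up by replaying edits from the QJM once started.
    run rsync -a "${active_host}:${name_dir}/" "${name_dir}/"
}
```

Usage would look something like `bootstrap_standby active-nn.example /data/dfs/nn`, run as the HDFS user on the new standby before starting the namenode there.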
>>>
>>> On Friday, August 1, 2014, Bryan Beaudreault <bbeaudreault@hubspot.com>
>>> wrote:
>>>
>>>> You should first replace the namenode, then when that is completely
>>>> finished move on to replacing any journal nodes. That part is easy:
>>>>
>>>> 1) bootstrap new JN (rsync from an existing)
>>>> 2) Start new JN
>>>> 3) push hdfs-site.xml to both namenodes
>>>> 4) restart standby namenode
>>>> 5) verify logs and admin ui show new JN
>>>> 6) restart active namenode
>>>> 7) verify both namenodes (failover should have happened and old standby
>>>> should be writing to the new JN)
>>>>
>>>> You can remove an existing JN at the same time if you want, just be
>>>> careful to preserve the majority of the quorum during the whole operation
>>>> (i.e. only replace 1 at a time).
>>>>
>>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>>> journal nodes not being safe unless you roll edits. So that would go for
>>>> replacing too.
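
The journal node steps above can be sketched as a small script. This is only an illustration under assumptions: the host names, the JournalNode directory (`/data/dfs/jn`), and the `hadoop-hdfs-journalnode` / `hadoop-hdfs-namenode` service names are placeholders that will differ per install. Only `hdfs dfsadmin -rollEdits` comes straight from the message. The two arithmetic helpers show why replacing one JN at a time keeps the quorum: with 3 JNs a write needs 2 acks, so only 1 can be missing.

```shell
#!/bin/sh
# Sketch only: prints each step by default; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Manual checkpoints: print a reminder, and wait for Enter when running for real.
pause() {
    echo "# $*"
    if [ "$DRY_RUN" != "1" ]; then
        printf 'Press Enter once verified: '
        read -r answer
    fi
}

# QJM writes need a strict majority of JournalNodes: floor(n/2) + 1.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of JournalNodes that can be down while writes still succeed.
tolerable_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Add ONE new JournalNode, following steps 1-7 from the message above.
add_one_jn() {
    src_jn="$1"; new_jn="$2"; standby_nn="$3"; active_nn="$4"
    jn_dir="${JN_DIR:-/data/dfs/jn}"                 # hypothetical JN edits dir
    conf="${CONF:-/etc/hadoop/conf/hdfs-site.xml}"   # assumed config location

    run ssh "$new_jn" rsync -a "$src_jn:$jn_dir/" "$jn_dir/"    # 1) bootstrap new JN
    run ssh "$new_jn" service hadoop-hdfs-journalnode start      # 2) start new JN
    for nn in "$standby_nn" "$active_nn"; do                     # 3) push hdfs-site.xml
        run scp "$conf" "$nn:$conf"
    done
    run ssh "$standby_nn" service hadoop-hdfs-namenode restart   # 4) restart standby
    pause "5) check standby logs and admin UI show $new_jn"
    run ssh "$active_nn" service hadoop-hdfs-namenode restart    # 6) restart active
    pause "7) check both namenodes; the old standby should be writing to $new_jn"
    run hdfs dfsadmin -rollEdits                                 # roll edits afterwards
}
```

For example, `add_one_jn jn1 jn4 nn2 nn1` prints the nine steps in order, and `tolerable_failures 3` prints 1, which is the reason for the "only replace 1 at a time" rule.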
>>>>
>>>> On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu>
>>>> wrote:
>>>>
>>>>> I will run through the procedure again tomorrow. It was late in the
>>>>> day before I had a chance to test the procedure.
>>>>>
>>>>> If I recall correctly, I had an issue formatting the new standby
>>>>> before bootstrapping. I think either at that point, or during the
>>>>> ZooKeeper format command, I was prompted to format the journal on the 3
>>>>> hosts in the quorum. I was unable to proceed without an exception unless
>>>>> I chose this option.
>>>>>
>>>>> Are there any concerns with adding another journal node on the new standby?
>>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com>
>>>>> wrote:
>>>>>
>>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>>> them, or something else)
>>>>>>
>>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>>> -formatZK:
>>>>>>
>>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>>> argument to get around that.
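
A minimal sketch of the `-formatZK` advice above, assuming you count the running namenodes yourself and pass that number in (the helper does not detect it). `hdfs zkfc -formatZK -force` is the command and flag named in this thread; the guard around it is only an illustration of the "just 1 namenode up" condition. It prints the command unless `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch only: prints the command by default; set DRY_RUN=0 to really run it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# $1 = number of namenodes currently running (determined by the operator).
format_zk() {
    live_namenodes="$1"
    # With more than one namenode up, clearing failover state risks a
    # split brain or standby/standby state, so refuse.
    if [ "$live_namenodes" -gt 1 ]; then
        echo "refusing: $live_namenodes namenodes up; stop all but one first" >&2
        return 1
    fi
    # -force is the flag mentioned above for getting past the sanity check.
    run hdfs zkfc -formatZK -force
}
```

`format_zk 1` prints the command; `format_zk 2` refuses with a non-zero exit status.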
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> However, continuing with the process, my QJM eventually errored out
>>>>>>> and my Active NameNode went down.
>>>>>>>
>>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>>> QuorumOutputStream starting at txid 9634
>>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>>
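
The trace above is long, so here is a small sketch for pulling the two useful facts out of a namenode log: which journal nodes rejected the write, and which epochs were involved. It assumes the log file has each message on one line (unlike the wrapped copy in this email) and that the wording matches the lines pasted above; the function names are made up for this example.

```shell
#!/bin/sh
# List each JournalNode (host:port) that rejected a write, once per node.
failed_journals() {
    sed -n 's/.*Remote journal \([0-9.]*:[0-9]*\) failed to write.*/\1/p' "$1" | sort -u
}

# Print distinct "<namenode epoch> <journal writer epoch>" pairs.
# 'IPC.s' matches the apostrophe without fighting shell quoting.
epoch_mismatches() {
    sed -n 's/.*IPC.s epoch \([0-9]*\) is not the current writer epoch  *\([0-9]*\).*/\1 \2/p' "$1" | sort -u
}
```

Against the log above, `failed_journals` would list all three journal nodes and `epoch_mismatches` would print `5 0`: every JN reports writer epoch 0 while the namenode is at epoch 5. That would be consistent with the journals having been re-formatted earlier in the procedure, though the log alone does not prove it.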
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>>
>>>>>>>> I tried a third time and it just worked?
>>>>>>>>
>>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>>> GMT
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>>> =rhel1.local
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>>> Corporation
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pr
otobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-c
ore-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspect
jrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bea
nutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will
>>>>>>>> not attempt to authenticate using SASL (unknown error)
>>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>>> connection established to rhel1.local/10.120.5.203:2181,
>>>>>>>> initiating session
>>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>>  ===============================================
>>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>>> Are you sure you want to clear all failover information from
>>>>>>>> ZooKeeper?
>>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>>> failover controllers are stopped!
>>>>>>>> ===============================================
>>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>>> Y
>>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu> wrote:
>>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>>> > earlier today. Just thought I'd forward this info regarding
>>>>>>>>> > swapping out the NameNode in a QJM / HA configuration. See you
>>>>>>>>> > around on #hbase. If you visit Seattle, feel free to give me a
>>>>>>>>> > shout out.
>>>>>>>>> >
>>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>>> > From: Colin Kincaid Williams <discord@uw.edu>
>>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a
>>>>>>>>> > QJM / HA configuration
>>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Hi Jing,
>>>>>>>>> >
>>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>>> > JIRA.
>>>>>>>>> >
>>>>>>>>> > Best,
>>>>>>>>> >
>>>>>>>>> > Colin Williams
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <
>>>>>>>>> jing@hortonworks.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Colin,
>>>>>>>>> >>
>>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>>> >>
>>>>>>>>> >> 1. The current active NameNode (ANN) needs to know about the new
>>>>>>>>> >> SBN, since in the current implementation the SBN periodically
>>>>>>>>> >> sends a rollEditLog RPC request to the ANN (thus if an NN
>>>>>>>>> >> failover happens later, the original ANN needs to send this RPC
>>>>>>>>> >> to the correct NN).
>>>>>>>>> >> 2. It looks like the DataNode currently cannot really refresh its
>>>>>>>>> >> list of NNs. Look at the code in BPOfferService:
>>>>>>>>> >>
>>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>>> >>     }
>>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>>> >>
>>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>>>>>> >>       throw new IOException(
>>>>>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>>>>> >>     }
>>>>>>>>> >>   }
>>>>>>>>> >>
>>>>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since the
>>>>>>>>> >> ZKFC performs graceful fencing by sending an RPC to the other NN.
>>>>>>>>> >> 4. It looks like we do not need to restart the JournalNodes for
>>>>>>>>> >> the new SBN, but I have not tried this before.
>>>>>>>>> >>
>>>>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>>>>> >> (except the JNs) and update their configurations. But I guess this
>>>>>>>>> >> can be a rolling restart process:
>>>>>>>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new
>>>>>>>>> >> SBN.
>>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a
>>>>>>>>> >> rolling restart of all the DNs to update their configurations.
>>>>>>>>> >> 3. After restarting all the DNs, stop the ANN and its ZKFC, and
>>>>>>>>> >> update their configuration. The new SBN should become active.
>>>>>>>>> >>
>>>>>>>>> >>     I have not tried the steps above, so please let me know
>>>>>>>>> >> whether this works. I also think we should document the correct
>>>>>>>>> >> steps in Apache. Could you please file an Apache JIRA?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> -Jing
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hello,
>>>>>>>>> >>>
>>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>>> >>> configuration. I believe the steps to achieve this would be
>>>>>>>>> >>> something similar to:
>>>>>>>>> >>>
>>>>>>>>> >>> Use the bootstrap standby command to prep the replacement
>>>>>>>>> >>> standby, or rsync if the command fails.
>>>>>>>>> >>>
>>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>>> >>> journal to the new standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>>> >>> replacement standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Start the replacement standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>>> >>> NameNode configuration.
>>>>>>>>> >>>
>>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>>> >>> going about this the right way. Can anybody give me some
>>>>>>>>> >>> suggestions here?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Regards,
>>>>>>>>> >>>
>>>>>>>>> >>> Colin Williams
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>
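
For anyone wondering why the DataNodes need a rolling restart: the refreshNNList snippet quoted in the thread rejects any refresh in which the set of NameNode addresses differs at all from the set the DataNode is already talking to. Below is a minimal standalone sketch of that check, not the Hadoop code itself. It uses plain java.util instead of Guava's Sets.symmetricDifference, and the class name, method names, host names, and port 8020 are all made up for illustration.

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the check quoted from BPOfferService; not Hadoop code.
class NnListRefreshSketch {

    // Returns the elements that appear in exactly one of the two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);            // elements only in a
        Set<T> onlyInB = new HashSet<>(b);
        onlyInB.removeAll(a);           // elements only in b
        result.addAll(onlyInB);
        return result;
    }

    // True when the quoted check would throw, i.e. the old and new
    // NameNode address sets are not identical.
    static boolean refreshWouldFail(List<InetSocketAddress> oldAddrs,
                                    List<InetSocketAddress> newAddrs) {
        return !symmetricDifference(new HashSet<>(oldAddrs),
                                    new HashSet<>(newAddrs)).isEmpty();
    }

    public static void main(String[] args) {
        // Assumed host names and port, for illustration only.
        InetSocketAddress active = InetSocketAddress.createUnresolved("nn-active.example", 8020);
        InetSocketAddress oldStandby = InetSocketAddress.createUnresolved("nn-old.example", 8020);
        InetSocketAddress newStandby = InetSocketAddress.createUnresolved("nn-new.example", 8020);

        // Same set in a different order: the refresh is accepted.
        System.out.println(refreshWouldFail(List.of(active, oldStandby),
                                            List.of(oldStandby, active)));
        // Standby swapped out: the refresh is rejected.
        System.out.println(refreshWouldFail(List.of(active, oldStandby),
                                            List.of(active, newStandby)));
    }
}
```

Because the comparison is on whole sets, replacing even one NameNode address counts as a change, which matches the error message in the quoted code telling operators to do a rolling restart of the DNs.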
