hbase-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: hadoop hdfs ha QJM namenode failover causes dead regionservers (IOE in log roller) , master server abort, and needed hbck -fixAssignments
Date Tue, 15 Dec 2015 11:32:19 GMT
Hi Solin,

The timeout messages are usually a consequence of other issues with connectivity between
the NameNode and the QJM. Assuming the RegionServers are properly configured for HDFS HA,
pointing to an HDFS nameservice instead of a direct NameNode address, they should also be
resilient to a failover.
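
As a concrete illustration (a minimal sketch only, where "mycluster", the host names and
the /hbase path are placeholders, not your real values), "pointing to a nameservice" means
hbase.rootdir references the HA nameservice and the hdfs-site.xml visible to HBase defines it:

  <!-- hbase-site.xml: point HBase at the nameservice, not a single NameNode -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>

  <!-- hdfs-site.xml on the HBase nodes: client-side HA definition -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

With that in place, the HDFS client inside each RegionServer retries against the other
NameNode on failover instead of erroring out against a single fixed address.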

Considering the ZooKeeper session timeout message in the RegionServer log below, I would
first look for a network issue on the cluster, but that's just an initial guess:

…
> 2015-12-09 04:11:35,413 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x44e6c2f20980003 has expired,
> closing socket connection
...
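
For context on the expiry itself: the session dies when the client cannot reach ZooKeeper
within the negotiated session timeout, which is why a network problem (or a long GC pause)
surfaces as an expired session. If tuning ever becomes relevant, the client-side request is
zookeeper.session.timeout in hbase-site.xml, but the ZooKeeper servers clamp it between
minSessionTimeout and maxSessionTimeout in zoo.cfg, so raising only the client side has no
effect. The values below are placeholders, not recommendations:

  <!-- hbase-site.xml: session timeout HBase asks ZooKeeper for (ms) -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>90000</value>
  </property>

  # zoo.cfg on the ZooKeeper ensemble: the negotiated value is capped here (ms)
  maxSessionTimeout=90000

Still, the first step is finding out why the node lost contact, not raising the timeout.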



> On 15 Dec 2015, at 01:17, Colin Kincaid Williams <discord@uw.edu> wrote:
> 
> We had a namenode go down due to a timeout with the HDFS HA QJM journal:
> 
> 
> 
> 2015-12-09 04:10:42,723 WARN
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016
> ms (timeout=20000 ms) for a response for sendEdits
> 
> 2015-12-09 04:10:43,708 FATAL
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for
> required journal (JournalAndStream(mgr=QJM to [10.42.28.221:8485,
> 10.42.28.222:8485, 10.42.28.223:8485], stream=QuorumOutputStream starting
> at txid 8781293))
> 
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to
> respond.
> 
> at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
> 
> at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1695)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1669)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:409)
> 
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
> 
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
> 
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> 
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> 
> at java.security.AccessController.doPrivileged(Native Method)
> 
> at javax.security.auth.Subject.doAs(Subject.java:415)
> 
> 
> While this is disturbing in its own right, I'm further annoyed that HBase
> shut down 2 region servers. Furthermore, we had to run hbck -fixAssignments
> to repair HBase, and I'm not sure whether the data from the shut-down regions
> was available, or whether our HBase service itself was available afterwards:
> 
> 
> 2015-12-09 04:10:44,320 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server ^@^@hbase008r09.comp.prod.local,60020,1436412712133 reported
> a fatal error:
> 
> ABORTING region server hbase008r09.comp.prod.local,60020,1436412712133: IOE
> in log roller
> 
> Cause:
> 
> java.io.IOException: cannot get log writer
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
> 
>  at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
> 
>  at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> 
>  at java.lang.Thread.run(Thread.java:722)
> 
> Caused by: java.io.IOException: java.io.IOException: Failed on local
> exception: java.io.IOException: Response is null.; Host Details : local
> host is: "hbase008r09.comp.prod.local/10.42.28.192"; destination host is:
> "hbasenn001.comp.prod.local":8020;
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
> 
>  ... 4 more
> 
> Caused by: java.io.IOException: Failed on local exception:
> java.io.IOException: Response is null.; Host Details : local host is:
> "hbase008r09.comp.prod.local/10.42.28.192"; destination host is:
> "hbasenn001.comp.prod.local":8020;
> 
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
> 
>  at org.apache.hadoop.ipc.Client.call(Client.java:1228)
> 
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> 
>  at com.sun.proxy.$Proxy14.create(Unknown Source)
> 
>  at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
> 
>  at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
> 
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 
>  at java.lang.reflect.Method.invoke(Method.java:601)
> 
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> 
>  at com.sun.proxy.$Proxy15.create(Unknown Source)
> 
>  at
> org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
> 
>  at
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
> 
>  at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)
> 
>  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)
> 
>  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)
> 
>  at
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)
> 
>  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)
> 
>  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)
> 
>  at
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> 
>  at org.apache.hadoop.fs.FileContext.create(FileContext.java:660)
> 
>  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)
> 
>  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
> 
>  at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> 
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 
>  at java.lang.reflect.Method.invoke(Method.java:601)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
> 
>  ... 5 more
> 
> Caused by: java.io.IOException: Response is null.
> 
>  at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)
> 
>  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)
> 
> 
> 2015-12-09 04:10:44,387 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server ^@^@hbase007r08.comp.prod.local,60020,1436412674179 reported
> a fatal error:
> 
> ABORTING region server hbase007r08.comp.prod.local,60020,1436412674179: IOE
> in log roller
> 
> Cause:
> 
> java.io.IOException: cannot get log writer
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
> 
>  at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
> 
>  at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> 
>  at java.lang.Thread.run(Thread.java:722)
> 
> Caused by: java.io.IOException: java.io.IOException: Failed on local
> exception: java.io.IOException: Response is null.; Host Details : local
> host is: "hbase007r08.comp.prod.local/10.42.28.191"; destination host is:
> "hbasenn001.comp.prod.local":8020;
> 
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
> 
>  ... 4 more
> 
> Caused by: java.io.IOException: Failed on local exception:
> java.io.IOException: Response is null.; Host Details : local host is:
> "hbase007r08.comp.prod.local/10.42.28.191"; destination host is:
> "hbasenn001.comp.prod.local":8020;
> 
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
> 
>  at org.apache.hadoop.ipc.Client.call(Client.java:1228)
> 
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> 
>  at com.sun.proxy.$Proxy14.create(Unknown Source)
> 
>  at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
> 
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> 
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 
>  at java.lang.reflect.Method.invoke(Method.java:601)
> 
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> 
>  at com.sun.proxy.$Proxy15.create(Unknown Source)
> 
>  at
> org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
> 
>  at
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
> 
>  at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)
> 
>  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)
> 
>  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)
> 
>  at
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)
> 
>  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)
> 
>  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)
> 
>  at
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> 
>  at org.apache.hadoop.fs.FileContext.create(FileContext.java:660)
> 
>  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)
> 
>  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
> 
>  at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
> 
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 
>  at java.lang.reflect.Method.invoke(Method.java:601)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
> 
>  ... 5 more
> 
> Caused by: java.io.IOException: Response is null.
> 
>  at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)
> 
>  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)
> 
> 
> 2015-12-09 04:11:01,444 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 26679ms for sessionid
> 0x44e6c2f20980003, closing socket connection and attempting reconnect
> 
> 2015-12-09 04:11:34,636 WARN
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking
> getListing of class ClientNamenodeProtocolTranslatorPB. Trying to fail over
> immediately.
> 
> 2015-12-09 04:11:34,687 WARN
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking
> getListing of class ClientNamenodeProtocolTranslatorPB after 1 fail over
> attempts. Trying to fail over after sleeping for 791ms.
> 
> 2015-12-09 04:11:35,334 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooSlow):
> {"processingtimems":50237,"call":"reportRSFatalError([B@3c97e50c, ABORTING
> region server hbase008r09.comp.prod.local,60020,1436412712133: IOE in log
> roller\nCause:\njava.io.IOException: cannot get log writer\n\tat
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)\n\tat
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)\n\tat
> org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)\n\tat
> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)\n\tat
> java.lang.Thread.run(Thread.java:722)\nCaused by: java.io.IOException:
> java.io.IOException: Failed on local exception: java.io.IOException:
> Response is null.; Host Details : local host is:
> \"hbase008r09.comp.prod.local/10.42.28.192\"; destination host is:
> \"hbasenn001.comp.prod.local\":8020; \n\tat
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)\n\tat
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)\n\t...
> 4 more\nCaused by: java.io.IOException: Failed on local exception:
> java.io.IOException: Response is null.; Host Details : local host is:
> \"hbase008r09.comp.prod.local/10.42.28.192\"; destination host is:
> \"hbasenn001.comp.prod.local\":8020; \n\tat
> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)\n\tat
> org.apache.hadoop.ipc.Client.call(Client.java:1228)\n\tat
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)\n\tat
> com.sun.proxy.$Proxy14.create(Unknown Source)\n\tat
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)\n\tat
> sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)\n\tat
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
> java.lang.reflect.Method.invoke(Method.java:601)\n\tat
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)\n\tat
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)\n\tat
> com.sun.proxy.$Proxy15.create(Unknown Source)\n\tat
> org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)\n\tat
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)\n\tat
> org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)\n\tat
> org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)\n\tat
> org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)\n\tat
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)\n\tat
> org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)\n\tat
> org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)\n\tat
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)\n\tat
> org.apache.hadoop.fs.FileContext.create(FileContext.java:660)\n\tat
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)\n\tat
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)\n\tat
> sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)\n\tat
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
> java.lang.reflect.Method.invoke(Method.java:601)\n\tat
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)\n\t...
> 5 more\nCaused by: java.io.IOException: Response is null.\n\tat
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)\n\tat
> org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)\n), rpc
> version=1, client version=29, methodsFingerPrint=-525182806","client":"
> 10.42.28.192:52162
> ","starttimems":1449659444320,"queuetimems":0,"class":"HMaster","responsesize":0,"method":"reportRSFatalError"}
> 
> 2015-12-09 04:11:35,409 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hbase004r08.comp.prod.local/10.42.28.188:2181.
> Will not attempt to authenticate using SASL (Unable to locate a login
> configuration)
> 
> 2015-12-09 04:11:35,411 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hbase004r08.comp.prod.local/10.42.28.188:2181,
> initiating session
> 
> 2015-12-09 04:11:35,413 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x44e6c2f20980003 has expired,
> closing socket connection
> 
> 2015-12-09 04:11:35,413 FATAL org.apache.hadoop.hbase.master.HMaster:
> Master server abort: loaded coprocessors are: []
> 
> 2015-12-09 04:11:35,414 INFO org.apache.hadoop.hbase.master.HMaster:
> Primary Master trying to recover from ZooKeeper session expiry.
> 
> 2015-12-09 04:11:35,416 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
> connectString=hbase004r08.comp.prod.local:2181,hbase003r07.comp.prod.local:2181,hbase005r09.comp.prod.local:2181
> sessionTimeout=1200000 watcher=master:60000
> 
> 
> ...
> 
> 
> and eventually:
> 
> 
> 2015-12-09 04:11:46,724 ERROR org.apache.zookeeper.ClientCnxn: Caught
> unexpected throwable
> 
> 2015-12-09 04:11:46,724 ERROR org.apache.zookeeper.ClientCnxn: Caught
> unexpected throwable
> 
> java.lang.StackOverflowError
> 
>  at java.security.AccessController.doPrivileged(Native Method)
> 
>  at java.io.PrintWriter.<init>(PrintWriter.java:78)
> 
>  at java.io.PrintWriter.<init>(PrintWriter.java:62)
> 
>  at
> org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
> 
>  at
> org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
> 
>  at
> org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
> 
>  at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
> 
>  at
> org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
> 
>  at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
> 
>  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
> 
>  at
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
> 
>  at org.apache.log4j.Category.callAppenders(Category.java:206)
> 
>  at org.apache.log4j.Category.forcedLog(Category.java:391)
> 
>  at org.apache.log4j.Category.log(Category.java:856)
> 
>  at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:623)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> 
>  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> 
>  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> 
>  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> 
>  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> 
>  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> 
>  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> 
>  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> 
>  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> 
>  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> 
>  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> 
>  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> 
>  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> 
>  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> 
>  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> 
>  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> 
>  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> 
>  at
> org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> 
>  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> 
>  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> 
>  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> 
>  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> 
>  at
> org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> 
> 
> ...
> 
> 
> Since the namenode failover made the other namenode active, why did my
> region servers decide to shut down? The HDFS service seems to have stayed
> up. How can I make the HBase service more resilient to namenode failovers?
> 
> 
> HBase: Version 0.92.1-cdh4.1.3
> 
> 
> Hadoop: Hadoop 2.0.0-cdh4.1.3

