Date: Thu, 30 Apr 2015 08:23:06 +0000 (UTC)
From: "Matteo Bertozzi (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13217) Flush procedure fails in trunk due to ZK issue

    [ https://issues.apache.org/jira/browse/HBASE-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521118#comment-14521118 ]

Matteo Bertozzi commented on HBASE-13217:
-----------------------------------------

{quote}Here is what the znode looks like when MASTER thinks the procedure is completed (obviously, myregionserver-5 is missing under 'reached' znode){quote}

My first guess is that when the procedure started, regionserver-5 was in transition and not in the list. There should be a list of "online regionservers" that the procedure uses to set up the latch that decides when everyone has completed.
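The latch idea above can be illustrated with a minimal, hypothetical Java sketch. It is not HBase's `Procedure` implementation: the class `BarrierSketch`, its methods, and the `rs-N` server names are invented for illustration. It assumes, as the comment guesses, that the coordinator sizes its latch once, from the list of online regionservers captured when the procedure starts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, NOT org.apache.hadoop.hbase.procedure.Procedure:
// the latch is sized from the members online when the procedure starts,
// so a member missing from that snapshot is never waited on.
public class BarrierSketch {
    private final List<String> expectedMembers;   // snapshot taken at start
    private final CountDownLatch reachedLatch;    // one count per expected member
    private final Set<String> reached = ConcurrentHashMap.newKeySet();

    public BarrierSketch(List<String> onlineMembersAtStart) {
        this.expectedMembers =
                Collections.unmodifiableList(new ArrayList<>(onlineMembersAtStart));
        this.reachedLatch = new CountDownLatch(expectedMembers.size());
    }

    // Called when a member reports that it has reached the barrier.
    public void memberReached(String member) {
        // Only members from the start-time snapshot count toward completion;
        // a repeated report from the same member is ignored.
        if (expectedMembers.contains(member) && reached.add(member)) {
            reachedLatch.countDown();
        }
    }

    // Coordinator's view of completion: true once every member in the
    // snapshot has reached the barrier, regardless of any other servers.
    public boolean isCompleted() {
        return reachedLatch.getCount() == 0;
    }

    public Set<String> reachedMembers() {
        return Collections.unmodifiableSet(reached);
    }

    public static void main(String[] args) {
        // rs-5 is not in the snapshot (e.g. it was still joining), so the
        // latch only waits for rs-1..rs-4.
        BarrierSketch proc = new BarrierSketch(List.of("rs-1", "rs-2", "rs-3", "rs-4"));
        for (String rs : List.of("rs-1", "rs-2", "rs-3", "rs-4")) {
            proc.memberReached(rs);
        }
        // prints "completed=true, rs-5 reached=false"
        System.out.println("completed=" + proc.isCompleted()
                + ", rs-5 reached=" + proc.reachedMembers().contains("rs-5"));
    }
}
```

Because the latch is sized once at start, a regionserver missing from that snapshot cannot hold the procedure open: the coordinator reports completion while that server never shows up under `reached`, which would be consistent with the znode state quoted above.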
That procedure code doesn't tolerate regions in transition, so that may be the problem you are seeing. (But it may be something else; this is just my first guess without looking at the code.)

> Flush procedure fails in trunk due to ZK issue
> ----------------------------------------------
>
>                 Key: HBASE-13217
>                 URL: https://issues.apache.org/jira/browse/HBASE-13217
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Stephen Yuan Jiang
>
> Whenever I try to flush explicitly in the trunk code, the flush procedure fails due to a ZK issue:
> {code}
> ERROR: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via stobdtserver3,16040,1426172670959:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426172670959
>         at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>         at org.apache.hadoop.hbase.procedure.Procedure.isCompleted(Procedure.java:368)
>         at org.apache.hadoop.hbase.procedure.flush.MasterFlushTableProcedureManager.isProcedureDone(MasterFlushTableProcedureManager.java:196)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.isProcedureDone(MasterRpcServices.java:905)
>         at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:47019)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2073)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426172670959
>         at org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:273)
>         at org.apache.hadoop.hbase.procedure.ProcedureMember.controllerConnectionFailure(ProcedureMember.java:225)
>         at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.sendMemberAcquired(ZKProcedureMemberRpcs.java:254)
>         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:166)
>         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         ... 1 more
> {code}
> Once this occurs, the RS remains unusable even after it is restarted. I have verified that ZK remains intact and there is no problem with it. A slightly older version of trunk (3 months old) does not have this problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)