Date: Thu, 31 Dec 2015 13:59:49 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-15058) AssignmentManager should account for unsuccessful split correctly which initially passes quota check

     [ https://issues.apache.org/jira/browse/HBASE-15058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-15058:
---------------------------
    Description:
When a region split doesn't pass the quota check, we see an exception similar to the following:
{code}
2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.; Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, a region split may also fail in a later SplitTransactionPhase within stepsBeforePONR(), after the quota check has passed.

Currently in branch-1, AssignmentManager#onRegionTransition() does not clearly distinguish among the following states:
{code}
      case SPLIT_PONR:
      case SPLIT:
      case SPLIT_REVERTED:
        errorMsg =
            onRegionSplit(serverName, code, hri, HRegionInfo.convert(transition.getRegionInfo(1)),
              HRegionInfo.convert(transition.getRegionInfo(2)));
        if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
          try {
            regionStateListener.onRegionSplitReverted(hri);
{code}
onRegionSplit() handles all three TransitionCodes above. However, errorMsg is normally null (onRegionSplit() returns null at the end), so onRegionSplitReverted() is also called for SPLIT_PONR and SPLIT.

When a region split fails, AssignmentManager#onRegionTransition() should account for the failure properly so that quota bookkeeping is consistent.

  was:
When a region split doesn't pass the quota check, we see an exception similar to the following:
{code}
2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.; Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, a region split may also fail in a later SplitTransactionPhase within stepsBeforePONR(), after the quota check has passed.
Currently there is no mechanism to roll back the update to the namespace quota.

When a region split fails, NamespaceAuditor should account for the failure so that quota bookkeeping is consistent.
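
The fall-through described in the updated description can be illustrated with a small standalone sketch. This is not the HBase code and not necessarily the fix applied for this issue: SplitQuotaSketch, RegionCountTracker, onSplitApproved() and the namespace-keyed counter are hypothetical stand-ins, and the errorMsg parameter only plays the role of the value returned by onRegionSplit(). Only the three TransitionCode names and the onRegionSplitReverted() method name come from the description. The sketch assumes the quota count should be undone only when the transition code is SPLIT_REVERTED and the transition itself was handled without error.

```java
import java.util.HashMap;
import java.util.Map;

final class SplitQuotaSketch {

    /** Mirrors the three transition codes named in the description. */
    enum TransitionCode { SPLIT_PONR, SPLIT, SPLIT_REVERTED }

    /** Hypothetical stand-in for the quota bookkeeping behind regionStateListener. */
    static final class RegionCountTracker {
        private final Map<String, Integer> extraRegions = new HashMap<>();

        /** Quota check passed: provisionally count one extra region for the namespace. */
        void onSplitApproved(String namespace) {
            extraRegions.merge(namespace, 1, Integer::sum);
        }

        /** The split was rolled back: undo the provisional count. */
        void onRegionSplitReverted(String namespace) {
            extraRegions.merge(namespace, -1, Integer::sum);
        }

        int count(String namespace) {
            return extraRegions.getOrDefault(namespace, 0);
        }
    }

    /**
     * Handles one split-related transition. errorMsg stands in for the value
     * returned by onRegionSplit(): null or empty means the transition was handled.
     */
    static void onSplitTransition(TransitionCode code, String errorMsg,
                                  String namespace, RegionCountTracker tracker) {
        boolean handledOk = errorMsg == null || errorMsg.isEmpty();
        switch (code) {
            case SPLIT_PONR:
            case SPLIT:
                // The split is progressing or done: the provisional count stays.
                break;
            case SPLIT_REVERTED:
                // Assumption of this sketch: only a revert undoes the count.
                if (handledOk) {
                    tracker.onRegionSplitReverted(namespace);
                }
                break;
        }
    }

    public static void main(String[] args) {
        RegionCountTracker tracker = new RegionCountTracker();
        tracker.onSplitApproved("np2");
        onSplitTransition(TransitionCode.SPLIT_PONR, null, "np2", tracker);
        System.out.println("after SPLIT_PONR: " + tracker.count("np2"));
        onSplitTransition(TransitionCode.SPLIT_REVERTED, null, "np2", tracker);
        System.out.println("after SPLIT_REVERTED: " + tracker.count("np2"));
    }
}
```

With a shared branch for all three codes, the null errorMsg would undo the count on SPLIT_PONR and SPLIT as well, which is the inconsistency the description points out.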

> AssignmentManager should account for unsuccessful split correctly which initially passes quota check
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15058
>                 URL: https://issues.apache.org/jira/browse/HBASE-15058
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> When a region split doesn't pass the quota check, we see an exception similar to the following:
> {code}
> 2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.; Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
> java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
>   at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
>   at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
>   at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
>   at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>   at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> {code}
> However, a region split may also fail in a later SplitTransactionPhase within stepsBeforePONR(), after the quota check has passed.
> Currently in branch-1, AssignmentManager#onRegionTransition() does not clearly distinguish among the following states:
> {code}
>       case SPLIT_PONR:
>       case SPLIT:
>       case SPLIT_REVERTED:
>         errorMsg =
>             onRegionSplit(serverName, code, hri, HRegionInfo.convert(transition.getRegionInfo(1)),
>               HRegionInfo.convert(transition.getRegionInfo(2)));
>         if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
>           try {
>             regionStateListener.onRegionSplitReverted(hri);
> {code}
> onRegionSplit() handles all three TransitionCodes above. However, errorMsg is normally null (onRegionSplit() returns null at the end), so onRegionSplitReverted() is also called for SPLIT_PONR and SPLIT.
> When a region split fails, AssignmentManager#onRegionTransition() should account for the failure properly so that quota bookkeeping is consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)