Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Delivered-To: mailing list issues@hbase.apache.org
Date: Sat, 18 Jan 2014 03:06:26 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10370) Compaction in out-of-date Store causes region split failure
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HBASE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875516#comment-13875516 ]

Hudson commented on HBASE-10370:
--------------------------------

SUCCESS: Integrated in HBase-0.98 #93 (See [https://builds.apache.org/job/HBase-0.98/93/])
HBASE-10370 Compaction in out-of-date Store causes region split failure, fix only (Tedyu: rev 1559276)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
HBASE-10370 revert due to TestSplitTransactionOnCluster.testSplitFailedCompactionAndSplit failure (Tedyu: rev 1559274)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java


> Compaction in out-of-date Store causes region split failure
> -----------------------------------------------------------
>
>                 Key: HBASE-10370
>                 URL: https://issues.apache.org/jira/browse/HBASE-10370
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.94.3, 0.98.0, 0.99.0
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>             Fix For: 0.98.0, 0.96.2, 0.99.0
>
>         Attachments: 10370-v3.patch, 10370-v4.patch, 10370v2.096.txt, HBASE-10370-v1.diff, HBASE-10370-v2.diff
>
>
> In our production cluster, we encountered a problem where two daughter regions could not be opened because of a FileNotFoundException.
> {quote}
> 2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
> java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
>         at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
>         at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
>     ....
> {quote}
> The cause is that a compaction running in an out-of-date Store deletes hfiles that the daughter regions still reference after the split. As a result, the daughter regions can never be opened.
> The timeline is as follows.
> Assumption: there are two hfiles, a and b, in Store A of Region R.
> t0: A compaction request for Store A(a+b) in Region R is sent.
> t1: First split of Region R. This split times out and is rolled back. During the rollback, the region reinitializes all store objects; see SplitTransaction #824. The store in Region R is now A'(a+b).
> t2: The compaction requested at t0 runs (hfiles: a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived.
> t3: Another split of Region R. R splits into two regions, R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a+b).
> t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1 fails with a FileNotFoundException.
> I have added a test to identify this problem.
> After searching JIRA, HBASE-8502 may be the same problem. [~goldin]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
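
The t0 to t4 timeline in the quoted description can be replayed with a small toy model. This is only an illustrative sketch in Python, not HBase code: `Store`, `split`, `open_daughter`, and `run_timeline` are hypothetical names invented here, the "filesystem" is a plain set of file names, and the sketch reproduces the failure only (it does not show the fix that was committed).

```python
class Store:
    """Toy stand-in for a store object that caches its own list of hfile names."""

    def __init__(self, fs, files):
        self.fs = fs              # shared set of file names that exist "on disk"
        self.files = list(files)  # this object's private view of its hfiles

    def compact(self, output):
        """Merge this object's files into `output` and archive (remove) the inputs."""
        for name in self.files:
            self.fs.discard(name)
        self.fs.add(output)
        self.files = [output]


def split(store):
    """Return the hfile names the daughter regions would reference."""
    return list(store.files)


def open_daughter(fs, references):
    """Opening a daughter fails if any referenced hfile is no longer on disk."""
    missing = [name for name in references if name not in fs]
    if missing:
        raise FileNotFoundError(f"File does not exist: {missing}")


def run_timeline():
    fs = {"a", "b"}
    # t0: a compaction is requested against the current store object A(a+b).
    store_a = Store(fs, ["a", "b"])
    # t1: the first split is rolled back and the region reinitializes its
    # stores, so the region now uses a new object A'(a+b).
    store_a_prime = Store(fs, ["a", "b"])
    # t2: the queued compaction still runs on the stale object A and
    # archives a and b; A' is never told about it.
    store_a.compact("c")
    # t3: the second split builds references from A', which still lists a and b.
    refs = split(store_a_prime)
    # t4: opening the daughters fails because a and b are gone.
    error = None
    try:
        open_daughter(fs, refs)
    except FileNotFoundError as exc:
        error = str(exc)
    return refs, sorted(fs), error
```

Calling `run_timeline()` returns the references taken from the stale view, the files actually left on disk, and the error raised at t4. The point of the model is that the compaction and the split each act on a different store object for the same directory, so neither sees the other's changes.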