Return-Path: X-Original-To: apmail-hbase-dev-archive@www.apache.org Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8C76B100EF for ; Sat, 21 Mar 2015 07:08:39 +0000 (UTC) Received: (qmail 16222 invoked by uid 500); 21 Mar 2015 07:08:38 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 16116 invoked by uid 500); 21 Mar 2015 07:08:38 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 15775 invoked by uid 99); 21 Mar 2015 07:08:38 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 21 Mar 2015 07:08:38 +0000 Date: Sat, 21 Mar 2015 07:08:38 +0000 (UTC) From: "zhangduo (JIRA)" To: dev@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 zhangduo created HBASE-13308: -------------------------------- Summary: Fix flaky TestEndToEndSplitTransaction Key: HBASE-13308 URL: https://issues.apache.org/jira/browse/HBASE-13308 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' to 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. And then, we try to split 'abe1973ea732066b12d8e33fce12a951'. {noformat} 2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0 2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:541770000000002 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:541770000000002 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup 2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977 {noformat} We can see that it failed without any error message. I think can only happen when the parent is not splittable or we can not find a splitrow. {noformat} 2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute. 2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec 2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0 {noformat} We can see that, the compaction was completed at 03:58:47,019, but split is started at 03:58:46,970 which is earlier. So we have a reference file and is not splittable. I think the problem is 'compactAndBlockUntilDone' is not reliable, it may return before the compaction complete. Will try to prepare a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)