Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D71CB10F73 for ; Mon, 10 Feb 2014 20:01:50 +0000 (UTC) Received: (qmail 87766 invoked by uid 500); 10 Feb 2014 20:01:39 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 87694 invoked by uid 500); 10 Feb 2014 20:01:37 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 87553 invoked by uid 99); 10 Feb 2014 20:01:25 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Feb 2014 20:01:25 +0000 Date: Mon, 10 Feb 2014 20:01:25 +0000 (UTC) From: "Jerry He (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896937#comment-13896937 ] Jerry He commented on HBASE-10492: ---------------------------------- The problem is probably caused by this part of the code in SplitTransaction.openDaughters() {code} // Open daughters in parallel. DaughterOpener aOpener = new DaughterOpener(server, a); DaughterOpener bOpener = new DaughterOpener(server, b); aOpener.start(); bOpener.start(); try { aOpener.join(); bOpener.join(); } {code} We are opening the daughter regions in separate new threads. It is possible, although rare, due to issues like thread scheduling, the daughter regions may not be opened until after a long time. > open daughter regions can unpredictably take long time > ------------------------------------------------------ > > Key: HBASE-10492 > URL: https://issues.apache.org/jira/browse/HBASE-10492 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 0.96.0 > Reporter: Jerry He > > I have seen during a stress testing client was getting > "RetriesExhaustedWithDetailsException: Failed 748 actions: NotServingRegionException" > On the master log, 2014-02-08 20:43 is the timestamp from OFFLINE to SPLITTING_NEW, 2014-02-08 21:41 is the timestamp from SPLITTING_NEW to OPEN. > The corresponding time period on the region sever log is: > {code} > 2014-02-08 20:44:12,662 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 010c1981882d1a59201af5e2dc589d44 > 2014-02-08 20:44:12,666 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: c2eb9b7971ca7f3fed3da86df5b788e7 > {code} > There were no INFO related to these two regions until: (at the end see this: Split took 57mins, 16sec) > {code} > 2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355 > 2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354 > 2014-02-08 21:41:14,032 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. > 2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. with server=hdtest208.svl.ibm.com,60020,1391887547473 > 2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy task for tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. > 2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. > 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c. into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to execute. > 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction: Request = regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c., storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883, time=1391924373278861000; duration=1mins, 40sec > 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on cf in region tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. > 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. into tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp, totalSize=709.7 M > 2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. with server=hdtest208.svl.ibm.com,60020,1391887547473 > 2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy task for tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. > 2014-02-08 21:41:14,190 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d., new regions: tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44., tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.. Split took 57mins, 16sec > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)