Date: Thu, 28 Dec 2017 05:16:03 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-19639) ITBLL can't go big because RegionTooBusyException... Above memstore limit

    [ https://issues.apache.org/jira/browse/HBASE-19639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305069#comment-16305069 ]

stack commented on HBASE-19639:
-------------------------------

Again failed because a server took too long to recover. Looking at this change now:

tree 9fe0b4e8c5a6b2072230ece1be781900396aa1bb
parent 22b90c4a647d0ffeec7778042eedd0a49a664ed0
author Guanghao Zhang Tue Nov 28 21:08:19 2017 +0800
committer Michael Stack Wed Nov 29 10:33:20 2017 -0800

    HBASE-19359 Revisit the default config of hbase client retries number

Our MTTR should be faster, but we seem to be giving up at the 11th retry. Most of the time 11 retries is enough, but sometimes it is not.

> ITBLL can't go big because RegionTooBusyException...
Above memstore limit
> -------------------------------------------------------------------------
>
>                 Key: HBASE-19639
>                 URL: https://issues.apache.org/jira/browse/HBASE-19639
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>
> Running ITBLLs, the basic link generator keeps failing because I run into exceptions like below:
> {code}
> 2017-12-26 19:23:45,284 INFO [main] org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator: Persisting current.length=1000000, count=1000000, id=Job: job_1513025868268_0062 Task: attempt_1513025868268_0062_m_000006_2, current=\x8B\xDB25\xA7*\x9A\xF5\xDEx\x83\xDF\xDC?\x94\x92, i=1000000
> 2017-12-26 19:24:18,982 INFO [htable-pool3-t6] org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, table=IntegrationTestBigLinkedList, attempt=10/11 failed=524ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b., server=ve0538.halxg.cloudera.com,16020,1514343549993, memstoreSize=538084641, blockingMemStoreSize=536870912
>     at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, retrying after=10050ms, replay=524ops
> 2017-12-26 19:24:29,061 INFO [htable-pool3-t6] org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, table=IntegrationTestBigLinkedList, attempt=11/11 failed=524ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b., server=ve0538.halxg.cloudera.com,16020,1514343549993, memstoreSize=538084641, blockingMemStoreSize=536870912
>     at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, retrying after=10033ms, replay=524ops
> 2017-12-26 19:24:37,183 INFO [ReadOnlyZKClient] org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x015051a0 no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
> 2017-12-26 19:24:39,122 WARN [htable-pool3-t6] org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, table=IntegrationTestBigLinkedList, attempt=12/11 failed=524ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b., server=ve0538.halxg.cloudera.com,16020,1514343549993, memstoreSize=538084641, blockingMemStoreSize=536870912
>     at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> ...
> {code}
> Fails the task over and over. With server-killing monkeys.
> 24Gs, which should be more than enough.
> Had just finished a big compaction.
> What's shutting us out? Why is it taking so long to flush? We seem stuck at the limit, so the job fails.
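
For a rough sense of how long the client actually waits before giving up, and of how far the region is over its limit, here is a minimal stdlib-only sketch. It is not HBase code. The backoff multipliers are written from memory of HBase's HConstants.RETRY_BACKOFF table and the 100 ms base pause is an assumed hbase.client.pause value; the logged "retrying after=10050ms" and "retrying after=10033ms" are consistent with 100 ms x 100 plus a little jitter, but check both against your version. The 128 MiB flush size and multiplier of 4 are likewise assumed settings that happen to reproduce the logged blockingMemStoreSize; the cluster's real configuration is not shown in the report. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope arithmetic for the log excerpt above.
 * Stdlib only; none of these names are HBase APIs.
 */
final class RetryBudget {

    // ASSUMPTION: multipliers as recalled from HBase's HConstants.RETRY_BACKOFF.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Sum of the pauses following the first {@code attempts} failed attempts, ignoring jitter and RPC time. */
    static long totalPauseMillis(long basePauseMs, int attempts) {
        long total = 0;
        for (int i = 0; i < attempts; i++) {
            // Past the end of the table, keep using the last multiplier.
            int idx = Math.min(i, RETRY_BACKOFF.length - 1);
            total += basePauseMs * RETRY_BACKOFF[idx];
        }
        return total;
    }

    /** Blocking threshold as flush size times block multiplier. */
    static long blockingMemStoreSize(long flushSizeBytes, long blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    /** Simplified form of the condition reported as "Above memstore limit". */
    static boolean aboveMemstoreLimit(long memstoreSize, long blockingSize) {
        return memstoreSize > blockingSize;
    }

    public static void main(String[] args) {
        long basePauseMs = 100L;           // ASSUMPTION: hbase.client.pause
        int attempts = 11;                 // from "attempt=11/11" in the log
        long waited = totalPauseMillis(basePauseMs, attempts);
        System.out.println("total pause over " + attempts + " attempts: " + waited + " ms");

        long flushSize = 128L * 1024 * 1024;   // ASSUMPTION: memstore flush size
        long multiplier = 4L;                  // ASSUMPTION: memstore block multiplier
        long blocking = blockingMemStoreSize(flushSize, multiplier);
        long memstoreSize = 538084641L;        // from the log
        System.out.println("blocking size: " + blocking + " bytes");
        System.out.println("over limit: " + aboveMemstoreLimit(memstoreSize, blocking)
            + " by " + (memstoreSize - blocking) + " bytes");
    }
}
```

Under those assumptions the 11 attempts add up to about 48 seconds of waiting, so any flush or recovery that takes longer than that fails the task. The region sits only about 1.2 MB over a 512 MiB limit, yet memstoreSize is identical across all three logged attempts, which fits the "stuck at limit" observation.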