Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E3C73200D52 for ; Sat, 18 Nov 2017 00:11:04 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id E260E160C0A; Fri, 17 Nov 2017 23:11:04 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 30D83160BFB for ; Sat, 18 Nov 2017 00:11:04 +0100 (CET) Received: (qmail 72276 invoked by uid 500); 17 Nov 2017 23:11:03 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 72256 invoked by uid 99); 17 Nov 2017 23:11:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 17 Nov 2017 23:11:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 802D7180117 for ; Fri, 17 Nov 2017 23:11:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id DbXdX9_bYmZP for ; Fri, 17 Nov 2017 23:11:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id E38DD5FD14 for ; Fri, 17 Nov 2017 23:11:00 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 685E4E045B for ; Fri, 17 Nov 2017 23:11:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 20BBB23F05 for ; Fri, 17 Nov 2017 23:11:00 +0000 (UTC) Date: Fri, 17 Nov 2017 23:11:00 +0000 (UTC) From: "huaxiang sun (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Fri, 17 Nov 2017 23:11:05 -0000 [ https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huaxiang sun updated HBASE-19163: --------------------------------- Attachment: (was: HBASE-19163.master.005.patch) > "Maximum lock count exceeded" from region server's batch processing > ------------------------------------------------------------------- > > Key: HBASE-19163 > URL: https://issues.apache.org/jira/browse/HBASE-19163 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3 > Reporter: huaxiang sun > Assignee: huaxiang sun > Attachments: HBASE-19163-master-v001.patch, HBASE-19163.master.001.patch, HBASE-19163.master.002.patch, HBASE-19163.master.004.patch, unittest-case.diff > > > In one of use cases, we found the following exception and replication is stuck. > {code} > 2017-10-25 19:41:17,199 WARN [hconnection-0x28db294f-shared--pool4-t936] client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last exception: java.io.IOException: java.io.IOException: Maximum lock count exceeded > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.Error: Maximum lock count exceeded > at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528) > at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871) > at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163) > at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018) > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877) > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819) > at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753) > at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715) > at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148) > at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 3 more > {code} > While we are still examining the data pattern, it is sure that there are too many mutations in the batch against the same row, this exceeds the maximum 64k shared lock count and it throws an error and failed the whole batch. > There are two approaches to solve this issue. > 1). Let's say there are mutations against the same row in the batch, we just need to acquire the lock once for the same row vs to acquire the lock for each mutation. > 2). We catch the error and start to process whatever it gets and loop back. > With HBASE-17924, approach 1 seems easy to implement now. > Create the jira and will post update/patch when investigation moving forward. -- This message was sent by Atlassian JIRA (v6.4.14#64029)