Date: Tue, 29 Aug 2017 07:25:00 +0000 (UTC)
From: "Rajeshbabu Chintaguntla (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-4131) UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock

    [ https://issues.apache.org/jira/browse/PHOENIX-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144862#comment-16144862 ]

Rajeshbabu Chintaguntla commented on PHOENIX-4131:
--------------------------------------------------

[~samarthjain] are you working on it, or do you want me to take a look?

> UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4131
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>
> On my local test run I saw that the tests were not completing because the mini cluster couldn't shut down.
> So I took a jstack and discovered the following deadlock:
> {code}
> "RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 nid=0x37b3f runnable [0x00007000115f5000]
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.Object.wait(Native Method)
>     at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
>     - locked <0x000000072bc406b8> (a java.lang.Object)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
>     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
>     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
>     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:360)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>     at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
>     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> "RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor entry [0x00007000102bc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
>     - waiting to lock <0x000000072bc406b8> (a java.lang.Object)
>     at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
>     at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
>     - locked <0x000000072b625a90> (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}
> preClose() holds the object monitor and is waiting for scanReferencesCount to drop to 0. doPostScannerOpen() is trying to acquire the same monitor so that it can decrement scanReferencesCount to 0.
> I think this bug was introduced by PHOENIX-3111, which was meant to solve other deadlocks. FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].
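
For illustration only, here is a minimal sketch of the general "closer waits for in-flight scans to drain" pattern that the two stack traces revolve around. It is not the Phoenix code and not the PHOENIX-4131 fix; the class ScanDrainGuard and its methods tryBeginScan(), endScan() and awaitDrain() are hypothetical names made up for this example. The sketch relies on the fact that Object.wait() releases the monitor while the closer is parked, so the scan thread can enter the synchronized block and decrement the counter, and on a bounded timeout so that a close can never hang indefinitely.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, NOT part of Phoenix or HBase: a counter of in-flight
// scans that a closing thread can wait on without starving the scan threads.
final class ScanDrainGuard {
    private final Object lock = new Object();
    private int activeScans = 0;      // guarded by lock
    private boolean closing = false;  // guarded by lock

    // Called by a scan thread before it starts work.
    // Returns false once a close has begun, so no new scans are admitted.
    boolean tryBeginScan() {
        synchronized (lock) {
            if (closing) {
                return false;
            }
            activeScans++;
            return true;
        }
    }

    // Called by a scan thread when it finishes (ideally from a finally block).
    void endScan() {
        synchronized (lock) {
            activeScans--;
            if (activeScans == 0) {
                lock.notifyAll(); // wake the closer as soon as the last scan ends
            }
        }
    }

    // Called by the closing thread. Returns true if all scans drained within
    // the timeout, false on timeout or if the thread was interrupted.
    boolean awaitDrain(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            closing = true;
            while (activeScans > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // give up instead of blocking the close forever
                }
                try {
                    // timedWait() calls lock.wait(...), which releases the
                    // monitor while parked so endScan() can acquire it.
                    TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }
}
```

In this sketch a scan thread would call tryBeginScan() before doing work and endScan() in a finally block, while the closing thread calls awaitDrain() with a timeout it is willing to tolerate and decides for itself what to do if the call returns false.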