phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-4131) UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
Date Thu, 31 Aug 2017 05:58:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4131:
----------------------------------
    Attachment: PHOENIX-4131.patch

We need to make sure that we close all scanners; otherwise they can prevent the mini cluster
from shutting down. I have also slightly relaxed the check on the number of scans that are
involved in writing. With this patch, my local runs no longer hang. I also removed the JVM
halt shutdown hook, since it was sometimes causing otherwise successful builds to fail. It
would be interesting to see the test run results.

> UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4131
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-4131.patch
>
>
> On my local test run, I saw that the tests were not completing because the mini cluster couldn't shut down. So I took a jstack and discovered the following deadlock:
> {code}
> "RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 nid=0x37b3f runnable [0x00007000115f5000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Object.wait(Native Method)
> 	at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
> 	- locked <0x000000072bc406b8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> 	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> "RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor entry [0x00007000102bc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
> 	- waiting to lock <0x000000072bc406b8> (a java.lang.Object)
> 	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
> 	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
> 	- locked <0x000000072b625a90> (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}
> preClose() holds the object monitor and is waiting for scanReferencesCount to drop to 0. doPostScannerOpen() is trying to acquire the same lock so that it can decrement scanReferencesCount to 0.
> I think this bug was introduced by PHOENIX-3111, which was meant to solve other deadlocks. FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].
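
For illustration only, here is a minimal sketch of the kind of interaction described above: a closing thread that waits for a scan reference count to reach zero, and scan threads that need the same lock to decrement it. This is not the Phoenix code and not the attached patch. The class ScanGate and its methods tryBeginScan(), endScan() and awaitClose() are hypothetical names. The sketch assumes the closing thread waits with Object.wait semantics (which releases the monitor while waiting) and gives up after a timeout, so that a scanner which is never closed delays shutdown instead of blocking it forever.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical reference-counting gate; a sketch, not the Phoenix implementation. */
final class ScanGate {
    private final Object lock = new Object();
    private int activeScans = 0;      // guarded by lock
    private boolean closing = false;  // guarded by lock

    /** Registers a scan. Returns false once a close has started. */
    boolean tryBeginScan() {
        synchronized (lock) {
            if (closing) {
                return false;
            }
            activeScans++;
            return true;
        }
    }

    /** Call from a finally block so every registered scan is always released. */
    void endScan() {
        synchronized (lock) {
            activeScans--;
            lock.notifyAll(); // wake a thread blocked in awaitClose()
        }
    }

    /** Blocks new scans, then waits up to timeoutMillis for active scans to drain. */
    boolean awaitClose(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            closing = true;
            while (activeScans > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // a scan was leaked or is still running
                }
                // The timed wait releases the monitor, so endScan() can acquire it.
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
            return true;
        }
    }
}
```

A scan would call tryBeginScan() before doing any work and endScan() in a finally block. A close hook would call awaitClose() with a bounded timeout and decide what to do if it returns false.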



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
