X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C7D4010D4D for ; Tue, 24 Feb 2015 05:21:12 +0000 (UTC)
Received: (qmail 83960 invoked by uid 500); 24 Feb 2015 05:21:12 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 83920 invoked by uid 500); 24 Feb 2015 05:21:12 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 83909 invoked by uid 99); 24 Feb 2015 05:21:12 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Feb 2015 05:21:12 +0000
Date: Tue, 24 Feb 2015 05:21:12 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-13083) Master can be dead-locked while assigning META.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/HBASE-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334416#comment-14334416 ]

stack edited comment on HBASE-13083 at 2/24/15 5:21 AM:
--------------------------------------------------------

This is like HBASE-12958. It took care of one deadlock. This seems like a more likely deadlock than the one seen over there. Thanks for the patch [~octo47]

was (Author: stack):
This is like HBASE-12985. It took care of one deadlock. This seems like a more likely deadlock than the one seen over there. Thanks for the patch [~octo47]

> Master can be dead-locked while assigning META.
> -----------------------------------------------
>
> Key: HBASE-13083
> URL: https://issues.apache.org/jira/browse/HBASE-13083
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 2.0.0, 1.1.0
> Reporter: Andrey Stepachev
> Assignee: Andrey Stepachev
> Fix For: 2.0.0, 1.0.1, 1.1.0
>
> Attachments: HBASE-13083-branch-1.patch, HBASE-13083.patch
>
>
> We hit a situation where the master is deadlocked.
> It seems we have a deadlock in the master code. SSH (ServerShutdownHandler) calls RegionStates#serverOffline, which in turn
> acquires synchronized(this), effectively blocking all requests to RegionStates.
> Another thread processes assignMeta, which tries to access region states and blocks.
> Finally, all assignment operations try to access meta for table states and region operations, but
> cannot, because the RegionStates instance is locked.
> serverOffline() waiting for meta availability:
> {code}
> Thread 17019: (state = BLOCKED)
> - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
> - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
> - java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode, boolean, long) @bci=158, line=458 (Compiled frame)
> /serverOffline
> - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
> - org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher, int, long) @bci=74, line=605 (Interpreted frame)
> - org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher, long) @bci=4, line=580 (Interpreted frame)
> - org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher, long, org.apache.hadoop.conf.Configuration) @bci=65, line=559 (Interpreted frame)
> - 
org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation() @bci=69, line=58 (Interpreted frame)
> - org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(org.apache.hadoop.hbase.TableName, boolean, int) @bci=83, line=1131 (Compiled frame)
> - org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(org.apache.hadoop.hbase.TableName, byte[], boolean, boolean, int) @bci=74, line=1098 (Compiled frame)
> - org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.findAllLocationsOrFail(org.apache.hadoop.hbase.client.Action, boolean) @bci=73, line=940 (Compiled frame)
> - org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(java.util.List, int) @bci=48, line=857 (Compiled frame)
> - org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$100(org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl, java.util.List, int) @bci=3, line=575 (Compiled frame)
> - org.apache.hadoop.hbase.client.AsyncProcess.submitAll(java.util.concurrent.ExecutorService, org.apache.hadoop.hbase.TableName, java.util.List, org.apache.hadoop.hbase.client.coprocessor.Batch$Callback, java.lang.Object[]) @bci=195, line=557 (Compiled frame)
> - org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.processBatchCallback(java.util.List, org.apache.hadoop.hbase.TableName, java.util.concurrent.ExecutorService, java.lang.Object[], org.apache.hadoop.hbase.client.coprocessor.Batch$Callback) @bci=11, line=2136 (Compiled frame)
> - org.apache.hadoop.hbase.util.MultiHConnection.processBatchCallback(java.util.List, org.apache.hadoop.hbase.TableName, java.lang.Object[], org.apache.hadoop.hbase.client.coprocessor.Batch$Callback) @bci=24, line=125 (Compiled frame)
> - org.apache.hadoop.hbase.master.RegionStateStore.updateRegionState(long, org.apache.hadoop.hbase.master.RegionState, org.apache.hadoop.hbase.master.RegionState) @bci=421, line=244 
(Compiled frame)
> - org.apache.hadoop.hbase.master.RegionStates.updateRegionState(org.apache.hadoop.hbase.HRegionInfo, org.apache.hadoop.hbase.master.RegionState$State, org.apache.hadoop.hbase.ServerName, long) @bci=149, line=1109 (Compiled frame)
> - org.apache.hadoop.hbase.master.RegionStates.updateRegionState(org.apache.hadoop.hbase.HRegionInfo, org.apache.hadoop.hbase.master.RegionState$State, org.apache.hadoop.hbase.ServerName) @bci=7, line=425 (Compiled frame)
> - org.apache.hadoop.hbase.master.RegionStates.updateRegionState(org.apache.hadoop.hbase.HRegionInfo, org.apache.hadoop.hbase.master.RegionState$State) @bci=24, line=383 (Compiled frame)
> - org.apache.hadoop.hbase.master.RegionStates.regionOffline(org.apache.hadoop.hbase.HRegionInfo, org.apache.hadoop.hbase.master.RegionState$State) @bci=83, line=586 (Interpreted frame)
> - org.apache.hadoop.hbase.master.RegionStates.regionOffline(org.apache.hadoop.hbase.HRegionInfo) @bci=3, line=566 (Interpreted frame)
> - org.apache.hadoop.hbase.master.RegionStates.serverOffline(org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher, org.apache.hadoop.hbase.ServerName) @bci=494, line=667 (Interpreted frame)
> - org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(org.apache.hadoop.hbase.ServerName) @bci=101, line=3334 (Interpreted frame)
> - org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process() @bci=626, line=237 (Interpreted frame)
> - org.apache.hadoop.hbase.executor.EventHandler.run() @bci=33, line=128 (Interpreted frame)
> - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line
> {code}
> The thread blocked while assigning meta looks like:
> {code}
> Thread 18357: (state = BLOCKED)
> - org.apache.hadoop.hbase.master.RegionStates.getRegionState(java.lang.String) @bci=0, line=1053 (Compiled frame)
> - 
org.apache.hadoop.hbase.master.RegionStates.getRegionState(org.apache.hadoop.hbase.HRegionInfo) @bci=5, line=1036 (Compiled frame)
> - org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(org.apache.hadoop.hbase.HRegionInfo, boolean) @bci=5, line=1915 (Interpreted frame)
> - org.apache.hadoop.hbase.master.AssignmentManager.assign(org.apache.hadoop.hbase.HRegionInfo, boolean, boolean) @bci=29, line=1564 (Interpreted frame)
> - org.apache.hadoop.hbase.master.AssignmentManager.assign(org.apache.hadoop.hbase.HRegionInfo, boolean) @bci=4, line=1550 (Interpreted frame)
> - org.apache.hadoop.hbase.master.AssignmentManager.assignMeta(org.apache.hadoop.hbase.HRegionInfo) @bci=23, line=2636 (Interpreted frame)
> - org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta() @bci=64, line=159 (Interpreted frame)
> - org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries() @bci=39, line=184 (Interpreted frame)
> - org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process() @bci=276, line=93 (Interpreted frame)
> - org.apache.hadoop.hbase.executor.EventHandler.run() @bci=33, line=128 (Interpreted frame)
> - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
> - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
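
The lock interaction described in the issue (one thread holding the RegionStates monitor in serverOffline while it waits for meta, a second thread needing that same monitor before it can assign meta) can be reproduced in isolation. The following is only a minimal sketch and is not HBase code: the class RegionStatesDeadlockSketch, its methods, and both latches are hypothetical stand-ins, and the wait uses a timeout so the demo terminates, whereas the trace above shows the real wait retrying in blockUntilAvailable. The "releasing" variant shows one common remedy, doing the blocking wait outside the monitor; it is not necessarily what the attached patches do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for RegionStates; none of this is HBase code.
class RegionStatesDeadlockSketch {
    // Opens once "meta" has been assigned.
    private final CountDownLatch metaAvailable = new CountDownLatch(1);
    // Opens once the server-shutdown path has entered its critical section.
    private final CountDownLatch shutdownStarted = new CountDownLatch(1);

    // Problem pattern: wait for meta while still holding the monitor.
    synchronized boolean serverOfflineHoldingLock(long timeoutMs) throws InterruptedException {
        shutdownStarted.countDown();
        return metaAvailable.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Possible remedy: touch in-memory state under the monitor, wait outside it.
    boolean serverOfflineReleasingLock(long timeoutMs) throws InterruptedException {
        synchronized (this) {
            shutdownStarted.countDown();
        }
        return metaAvailable.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Analogue of assignMeta reaching getRegionState: needs the same monitor.
    synchronized void assignMeta() {
        metaAvailable.countDown();
    }

    // Returns true if the shutdown path saw meta become available in time.
    static boolean run(boolean releaseLockBeforeWaiting, long timeoutMs) {
        RegionStatesDeadlockSketch states = new RegionStatesDeadlockSketch();
        Thread assigner = new Thread(() -> {
            try {
                states.shutdownStarted.await();
                states.assignMeta(); // blocks while the monitor is held elsewhere
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        assigner.start();
        try {
            boolean sawMeta = releaseLockBeforeWaiting
                    ? states.serverOfflineReleasingLock(timeoutMs)
                    : states.serverOfflineHoldingLock(timeoutMs);
            assigner.join();
            return sawMeta;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("holding lock:   " + run(false, 200));
        System.out.println("releasing lock: " + run(true, 5000));
    }
}
```

In the holding variant the assigner cannot enter assignMeta until the waiter gives up, so the wait can only end by timing out. Without a timeout that is the hang reported above.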