Date: Mon, 30 May 2016 04:10:13 +0000 (UTC)
From: "Heng Chen (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-15900) RS stuck in get lock of HStore

    [ https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306187#comment-15306187 ]

Heng Chen edited comment on HBASE-15900 at 5/30/16 4:09 AM:
------------------------------------------------------------

[~stack] I have found something: HStore.lock.readLock was held by the thread below.
So compaction could not acquire lock.writeLock in HStore.replaceStoreFiles. As a result all compactions were blocked, and the memstore could not be flushed because too many store files exist.

Now I am trying to figure out why the scan was blocked in IdLock.getLockEntry; it has happened many times. Maybe it is HBASE-14178.

There is another point I can't understand: the readLock was held by only one thread, so why were so many threads waiting for the readLock?

BTW, scan operations in my cluster are issued only by Phoenix; I am not sure whether that is related to the problem.

{code}
Thread 43 (B.defaultRpcServer.handler=4,queue=4,port=16020):
  State: WAITING
  Blocked count: 224987
  Waited count: 253413
  Waiting on org.apache.hadoop.hbase.util.IdLock$Entry@48148720
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:502)
    org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:397)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2003)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5294)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2486)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2472)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2454)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2253)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
{code}


was (Author: chenheng):
[~stack] I have found something: HStore.lock.readLock was held by the thread below.

So compaction could not acquire lock.writeLock in HStore.replaceStoreFiles. As a result all compactions were blocked, and the memstore could not be flushed because too many store files exist.

Now I am trying to figure out why the scan was blocked in IdLock.getLockEntry; it has happened many times. Maybe it is HBASE-14178.

BTW, scan operations in my cluster are issued only by Phoenix; I am not sure whether that is related to the problem.

{code}
Thread 43 (B.defaultRpcServer.handler=4,queue=4,port=16020):
  State: WAITING
  Blocked count: 224987
  Waited count: 253413
  Waiting on org.apache.hadoop.hbase.util.IdLock$Entry@48148720
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:502)
    org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:397)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2003)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5294)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2486)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2472)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2454)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2253)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
{code}


> RS stuck in get lock of HStore
> ------------------------------
>
>                 Key: HBASE-15900
>                 URL: https://issues.apache.org/jira/browse/HBASE-15900
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.3.0
>            Reporter: Heng Chen
>         Attachments: 9fe15a52_9fe15a52_save, c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when I run an MR job. I saved dump.txt from the RS web UI.
> Many threads stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
>   State: WAITING
>   Blocked count: 477816
>   Waited count: 535255
>   Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
>     org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
>     org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
>     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
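
On the question in the comment of why many threads wait for the readLock while only one thread holds it, here is one possible explanation, not confirmed for this issue. With a plain JDK ReentrantReadWriteLock in its default nonfair mode (the NonfairSync seen in the dump), a newly arriving reader queues up when a writer is already waiting at the head of the queue, even though the lock is currently held only for reading. The sketch below reproduces that with three threads. The class name ReadLockQueueDemo, the latches, and the 200 ms timeout are hypothetical and chosen for illustration; no HBase code is involved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadLockQueueDemo {

    /**
     * Returns true if a newly arriving reader could NOT obtain the read lock
     * while one reader held it and one writer was queued behind that reader.
     */
    static boolean newReaderBlocksBehindWaitingWriter() throws InterruptedException {
        // Default constructor gives the nonfair policy.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stands in for the stuck scan thread: takes the read lock and stalls.
        Thread stuckReader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        stuckReader.start();
        readHeld.await();

        // Stands in for compaction: blocks waiting for the write lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThread(writer)) {
            Thread.sleep(10);
        }
        Thread.sleep(50); // small margin so the writer is fully enqueued

        // Stands in for a new RPC handler calling HStore.add: a timed attempt
        // on the read lock. The untimed tryLock() would barge past the queue,
        // so the timed variant is used to follow the normal acquire path.
        boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.readLock().unlock();
        }

        // Let everything finish.
        release.countDown();
        stuckReader.join();
        writer.join();
        return !acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("new reader blocked behind queued writer: "
                + newReaderBlocksBehindWaitingWriter());
    }
}
```

If the same mechanism applies here, a single scan thread holding the readLock plus one compaction waiting for the writeLock would be enough to park every later handler that needs the readLock.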