Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 6F90B1860A for ; Sun, 2 Aug 2015 11:34:05 +0000 (UTC) Received: (qmail 8021 invoked by uid 500); 2 Aug 2015 11:34:05 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 7963 invoked by uid 500); 2 Aug 2015 11:34:05 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 7950 invoked by uid 99); 2 Aug 2015 11:34:05 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 02 Aug 2015 11:34:05 +0000 Date: Sun, 2 Aug 2015 11:34:05 +0000 (UTC) From: "Heng Chen (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650710#comment-14650710 ] Heng Chen commented on HBASE-14178: ----------------------------------- After check the code, i think the RS was blocked due to If some clients get some large row simultaneously, if the table's block cache is closed, all get request will try to acquire the offsetLock and only one lock successfully, the others will wait before lock is released. All the handlers will be exhausted, and RS will be blocked! > regionserver blocks because of waiting for offsetLock > ----------------------------------------------------- > > Key: HBASE-14178 > URL: https://issues.apache.org/jira/browse/HBASE-14178 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 0.98.6 > Reporter: Heng Chen > Priority: Critical > Fix For: 0.98.6 > > Attachments: 1.patch, jstack > > > My regionserver blocks, and all client rpc timeout. > I print the regionserver's jstack, it seems a lot of thread was blocked for waiting offsetLock, detail infomation belows: > PS: my table's block cache is off > {code} > "B.DefaultRpcServer.handler=2,queue=2,port=60020" #82 daemon prio=5 os_prio=0 tid=0x0000000001827000 nid=0x2cdc in Object.wait() [0x00007f3831b72000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79) > - locked <0x0000000773af7c18> (a org.apache.hadoop.hbase.util.IdLock$Entry) > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352) > at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253) > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524) > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572) > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257) > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173) > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55) > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313) > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269) > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695) > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683) > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533) > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140) > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889) > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969) > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847) > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820) > - locked <0x00000005e5c55ad0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl) > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753) > at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916) > at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) > at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) > at java.lang.Thread.run(Thread.java:745) > Locked ownable synchronizers: > - <0x00000005e5c55c08> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)