hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Heng Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15900) RS stuck in get lock of HStore
Date Mon, 30 May 2016 03:10:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306187#comment-15306187
] 

Heng Chen commented on HBASE-15900:
-----------------------------------

[~stack]
I have found something
HStore.lock.readLock was hold by this thread below.  So compaction could not acquire the lock.writeLock
in HStore.replaceStoreFiles,  so all compaction was blocked and memstore could not be flushed
because of so many store files exist.  

Now, i am trying to figure out why scan was blocked in IdLock.getLockEntry,  it happened many
times.

BTW. scan operation in my cluster is only called by phoenix, not sure it has relates with
the problem.

{code}
Thread 43 (B.defaultRpcServer.handler=4,queue=4,port=16020):
  State: WAITING
  Blocked count: 224987
  Waited count: 253413
  Waiting on org.apache.hadoop.hbase.util.IdLock$Entry@48148720
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:502)
    org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:397)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2003)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5294)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2486)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2472)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2454)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2253)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
{code}

> RS stuck in get lock of HStore
> ------------------------------
>
>                 Key: HBASE-15900
>                 URL: https://issues.apache.org/jira/browse/HBASE-15900
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.3.0
>            Reporter: Heng Chen
>         Attachments: c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when i run MR job.  I save the dump.txt from this
RS webUI.
> Many threads stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
>    32   State: WAITING
>    31   Blocked count: 477816
>    30   Waited count: 535255
>    29   Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
>    28   Stack:
>    27     sun.misc.Unsafe.park(Native Method)
>    26     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>    25     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>    24     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>    23     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>    22     java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>    21     org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
>    20     org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
>    19     org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
>    18     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
>    17     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
>    16     org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>    15     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>    14     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
>    13     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>    12     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>    11     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>    10     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     9     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     8     java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message