hbase-issues mailing list archives

From "Heng Chen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15900) RS stuck in get lock of HStore
Date Mon, 30 May 2016 04:10:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306187#comment-15306187 ]

Heng Chen edited comment on HBASE-15900 at 5/30/16 4:09 AM:
------------------------------------------------------------

[~stack]
I have found something.
HStore.lock.readLock was held by the thread below, so compaction could not acquire lock.writeLock
in HStore.replaceStoreFiles. As a result, all compactions were blocked, and the memstore could not be flushed
because too many store files existed.
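
As context for the blocking chain described above, here is a minimal sketch of the read/write-lock pattern (simplified, not the actual HStore code): scans and memstore writes take the shared read lock, while swapping compacted files into the store needs the exclusive write lock, so a single reader that never returns is enough to stall the writer.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch of the locking pattern described in this comment
// (not the real HBase code): readers such as getScanner()/add() take
// the shared read lock, while replaceStoreFiles() needs the exclusive
// write lock, so one stuck reader blocks it indefinitely.
class StoreSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void getScanner() {
    lock.readLock().lock();        // shared: many scans / memstore adds at once
    try {
      // seek store file scanners; if this blocks (e.g. inside IdLock),
      // the read lock stays held the whole time
    } finally {
      lock.readLock().unlock();
    }
  }

  void replaceStoreFiles() {
    lock.writeLock().lock();       // exclusive: must wait for every reader
    try {
      // swap compacted files into the store file list
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}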

Now I am trying to figure out why the scan was blocked in IdLock.getLockEntry; it happened many
times. Maybe it was HBASE-14178.
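
For readers unfamiliar with IdLock: as I understand it, HFileReaderV2.readBlock takes a per-id lock keyed by the block offset, so only one thread loads a given block at a time and every other reader of that same block waits in getLockEntry, which is where the handler below is parked. A conceptual sketch of such a per-id lock using plain java.util.concurrent (not the actual HBase implementation):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Conceptual per-id lock sketch (not the HBase IdLock source): the id
// stands for an HFile block offset; only one thread may own the entry
// for a given id, everyone else waits until it is released.
class PerIdLockSketch {
  private final ConcurrentMap<Long, Object> entries = new ConcurrentHashMap<>();

  Object acquire(long id) throws InterruptedException {
    while (true) {
      Object candidate = new Object();
      Object existing = entries.putIfAbsent(id, candidate);
      if (existing == null) {
        return candidate;              // we now own the entry for this id
      }
      synchronized (existing) {
        if (entries.get(id) == existing) {
          existing.wait();             // another thread is loading this block
        }
      }
    }
  }

  void release(long id, Object entry) {
    entries.remove(id, entry);
    synchronized (entry) {
      entry.notifyAll();               // wake the threads waiting on this block
    }
  }
}
{code}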

And there is another point I can't understand: the readLock was held by only one thread, so why
were there so many threads waiting for the readLock?
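
One possible explanation (my guess, not something I verified against this dump): with java.util.concurrent.locks.ReentrantReadWriteLock, once a writer is queued, later read-lock requests park behind it even though only a read lock is currently held, so one stuck reader plus one waiting compaction writer is enough to pile up many blocked readers. A small demo of that behavior:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Demo of readers queuing behind a waiting writer (my guess at the
// mechanism, not taken from the dump).
public class ReaderQueueDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    lock.readLock().lock();            // thread 1: the stuck scan, holds the read lock

    Thread writer = new Thread(() -> {
      lock.writeLock().lock();         // thread 2: "compaction", waits for the reader
      lock.writeLock().unlock();
    });
    writer.start();
    Thread.sleep(100);                 // let the writer enqueue first

    Thread reader = new Thread(() -> {
      lock.readLock().lock();          // thread 3: new handler; parks behind the queued writer
      lock.readLock().unlock();
    });
    reader.start();
    Thread.sleep(100);

    System.out.println("queued threads: " + lock.getQueueLength());  // expected to print 2

    lock.readLock().unlock();          // release the first reader; everyone proceeds
    writer.join();
    reader.join();
  }
}
{code}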

BTW, scan operations in my cluster are issued only by Phoenix; I am not sure whether that is related to
the problem.

{code}
Thread 43 (B.defaultRpcServer.handler=4,queue=4,port=16020):
  State: WAITING
  Blocked count: 224987
  Waited count: 253413
  Waiting on org.apache.hadoop.hbase.util.IdLock$Entry@48148720
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:502)
    org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:397)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2003)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5294)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2486)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2472)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2454)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2253)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
{code}


> RS stuck in get lock of HStore
> ------------------------------
>
>                 Key: HBASE-15900
>                 URL: https://issues.apache.org/jira/browse/HBASE-15900
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.3.0
>            Reporter: Heng Chen
>         Attachments: 9fe15a52_9fe15a52_save, c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when I run an MR job.  I saved dump.txt from this RS's web UI.
> Many threads are stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
>   State: WAITING
>   Blocked count: 477816
>   Waited count: 535255
>   Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
>     org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
>     org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
>     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
