hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "huaxiang sun (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing
Date Tue, 05 Dec 2017 21:36:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279204#comment-16279204

huaxiang sun edited comment on HBASE-19163 at 12/5/17 9:35 PM:

There are so many testing failure due to the default minBatchSize set to 10k, increasing to
20k to work around all unittest failures, until HBASE-19308 is fixed. Have to say this number
is arbitrary.

was (Author: huaxiang):
There are so many testing failure due to the default minBatchSize to 10k, increasing to 20k
to work around all unittest failures, until HBASE-19308 is fixed. Have to say this number
is arbitrary.

> "Maximum lock count exceeded" from region server's batch processing
> -------------------------------------------------------------------
>                 Key: HBASE-19163
>                 URL: https://issues.apache.org/jira/browse/HBASE-19163
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-19163-master-v001.patch, HBASE-19163.master.001.patch, HBASE-19163.master.002.patch,
HBASE-19163.master.004.patch, HBASE-19163.master.005.patch, HBASE-19163.master.006.patch,
HBASE-19163.master.007.patch, HBASE-19163.master.008.patch, HBASE-19163.master.009.patch,
HBASE-19163.master.009.patch, HBASE-19163.master.010.patch, unittest-case.diff
> In one of use cases, we found the following exception and replication is stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN  [hconnection-0x28db294f-shared--pool4-t936] client.AsyncProcess:
#3, table=foo, attempt=5/5 failed=262836ops, last exception: java.io.IOException: java.io.IOException:
Maximum lock count exceeded
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
>         at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
>         ... 3 more
> {code}
> While we are still examining the data pattern, it is sure that there are too many mutations
in the batch against the same row, this exceeds the maximum 64k shared lock count and it throws
an error and failed the whole batch.
> There are two approaches to solve this issue.
> 1). Let's say there are mutations against the same row in the batch, we just need to
acquire the lock once for the same row vs to acquire the lock for each mutation.
> 2). We catch the error and start to process whatever it gets and loop back.
> With HBASE-17924, approach 1 seems easy to implement now. 
> Create the jira and will post update/patch when investigation moving forward.

This message was sent by Atlassian JIRA

View raw message