hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14689) Addendum and unit test for HBASE-13471
Date Wed, 18 Nov 2015 04:34:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010201#comment-15010201 ]

Enis Soztutar commented on HBASE-14689:
---------------------------------------

From the hbase-dev list: this patch seems to cause row lock timeouts, which were found during
1.0.3 and 1.1.3 RC testing. I was not able to reproduce the row lock timeouts and client hangs
using the 0.98.16 RC.

On a single-node setup with an SSD, running:
{code}
bin/hbase pe  --latency --nomapred --presplit=10  randomWrite 10
{code}
reproduces the problem for me easily. 

This is the stack trace reported by the handlers, which then get blocked indefinitely:
{code}
2015-11-17 19:38:04,267 WARN  [B.defaultRpcServer.handler=4,queue=1,port=61707] regionserver.HRegion: Failed getting lock in batch put, row=00000000000000000000085521
java.io.IOException: Timed out waiting for lock for row: 00000000000000000000085521
	at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3995)
	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2661)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2519)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2473)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2477)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:654)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:618)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1864)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2049)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:111)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
	at java.lang.Thread.run(Thread.java:745)
{code}
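The `IOException` above is what a handler sees when it gives up waiting for a row lock that some other holder never releases. The Java sketch below models that behavior under stated assumptions: `RowLockTable` and `lockRow` are hypothetical names, a `ReentrantLock` per row stands in for HBase's internal row lock bookkeeping, and the wait bound is a plain constructor argument rather than any particular HBase setting.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of per-row locking; not HBase's actual row lock implementation.
class RowLockTable {
    // One lock per row key; entries are never evicted in this sketch.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Assumed upper bound on how long a caller waits before giving up.
    private final long waitMillis;

    RowLockTable(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    /** Acquires the lock for the row, or throws IOException if it is not free within waitMillis. */
    ReentrantLock lockRow(String row) throws IOException {
        ReentrantLock lock = locks.computeIfAbsent(row, r -> new ReentrantLock());
        try {
            if (!lock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                // Another holder (possibly a leaked one) still owns the row.
                throw new IOException("Timed out waiting for lock for row: " + row);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IOException("Interrupted waiting for lock for row: " + row, e);
        }
        return lock; // caller must unlock() when done
    }
}
```

Callers are expected to release the returned lock in a `finally` block; a code path that skips the release is exactly the kind of leak that leaves every later writer to that row timing out. A real implementation would also need to remove unused entries from the map.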

I've reverted the patch in all branches to be on the safe side until we understand the issue
better. Sorry for the trouble. 

> Addendum and unit test for HBASE-13471
> --------------------------------------
>
>                 Key: HBASE-14689
>                 URL: https://issues.apache.org/jira/browse/HBASE-14689
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
>         Attachments: hbase-14689_v1-branch-1.1.patch, hbase-14689_v1-branch-1.1.patch, hbase-14689_v1.patch
>
>
> One of our customers ran into HBASE-13471, which resulted in all the handlers getting
> blocked and various other issues. While backporting the fix, I noticed that there is one
> more case where we might go into an infinite loop: if a row lock cannot be acquired (for
> example, due to a previously leaked lock, which we have seen in Phoenix before), this will
> cause a similar infinite loop.
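The failure mode described in the issue, where a batch never advances because the lock for the next row cannot be acquired, can be illustrated with a small Java model. This is a hypothetical sketch, not HBase's `doMiniBatchMutation`: `MiniBatchModel`, `applyAll`, and the `tryLock` predicate are illustrative names, and failing fast when a pass makes no progress is just one way to break the loop, not necessarily what the patch does.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of a mini-batch write loop; not HBase code.
class MiniBatchModel {
    /**
     * Applies rows in mini-batches of at most maxBatch. tryLock stands in for a
     * non-blocking row lock attempt. Returns the number of rows applied.
     */
    static int applyAll(List<String> rows, Predicate<String> tryLock, int maxBatch) {
        int next = 0;    // index of the first row not yet applied
        int applied = 0;
        while (next < rows.size()) {
            int end = next;
            // Grow the mini-batch until the size limit or the first row whose lock is unavailable.
            while (end < rows.size() && end - next < maxBatch && tryLock.test(rows.get(end))) {
                end++;
            }
            if (end == next) {
                // No lock acquired in this pass. Without this check, 'next' never
                // advances and the outer loop spins forever on the same row.
                throw new IllegalStateException("Could not acquire lock for row: " + rows.get(next));
            }
            applied += end - next; // "apply" the rows in [next, end)
            next = end;
        }
        return applied;
    }
}
```

With a predicate that always refuses one row, the model fails on that row instead of spinning, which mirrors the shift from a silent handler hang to the explicit "Timed out waiting for lock" error seen in the stack trace.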



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
