hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14689) Addendum and unit test for HBASE-13471
Date Wed, 18 Nov 2015 07:47:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010417#comment-15010417 ]

Enis Soztutar commented on HBASE-14689:

The issue is that two handlers can deadlock in doMiniBatchMutation() while trying to acquire row locks,
since we do not seem to be acquiring the locks in sorted order (unlike mutateRowsWithLocks()).
Without this patch, doMiniBatchMutation() has tryLock()-style semantics, where the batch
is performed with as many locks as could be acquired.
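
To illustrate why sorted acquisition matters, here is a minimal sketch, not the
HBase code or the patch itself. The class SortedRowLocks and its methods are
hypothetical names, and rows are plain strings rather than byte arrays. If every
thread takes its locks in ascending row order, two threads cannot each hold a
lock the other is waiting for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not HRegion's row lock implementation.
public class SortedRowLocks {
    // One lock per row key, created lazily.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Blocks until every distinct row is locked, acquiring in ascending row order. */
    public List<ReentrantLock> lockAll(Iterable<String> rows) {
        // TreeSet both sorts the rows and drops duplicates.
        TreeSet<String> sorted = new TreeSet<>();
        for (String row : rows) {
            sorted.add(row);
        }
        List<ReentrantLock> held = new ArrayList<>();
        for (String row : sorted) {
            ReentrantLock lock = locks.computeIfAbsent(row, k -> new ReentrantLock());
            lock.lock();
            held.add(lock);
        }
        return held;
    }

    /** Releases the locks in reverse acquisition order. */
    public void unlockAll(List<ReentrantLock> held) {
        for (int i = held.size() - 1; i >= 0; i--) {
            held.get(i).unlock();
        }
    }
}
```

With this ordering, a handler that needs rows 178728 and 179326 always locks
178728 first, whatever order the rows appear in its batch.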

The logging below demonstrates the deadlock. Notice that handler=24 and handler=20 are waiting
for each other.
2015-11-17 23:37:17,209 INFO  [B.defaultRpcServer.handler=24,queue=0,port=52309] regionserver.HRegion:
existingContext=RowLockContext, row=00000000000000000000178728, latch=1 ,thread=Thread[B.defaultRpcServer.handler=20,queue=2,port=52309,5,main],
2015-11-17 23:37:17,209 INFO  [B.defaultRpcServer.handler=20,queue=2,port=52309] regionserver.HRegion:
existingContext=RowLockContext, row=00000000000000000000179326, latch=1 ,thread=Thread[B.defaultRpcServer.handler=24,queue=0,port=52309,5,main],

> Addendum and unit test for HBASE-13471
> --------------------------------------
>                 Key: HBASE-14689
>                 URL: https://issues.apache.org/jira/browse/HBASE-14689
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>         Attachments: hbase-14689_v1-branch-1.1.patch, hbase-14689_v1-branch-1.1.patch,
> One of our customers ran into HBASE-13471, which resulted in all the handlers getting
blocked and various other issues. While backporting the issue, I noticed that there is one
more case where we might go into an infinite loop. If a row lock cannot be acquired (for
example due to a previous lock leak, which we have seen in Phoenix before), this will cause a
similar infinite loop.
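
As a rough sketch of one way to avoid spinning forever on a lock that is never
released, the wait can be bounded and turned into a failure. This is an
assumption for illustration and not necessarily what the attached patch does;
BoundedRowLock and acquireOrFail are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not the HBase row lock code.
public class BoundedRowLock {
    /**
     * Waits at most timeoutMs for the lock. Throws instead of retrying
     * indefinitely, so a leaked lock surfaces as an error to the caller.
     */
    public static void acquireOrFail(ReentrantLock lock, long timeoutMs) throws IOException {
        try {
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out after " + timeoutMs + " ms waiting for row lock");
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt status for callers further up the stack.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for row lock", e);
        }
    }
}
```

Failing the operation releases the handler thread, at the cost of the client
having to retry.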
