hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction
Date Tue, 07 Sep 2010 20:48:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906963#action_12906963 ]

HBase Review Board commented on HBASE-2964:

Message from: stack@duboce.net

(Updated 2010-09-07 13:38:39.968517)

Review request for hbase and stack.


This version completely removes the setting of this.parent.lock from SplitTransaction.
It's not needed: down in the parent close, the write lock is taken out anyway.

In the past, we had a split lock and a close lock (splitLock and splitsAndClosesLock).  The
split lock was held across the whole split: while daughter regions were calculated, during the
close, the actual split, and the update of .META.  As part of lock pruning, an error made in
HBASE-2641 was to use splitsAndClosesLock where splitLock was used previously (and even to
expand the scope of what splitLock used to cover).

Looking at it, splitLock could have served some purpose in preventing two threads from contending
over a split (splits make objects in the filesystem and move stuff around), but we don't really
need it in current HBase since only CompactSplitThread runs splits, even in the new master
regime where a client can call splitRegion. Later, when we want to run multiple concurrent
split transactions, we'll need to reexamine this.


Moves all RPCs outside of the region writeLock - the writeLock is now only used long enough
to set the 'closing' flag. When we drop the lock any waiters will see 'closing' upon acquiring
the lock, and thus throw NSRE.

In the case that we abort the split, it will reopen the region as before. Accessors will have
gotten NSRE but will just come back to the same region eventually.
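
For readers who want the locking change above in code form, here is a minimal sketch of the idea, not the actual patch. RegionSketch, NotServingRegionSketchException, markClosing, reopen and the metaUpdate callback are all hypothetical names made up for illustration; the real code lives in HRegion and SplitTransaction. It only shows the write lock being held just long enough to flip 'closing', the slow step (standing in for the META RPC) running with no lock held, and the region being reopened on abort.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for HBase's NotServingRegionException (NSRE).
class NotServingRegionSketchException extends RuntimeException {
    NotServingRegionSketchException(String msg) { super(msg); }
}

// Hypothetical, heavily simplified region; NOT the real HRegion.
class RegionSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closing = false; // guarded by lock

    // Accessors take the read lock, then check the flag once they hold it.
    String get(String row) {
        lock.readLock().lock();
        try {
            if (closing) {
                throw new NotServingRegionSketchException("region is closing");
            }
            return "value-for-" + row;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is held only long enough to set the flag.
    private void setClosing(boolean value) {
        lock.writeLock().lock();
        try {
            closing = value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    void markClosing() { setClosing(true); }

    void reopen() { setClosing(false); }

    // metaUpdate stands in for the RPC that edits META; it runs with no lock held.
    boolean split(Runnable metaUpdate) {
        markClosing();
        try {
            metaUpdate.run();
            return true;
        } catch (RuntimeException e) {
            reopen(); // aborted split: serve the same region again
            return false;
        }
    }
}
```

With this shape, a handler that was waiting on the lock wakes up as soon as the flag is set, sees 'closing' and fails fast instead of staying parked for the whole split.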

This addresses bug HBASE-2964.

Diffs (updated)

  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java a692125 
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java a245d97 

Diff: http://review.cloudera.org/r/798/diff


YCSB testing on my cluster - it used to deadlock due to this bug within an hour. I ran a 5
hour load test overnight and it worked OK.



> Deadlock when RS tries to RPC to itself inside SplitTransaction
> ---------------------------------------------------------------
>                 Key: HBASE-2964
>                 URL: https://issues.apache.org/jira/browse/HBASE-2964
>             Project: HBase
>          Issue Type: Bug
>          Components: ipc, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>         Attachments: hbase-2964.txt
> In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation:
> - All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread.
> - CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be on the same server as is executing the split.
> Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens. Thus, the entire RS gets deadlocked.
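
The situation in the report can be reproduced in miniature. The following is only an illustrative sketch under assumed names: SelfRpcDeadlockSketch and selfRpcCompletes are hypothetical, a two-thread pool stands in for the IPC handler threads, a plain ReentrantLock stands in for the region lock, and an empty task stands in for the META edit. A timeout is used so the sketch reports the stall instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical miniature of the reported deadlock; not HBase code.
class SelfRpcDeadlockSketch {
    // Returns true if the "RPC to self" is served within timeoutMs.
    static boolean selfRpcCompletes(boolean holdLockDuringRpc, long timeoutMs) {
        int handlerCount = 2;
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        ReentrantLock regionLock = new ReentrantLock();
        regionLock.lock(); // the "split thread" takes the region lock
        try {
            // Client requests occupy every handler; each blocks on the region lock.
            CountDownLatch handlersBusy = new CountDownLatch(handlerCount);
            for (int i = 0; i < handlerCount; i++) {
                handlers.submit(() -> {
                    handlersBusy.countDown();
                    regionLock.lock();
                    regionLock.unlock();
                });
            }
            handlersBusy.await();
            if (!holdLockDuringRpc) {
                regionLock.unlock(); // the fix: drop the lock before calling out
            }
            // The "META edit" needs a free handler on the same server.
            Future<?> metaEdit = handlers.submit(() -> { });
            metaEdit.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no handler was free: this is the deadlock
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            if (regionLock.isHeldByCurrentThread()) {
                regionLock.unlock();
            }
            handlers.shutdownNow();
        }
    }
}
```

Calling selfRpcCompletes(true, 300) models the reported behaviour (the call to self is never served while the lock is held), while selfRpcCompletes(false, 2000) models making the call only after the lock is released.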

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.