hbase-dev mailing list archives

From st...@duboce.net
Subject Re: Review Request: Fix RPC deadlock when splitting regions on same RS as meta under heavy load
Date Wed, 08 Sep 2010 16:46:40 GMT


> On 2010-09-07 18:33:16, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java, line 207
> > <http://review.cloudera.org/r/798/diff/2/?file=11132#file11132line207>
> >
> >     maybe now we can do an:
> >     
> >     assert !this.parent.lock.writeLock().isHeldByCurrentThread() : "Unsafe to hold write lock while performing RPCs";

I'll add this assert.
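
For reference, a minimal standalone sketch of the kind of check being suggested. It is not the HBase code: RpcLockGuard, safeToRpc and beforeRpc are hypothetical names, and only ReentrantReadWriteLock and its isHeldByCurrentThread() call come from the JDK.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper, not part of HBase.
class RpcLockGuard {
    /** Returns true when the calling thread does NOT hold the write lock. */
    static boolean safeToRpc(ReentrantReadWriteLock lock) {
        return !lock.writeLock().isHeldByCurrentThread();
    }

    /** Call just before issuing an RPC; only checked when the JVM runs with -ea. */
    static void beforeRpc(ReentrantReadWriteLock lock) {
        assert safeToRpc(lock) : "Unsafe to hold write lock while performing RPCs";
    }
}
```

Since Java assertions are disabled by default, a check like this costs nothing in production and only fires in test runs started with -ea.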


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/798/#review1122
-----------------------------------------------------------


On 2010-09-07 13:38:39, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/798/
> -----------------------------------------------------------
> 
> (Updated 2010-09-07 13:38:39)
> 
> 
> Review request for hbase and stack.
> 
> 
> Summary
> -------
> 
> Moves all RPCs outside of the region writeLock - the writeLock is now only used long enough to set the 'closing' flag. When we drop the lock any waiters will see 'closing' upon acquiring the lock, and thus throw NSRE.
> 
> In the case that we abort the split, it will reopen the region as before. Accessors will have gotten NSRE but will just come back to the same region eventually.
> 
> 
> This addresses bug HBASE-2964.
>     http://issues.apache.org/jira/browse/HBASE-2964
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java a692125 
>   src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java a245d97
>
> Diff: http://review.cloudera.org/r/798/diff
> 
> 
> Testing
> -------
> 
> YCSB testing on my cluster - it used to deadlock due to this bug within an hour. I ran a 5 hour load test overnight and it worked OK.
> 
> 
> Thanks,
> 
> Todd
> 
>
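
The locking scheme described in the quoted summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: SketchRegion, NotServingException, markClosing, reopen and read are hypothetical stand-ins for the real HRegion and SplitTransaction code, which is not shown in this message.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; not the HBase HRegion class.
class SketchRegion {
    /** Hypothetical stand-in for the NSRE mentioned in the summary. */
    static class NotServingException extends RuntimeException {
        NotServingException(String msg) { super(msg); }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closing = false; // guarded by lock

    /** Hold the write lock only long enough to set the flag; no RPCs in here. */
    void markClosing() {
        lock.writeLock().lock();
        try {
            closing = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Abort path: clear the flag so the region serves requests again. */
    void reopen() {
        lock.writeLock().lock();
        try {
            closing = false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Accessors take the read lock, then check the flag before doing any work. */
    String read(String key) {
        lock.readLock().lock();
        try {
            if (closing) {
                throw new NotServingException("region is closing");
            }
            return "value-for-" + key; // placeholder for a real lookup
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A caller would invoke markClosing(), perform its RPCs with no lock held, and call reopen() if the split is aborted. Accessors that were waiting on the lock see the flag as soon as they acquire it and fail fast instead of blocking behind the RPCs.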

