hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dave Latham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
Date Thu, 04 Jul 2013 14:55:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700110#comment-13700110
] 

Dave Latham commented on HBASE-8806:
------------------------------------

{quote}The changes in pom.xml are not needed for this fix, right ?{quote}
That's right, sorry about that.  Just helps get m2eclipse to compile everything correctly.

{quote}We need change all the CP hooks signatures? CPs will(should) be able to acquire locks
and release?{quote}
The CP changes required would be in RegionObserver:
 - {{preBatchMutate(...,MiniBatchOperationInProgress<Pair<Mutation, Integer>>
miniBatchOp)}} becomes
 - {{preBatchMutate(...,MiniBatchOperationInProgress<Mutation> miniBatchOp)}}
 - {{postBatchMutate(...,MiniBatchOperationInProgress<Pair<Mutation, Integer>>
miniBatchOp)}} becomes
 - {{postBatchMutate(...,MiniBatchOperationInProgress<Mutation> miniBatchOp)}}
I don't have much experience with coprocessor usage, but it seems unlikely to me someone would
require manipulating the set of row locks in the middle of a miniBatch mutation.  Would it
be prudent to poll the mailing list first?

Re: patch v5
Looks good to me.  Anoop's notion of favoring the case where the lock is new versus the lock
is already owned is an interesting one.  Two possibilities of code with two cases:
Code A:  patch v5 - first check rowsAlreadyLocked then getLock
 - Case 1 (lock is new): hash comparison in rowsAlreadyLocked likely returns !contains quickly.
 getLock likely also quickly finds no hash match in concurrent hashmap and inserts.
 - Case 2 (lock is already owned): hashCode match then bytewise equals comparison in rowsAlreadyLocked.contains.
 getLock not called

Code B: first try getLock, then if unable to getLock check rowsAlreadyLocked
 - Case 1 (lock is new): getLock quickly finds no hash match in concurrent hashmap and inserts.
 - Case 2 (lock is already owned): getLock checks concurrent hashmap finds hashCode match
then does bytewise equals comparison.  then rowsAlreadyLocked also gets hashCode match then
bytewise equals comparison

So Code A has an extra hash check for Case 1 and Code B has an extra hash check and bytewise
equal compare for Case 2.  I'd favor current patch (Code A) as it's a smaller and constant
time addition and I think the code is a bit more intuitive, but like the property in Code
B that it has no impact on Case 1 compared to the existing 0.94 releases.  I'd be content
if Rahul's tests on the latest patch show imapct is small.

{quote}Is it ok if we sort it out on the client side itself? {quote}
Not for 0.94 as it would be changing the contract on existing clients which may suddenly fail
to respect row locks if they aren't updated.  Though it would be interesting to ask clients
to sort for performance gains and then re-sort with a stable sort on the server.  But if people
are happy with the other approaches discussed, they sound good to me.

{quote}Should we commit something like the 0.94 patch to trunk as well, for now; and then
regroup and add re-entrant locks in a different jira?{quote}
+1 so long as people are comfortable with performance.  Let's give Rahul a chance to run his
perf test on this patch if no one's in a hurry.

I will try to update my reentrant locks patch for trunk.  After posting it up I was considering
introducing that lock groups idea too and using it for checkAndPut.  May play with the code
there unless someone beats me to it.

Thanks to everyone for the attention on this issue!
                
> Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-8806
>                 URL: https://issues.apache.org/jira/browse/HBASE-8806
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.5
>            Reporter: rahul gidwani
>            Priority: Critical
>             Fix For: 0.95.2, 0.94.10
>
>         Attachments: 8806-0.94-v4.txt, 8806-0.94-v5.txt, HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch,
HBASE-8806-0.94.10-v3.patch, HBASE-8806.patch, HBASE-8806-threadBasedRowLocks.patch
>
>
> If we already have the lock in the doMiniBatchMutation we don't need to re-acquire it.
The solution would be to keep a cache of the rowKeys already locked for a miniBatchMutation
and If we already have the 
> rowKey in the cache, we don't repeatedly try and acquire the lock.  A fix to this problem
would be to keep a set of rows we already locked and not try to acquire the lock for these
rows.  
> We have tested this fix in our production environment and has improved replication performance
quite a bit.  We saw a replication batch go from 3+ minutes to less than 10 seconds for batches
with duplicate row keys.
> {code}
> static final int ACQUIRE_LOCK_COUNT = 0;
>   @Test
>   public void testRedundantRowKeys() throws Exception {
>     final int batchSize = 100000;
>     
>     String tableName = getClass().getSimpleName();
>     Configuration conf = HBaseConfiguration.create();
>     conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
>     MockHRegion region = (MockHRegion) TestHRegion.initHRegion(Bytes.toBytes(tableName),
tableName, conf, Bytes.toBytes("a"));
>     List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
>     int i = 0;
>     while (i < batchSize) {
>       if (i % 2 == 0) {
>         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)), null));
>       } else {
>         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)), null));
>       }
>       i++;
>     }
>     long startTime = System.currentTimeMillis();
>     region.batchMutate(someBatch.toArray(new Pair[0]));
>     long endTime = System.currentTimeMillis();
>     long duration = endTime - startTime;
>     System.out.println("duration: " + duration + " ms");
>     assertEquals(2, ACQUIRE_LOCK_COUNT);
>   }
>   @Override
>   public Integer getLock(Integer lockid, byte[] row, boolean waitForLock) throws IOException
{
>     ACQUIRE_LOCK_COUNT++;
>     return super.getLock(lockid, row, waitForLock);
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message