Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 4 Jul 2013 14:55:49 +0000 (UTC)
From: "Dave Latham (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12654793.1372189765357.900.1372949749902@arcas>
In-Reply-To: <JIRA.12654793.1372189765357@arcas>
References: <JIRA.12654793.1372189765357@arcas>
Subject: [jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly
 in HRegion.doMiniBatchMutation for duplicate rows.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700110#comment-13700110 ] 

Dave Latham commented on HBASE-8806:
------------------------------------

{quote}The changes in pom.xml are not needed for this fix, right ?{quote}
That's right, sorry about that.  Just helps get m2eclipse to compile everything correctly.

{quote}We need change all the CP hooks signatures? CPs will(should) be able to acquire locks and release?{quote}
The CP changes required would be in RegionObserver:
 - {{preBatchMutate(...,MiniBatchOperationInProgress<Pair<Mutation, Integer>> miniBatchOp)}} becomes
 - {{preBatchMutate(...,MiniBatchOperationInProgress<Mutation> miniBatchOp)}}
 - {{postBatchMutate(...,MiniBatchOperationInProgress<Pair<Mutation, Integer>> miniBatchOp)}} becomes
 - {{postBatchMutate(...,MiniBatchOperationInProgress<Mutation> miniBatchOp)}}
I don't have much experience with coprocessor usage, but it seems unlikely to me someone would require manipulating the set of row locks in the middle of a miniBatch mutation.  Would it be prudent to poll the mailing list first?

Re: patch v5
Looks good to me.  Anoop's notion of favoring the case where the lock is new versus the lock is already owned is an interesting one.  Two possibilities of code with two cases:
Code A:  patch v5 - first check rowsAlreadyLocked then getLock
 - Case 1 (lock is new): hash comparison in rowsAlreadyLocked likely returns !contains quickly.  getLock likely also quickly finds no hash match in concurrent hashmap and inserts.
 - Case 2 (lock is already owned): hashCode match then bytewise equals comparison in rowsAlreadyLocked.contains.  getLock not called

Code B: first try getLock, then if unable to getLock check rowsAlreadyLocked
 - Case 1 (lock is new): getLock quickly finds no hash match in concurrent hashmap and inserts.
 - Case 2 (lock is already owned): getLock checks concurrent hashmap finds hashCode match then does bytewise equals comparison.  then rowsAlreadyLocked also gets hashCode match then bytewise equals comparison

So Code A has an extra hash check for Case 1 and Code B has an extra hash check and bytewise equal compare for Case 2.  I'd favor current patch (Code A) as it's a smaller and constant time addition and I think the code is a bit more intuitive, but like the property in Code B that it has no impact on Case 1 compared to the existing 0.94 releases.  I'd be content if Rahul's tests on the latest patch show imapct is small.

{quote}Is it ok if we sort it out on the client side itself? {quote}
Not for 0.94 as it would be changing the contract on existing clients which may suddenly fail to respect row locks if they aren't updated.  Though it would be interesting to ask clients to sort for performance gains and then re-sort with a stable sort on the server.  But if people are happy with the other approaches discussed, they sound good to me.

{quote}Should we commit something like the 0.94 patch to trunk as well, for now; and then regroup and add re-entrant locks in a different jira?{quote}
+1 so long as people are comfortable with performance.  Let's give Rahul a chance to run his perf test on this patch if no one's in a hurry.

I will try to update my reentrant locks patch for trunk.  After posting it up I was considering introducing that lock groups idea too and using it for checkAndPut.  May play with the code there unless someone beats me to it.

Thanks to everyone for the attention on this issue!
                
> Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-8806
>                 URL: https://issues.apache.org/jira/browse/HBASE-8806
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.5
>            Reporter: rahul gidwani
>            Priority: Critical
>             Fix For: 0.95.2, 0.94.10
>
>         Attachments: 8806-0.94-v4.txt, 8806-0.94-v5.txt, HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch, HBASE-8806-0.94.10-v3.patch, HBASE-8806.patch, HBASE-8806-threadBasedRowLocks.patch
>
>
> If we already have the lock in the doMiniBatchMutation we don't need to re-acquire it. The solution would be to keep a cache of the rowKeys already locked for a miniBatchMutation and If we already have the 
> rowKey in the cache, we don't repeatedly try and acquire the lock.  A fix to this problem would be to keep a set of rows we already locked and not try to acquire the lock for these rows.  
> We have tested this fix in our production environment and has improved replication performance quite a bit.  We saw a replication batch go from 3+ minutes to less than 10 seconds for batches with duplicate row keys.
> {code}
> static final int ACQUIRE_LOCK_COUNT = 0;
>   @Test
>   public void testRedundantRowKeys() throws Exception {
>     final int batchSize = 100000;
>     
>     String tableName = getClass().getSimpleName();
>     Configuration conf = HBaseConfiguration.create();
>     conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
>     MockHRegion region = (MockHRegion) TestHRegion.initHRegion(Bytes.toBytes(tableName), tableName, conf, Bytes.toBytes("a"));
>     List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
>     int i = 0;
>     while (i < batchSize) {
>       if (i % 2 == 0) {
>         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)), null));
>       } else {
>         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)), null));
>       }
>       i++;
>     }
>     long startTime = System.currentTimeMillis();
>     region.batchMutate(someBatch.toArray(new Pair[0]));
>     long endTime = System.currentTimeMillis();
>     long duration = endTime - startTime;
>     System.out.println("duration: " + duration + " ms");
>     assertEquals(2, ACQUIRE_LOCK_COUNT);
>   }
>   @Override
>   public Integer getLock(Integer lockid, byte[] row, boolean waitForLock) throws IOException {
>     ACQUIRE_LOCK_COUNT++;
>     return super.getLock(lockid, row, waitForLock);
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira