Date: Thu, 11 Sep 2014 01:50:33 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11945) Client writes may be reordered under contention

    [ https://issues.apache.org/jira/browse/HBASE-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129502#comment-14129502 ]

Todd Lipcon commented on HBASE-11945:
-------------------------------------

The potential interleaving is:

Client 1: issues a batch with 2000 puts:
  Put "row1", "cf:col1", {0...1000}
  Put "row2", "cf:col1", {0...1000}
Client 2: issues a batch with 1 put:
  Put "row2", "cf:col2", "x" (i.e. same row, different column)

These two clients will contend for the same row lock. The "minibatch" code path iterates through the batch trying to acquire locks, skipping any operation for a later pass if its lock is not available.
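The skip-and-defer locking pass described above can be sketched as follows. This is a hypothetical, self-contained illustration of the pattern, not the actual HRegion code; the class, method, and row names are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the minibatch locking pattern described above:
// try-acquire each op's row lock, and defer any op whose lock is
// unavailable to a later pass.
public class MiniBatchSketch {

    // Returns the ops (identified by row here) that could not be locked
    // and must wait for a later pass.
    static List<String> lockOnePass(List<String> opRows,
                                    Map<String, ReentrantLock> rowLocks,
                                    List<String> acquired) {
        List<String> deferred = new ArrayList<>();
        for (String row : opRows) {
            if (rowLocks.get(row).tryLock()) {
                acquired.add(row);     // this op joins the current minibatch
            } else {
                deferred.add(row);     // lock held by another batch: skip for now
            }
        }
        return deferred;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, ReentrantLock> rowLocks = new HashMap<>();
        rowLocks.put("row1", new ReentrantLock());
        rowLocks.put("row2", new ReentrantLock());

        // Another client (C2) holds the row2 lock for the whole pass.
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread c2 = new Thread(() -> {
            rowLocks.get("row2").lock();
            held.countDown();
            try { release.await(); } catch (InterruptedException ignored) {}
            rowLocks.get("row2").unlock();
        });
        c2.start();
        held.await();

        List<String> acquired = new ArrayList<>();
        List<String> deferred =
            lockOnePass(List.of("row1", "row2", "row2"), rowLocks, acquired);
        System.out.println("acquired=" + acquired + " deferred=" + deferred);
        // acquired=[row1] deferred=[row2, row2]

        // (In the real path these locks would be held until the minibatch is
        // applied; release them here for tidiness.)
        for (String row : acquired) rowLocks.get(row).unlock();
        release.countDown();
        c2.join();
    }
}
```

The key point is only the defer branch: an op whose lock is busy is not blocked on, it is pushed to a later pass, which is what opens the reordering window.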
So, I think these may interleave as follows:

C1: acquires the lock for row1, and is in the process of iterating over the rest of the "row1" operations
C2: acquires the lock for "row2", and is in the process of actually applying its operation to the MemStore, etc.
C1: fails to acquire the lock for the first row2 op, since C2 already holds it. But there are still 999 more row2 ops to iterate over.
C2: commits its "row2" operation, releasing the lock
C1: manages to acquire the lock for a later row2 op (e.g. the put of "row2", "cf:col1", 500)
C1: commits the minibatch

Now it is easy to see that C1 has committed its put of "500" before other puts which came earlier from the client. This re-ordering is unexpected from C1's point of view: when it later reads the row, something other than the "latest" data might persist (e.g. the 1000th put it issued might actually have been executed first instead of last). The problem is worse with a delete/insert sequence, where you have a 50% chance of ending up with a deleted row at the end.

I haven't tried to reproduce this bug, but I think you could build a functional test as follows:

T1: writes batches with 1000 puts (arbitrary contents) to "row1" and 1000 puts to "row2" (increasing integers)
T2: writes non-batched writes to a different column of row2
T3: reads "row2" in a loop and verifies that the integer column is never seen to decrease

1000 might not be a large enough batch to reliably reproduce it, but I bet you could get this to fail eventually.
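The interleaving above can be replayed deterministically. The sketch below is a hypothetical single-threaded model (not HBase code): C1's 1000 row2 ops start out deferred because C2 holds the row lock, C2 releases it while C1 is still iterating at index 500, so op 500 commits in the first minibatch ahead of ops 0..499:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the C1/C2 interleaving described above.
public class ReorderReplay {

    static List<Integer> replay(int batchSize, int releasePoint) {
        List<Integer> commitOrder = new ArrayList<>();
        boolean[] committed = new boolean[batchSize];
        boolean row2LockFree = false;          // C2 holds the row2 lock initially
        int remaining = batchSize;
        int pass = 0;
        while (remaining > 0) {
            for (int i = 0; i < batchSize; i++) {
                if (committed[i]) continue;
                if (pass == 0 && i == releasePoint) {
                    row2LockFree = true;       // C2 commits its put and unlocks here
                }
                if (!row2LockFree) continue;   // tryLock fails: defer to a later pass
                commitOrder.add(i);            // stand-in for applying to the MemStore
                committed[i] = true;
                remaining--;
            }
            pass++;
        }
        return commitOrder;
    }

    public static void main(String[] args) {
        List<Integer> order = replay(1000, 500);
        // Op 500 is the first to commit; ops 0..499 land in a later minibatch.
        System.out.println("first committed op index: " + order.get(0));
    }
}
```

Running this shows a commit order of 500..999 followed by 0..499, i.e. exactly the out-of-order persistence the comment describes.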
> Client writes may be reordered under contention
> -----------------------------------------------
>
>                 Key: HBASE-11945
>                 URL: https://issues.apache.org/jira/browse/HBASE-11945
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>            Reporter: Todd Lipcon
>
> I haven't seen this bug in practice, but I was thinking about this a bit and think there may be a correctness issue with the way that we handle client batches which contain multiple operations which touch the same row. The client expects that these operations will be performed in the same order they were submitted, but under contention I believe they can get arbitrarily reordered, leading to incorrect results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)