hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11945) Client writes may be reordered under contention
Date Thu, 11 Sep 2014 01:50:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129502#comment-14129502 ]

Todd Lipcon commented on HBASE-11945:

The potential interleaving is:

Client 1: issues a batch with 2000 puts: Put "row1", "cf:col1", {0...1000}, Put "row2", "cf:col1", {0...1000}
Client 2: issues a batch with 1 put: Put "row2", "cf:col2", "x"
(i.e. same row, different column)

These two clients will contend for the same row lock. The "minibatch" code path iterates through
the batch trying to acquire locks, skipping any operation whose lock is unavailable and retrying
it in a later pass. So, I think these may interleave as follows:

C1: acquires the lock for "row1", and is in the process of iterating over the rest of the "row1" ops in the batch
C2: acquires lock for "row2", and is in the process of actually applying the operation to
MemStore, etc
C1: fails to acquire the lock for the first row2 op, since C2 already holds it. But there
are still 999 more row2 ops to iterate over
C2: commits its "row2" operation, releasing the lock
C1: manages to acquire the lock for a later row2 op (e.g. the put of "row2", "cf:col1", 500)
C1: commits the minibatch

Now it is easy to see that C1 has committed its put of "500" before other puts which came
earlier from the client.
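The skip-and-retry pass that drives this interleaving can be sketched in plain Java. This is a simplified illustration, not the actual HRegion code: one non-reentrant Semaphore stands in for each row lock, and MiniBatchSketch/applyMiniBatch are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the minibatch pattern described above: try each
// row lock without blocking, apply the ops whose locks we got, and leave
// the rest for a later pass.
class MiniBatchSketch {
    // One non-reentrant "row lock" per row key.
    static final Map<String, Semaphore> rowLocks = new HashMap<>();

    // Applies what it can of a batch of row keys; returns the deferred ops.
    static List<String> applyMiniBatch(List<String> batch) {
        List<String> deferred = new ArrayList<>();
        Set<String> held = new HashSet<>();
        for (String row : batch) {
            Semaphore lock = rowLocks.computeIfAbsent(row, r -> new Semaphore(1));
            if (held.contains(row) || lock.tryAcquire()) {
                held.add(row);
                // ... apply this op to the MemStore here ...
            } else {
                deferred.add(row);   // lock busy: skipped for a later pass
            }
        }
        for (String row : held) {
            rowLocks.get(row).release();
        }
        return deferred;
    }
}
```

If the "row2" lock frees up partway through the loop, a later "row2" op is applied in this pass while the earlier "row2" ops sit in the deferred list, which is exactly the reordering in the interleaving above.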

This re-ordering is unexpected from C1's point of view, since when it later reads the row,
something other than the "latest" data might persist (e.g. the 1000th put it issued might actually
have been applied first instead of last). The problem is worse with a delete/insert sequence,
where you have a 50% chance of ending up with a deleted row at the end.
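To make the reordered outcome concrete, here is a deterministic toy replay of the interleaving above (entirely made up for illustration: 5 puts instead of 1000, and the "row2" lock becomes free at op index 2, standing in for C2 releasing it mid-scan). The last value applied is not the last value the client submitted.

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic toy replay of the skip-and-retry interleaving: ops whose
// lock is busy are deferred to a later pass, so they apply after ops that
// were submitted later.
class ReorderReplay {
    static List<Integer> appliedOrder(int numOps, int lockFreeFrom) {
        List<Integer> deferred = new ArrayList<>();
        List<Integer> applied = new ArrayList<>();
        boolean holding = false;
        for (int op = 0; op < numOps; op++) {       // first pass over the batch
            if (holding || op >= lockFreeFrom) {    // lock available from here on
                holding = true;
                applied.add(op);
            } else {
                deferred.add(op);                   // skipped: lock still held by C2
            }
        }
        applied.addAll(deferred);                   // later pass applies the rest
        return applied;
    }
}
```

With 5 ops and the lock freeing at index 2, the applied order is 2, 3, 4, 0, 1: a read that takes the most recently applied cell sees value 1, not the client's final value 4.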

I haven't tried to reproduce this bug, but I think you could build a functional test as follows:

T1: writes batches with 1000 puts (arbitrary contents) to "row1" and 1000 puts to "row2" (increasing integer values)
T2: writes non-batched writes to a different column of "row2"
T3: reads "row2" in a loop and verifies that the integer column is never seen to decrease.

Batches of 1000 might not be large enough to reliably reproduce it, but I bet you could get this
to fail eventually.
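A self-contained approximation of that test can be written against a toy region instead of a real cluster. Everything here is made up for illustration (ReorderHarness, the batch shapes, the per-op lock release, which differs from the real minibatch path where locks are held across the pass), and a given run may or may not actually hit a reorder; the harness only counts what it observed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy harness in the shape of T1/T2/T3 above, pure JDK: T1 writes batches
// of increasing integers to "row2" through a skip-and-retry loop, T2
// contends on the "row2" lock, and afterwards we count how often the
// applied order of T1's values decreased.
class ReorderHarness {
    static final Map<String, Semaphore> rowLocks = new ConcurrentHashMap<>();
    static final List<Integer> appliedRow2 = Collections.synchronizedList(new ArrayList<>());

    static Semaphore lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new Semaphore(1));
    }

    // Apply a batch of (row, value) ops, skipping busy locks and retrying
    // the leftovers in later passes.
    static void applyBatch(List<String> rows, List<Integer> values) {
        boolean[] done = new boolean[rows.size()];
        int remaining = rows.size();
        while (remaining > 0) {
            for (int i = 0; i < rows.size(); i++) {
                if (done[i]) continue;
                Semaphore lock = lockFor(rows.get(i));
                if (lock.tryAcquire()) {                // skip this op if the lock is busy
                    if (rows.get(i).equals("row2")) appliedRow2.add(values.get(i));
                    lock.release();
                    done[i] = true;
                    remaining--;
                }
            }
        }
    }

    static int run(int batches, int batchSize) {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread t2 = new Thread(() -> {                  // T2: single-op contention on "row2"
            while (!stop.get()) {
                Semaphore lock = lockFor("row2");
                if (lock.tryAcquire()) lock.release();
            }
        });
        t2.start();
        int next = 0;
        for (int b = 0; b < batches; b++) {             // T1: batched increasing values
            List<String> rows = new ArrayList<>();
            List<Integer> values = new ArrayList<>();
            for (int i = 0; i < batchSize; i++) {
                rows.add("row1"); values.add(0);        // filler op, as in the batch above
                rows.add("row2"); values.add(next++);
            }
            applyBatch(rows, values);
        }
        stop.set(true);
        try { t2.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        int decreases = 0;                              // T3's check, done after the fact
        for (int i = 1; i < appliedRow2.size(); i++) {
            if (appliedRow2.get(i) < appliedRow2.get(i - 1)) decreases++;
        }
        return decreases;
    }
}
```

Any nonzero return from run() is an observed instance of the reordering; against a real region you would make the same monotonicity check from a reader thread while the writers run.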

> Client writes may be reordered under contention
> -----------------------------------------------
>                 Key: HBASE-11945
>                 URL: https://issues.apache.org/jira/browse/HBASE-11945
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>            Reporter: Todd Lipcon
> I haven't seen this bug in practice, but I was thinking about this a bit and think there
> may be a correctness issue with the way that we handle client batches which contain multiple
> operations which touch the same row. The client expects that these operations will be performed
> in the same order they were submitted, but under contention I believe they can get arbitrarily
> reordered, leading to incorrect results.

This message was sent by Atlassian JIRA
