Date: Thu, 11 Sep 2014 01:50:33 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11945) Client writes may be reordered under contention

    [ https://issues.apache.org/jira/browse/HBASE-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129502#comment-14129502 ]

Todd Lipcon commented on HBASE-11945:
-------------------------------------

The potential interleaving is:

Client 1: issues a batch with 2000 puts:
  Put "row1", "cf:col1", {0...1000}
  Put "row2", "cf:col1", {0...1000}
Client 2: issues a batch with 1 put:
  Put "row2", "cf:col2", "x" (i.e. same row, different column)

These two clients will contend for the same row lock. The "minibatch" code path iterates through the batch trying to acquire locks, skipping any operation for a later pass if its lock is not available.
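The skip-and-defer locking pass described above can be sketched as follows. This is a hypothetical, self-contained illustration of the pattern, not the actual HRegion code; the class, method, and row names are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the minibatch locking pattern described above:
// try-acquire each op's row lock, and defer any op whose lock is
// unavailable to a later pass.
public class MiniBatchSketch {

    // Returns the ops (identified by row here) that could not be locked
    // and must wait for a later pass.
    static List<String> lockOnePass(List<String> opRows,
                                    Map<String, ReentrantLock> rowLocks,
                                    List<String> acquired) {
        List<String> deferred = new ArrayList<>();
        for (String row : opRows) {
            if (rowLocks.get(row).tryLock()) {
                acquired.add(row);     // this op joins the current minibatch
            } else {
                deferred.add(row);     // lock held by another batch: skip for now
            }
        }
        return deferred;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, ReentrantLock> rowLocks = new HashMap<>();
        rowLocks.put("row1", new ReentrantLock());
        rowLocks.put("row2", new ReentrantLock());

        // Another client (C2) holds the row2 lock for the whole pass.
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread c2 = new Thread(() -> {
            rowLocks.get("row2").lock();
            held.countDown();
            try { release.await(); } catch (InterruptedException ignored) {}
            rowLocks.get("row2").unlock();
        });
        c2.start();
        held.await();

        List<String> acquired = new ArrayList<>();
        List<String> deferred =
            lockOnePass(List.of("row1", "row2", "row2"), rowLocks, acquired);
        System.out.println("acquired=" + acquired + " deferred=" + deferred);
        // acquired=[row1] deferred=[row2, row2]

        // (In the real path these locks would be held until the minibatch is
        // applied; release them here for tidiness.)
        for (String row : acquired) rowLocks.get(row).unlock();
        release.countDown();
        c2.join();
    }
}
```

The key point is only the defer branch: an op whose lock is busy is not blocked on, it is pushed to a later pass, which is what opens the reordering window.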
So, I think these may interleave as follows:

C1: acquires the lock for row1, and is in the process of iterating over the rest of the "row1" operations
C2: acquires the lock for "row2", and is in the process of actually applying its operation to the MemStore, etc.
C1: fails to acquire the lock for the first row2 op, since C2 already holds it. But there are still 999 more row2 ops to iterate over.
C2: commits its "row2" operation, releasing the lock
C1: manages to acquire the lock for a later row2 op (e.g. the put of "row2", "cf:col1", 500)
C1: commits the minibatch

Now it is easy to see that C1 has committed its put of "500" before other puts which came earlier from the client. This re-ordering is unexpected from C1's point of view: when it later reads the row, something other than the "latest" data might persist (e.g. the 1000th put it issued might actually have been executed first instead of last). The problem is worse with a delete/insert sequence, where you have a 50% chance of ending up with a deleted row at the end.

I haven't tried to reproduce this bug, but I think you could build a functional test as follows:

T1: writes batches with 1000 puts (arbitrary contents) to "row1" and 1000 puts to "row2" (increasing integers)
T2: writes non-batched writes to a different column of row2
T3: reads "row2" in a loop and verifies that the integer column is never seen to decrease

1000 might not be a large enough batch to reliably reproduce it, but I bet you could get this to fail eventually.
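The interleaving above can be replayed deterministically. The sketch below is a hypothetical single-threaded model (not HBase code): C1's 1000 row2 ops start out deferred because C2 holds the row lock, C2 releases it while C1 is still iterating at index 500, so op 500 commits in the first minibatch ahead of ops 0..499:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the C1/C2 interleaving described above.
public class ReorderReplay {

    static List<Integer> replay(int batchSize, int releasePoint) {
        List<Integer> commitOrder = new ArrayList<>();
        boolean[] committed = new boolean[batchSize];
        boolean row2LockFree = false;          // C2 holds the row2 lock initially
        int remaining = batchSize;
        int pass = 0;
        while (remaining > 0) {
            for (int i = 0; i < batchSize; i++) {
                if (committed[i]) continue;
                if (pass == 0 && i == releasePoint) {
                    row2LockFree = true;       // C2 commits its put and unlocks here
                }
                if (!row2LockFree) continue;   // tryLock fails: defer to a later pass
                commitOrder.add(i);            // stand-in for applying to the MemStore
                committed[i] = true;
                remaining--;
            }
            pass++;
        }
        return commitOrder;
    }

    public static void main(String[] args) {
        List<Integer> order = replay(1000, 500);
        // Op 500 is the first to commit; ops 0..499 land in a later minibatch.
        System.out.println("first committed op index: " + order.get(0));
    }
}
```

Running this shows a commit order of 500..999 followed by 0..499, i.e. exactly the out-of-order persistence the comment describes.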
> Client writes may be reordered under contention
> -----------------------------------------------
>
>                 Key: HBASE-11945
>                 URL: https://issues.apache.org/jira/browse/HBASE-11945
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>            Reporter: Todd Lipcon
>
> I haven't seen this bug in practice, but I was thinking about this a bit and think there may be a correctness issue with the way that we handle client batches which contain multiple operations which touch the same row. The client expects that these operations will be performed in the same order they were submitted, but under contention I believe they can get arbitrarily reordered, leading to incorrect results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)