Date: Fri, 28 Jul 2017 04:53:00 +0000 (UTC)
From: "Samarth Jain (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-4051) Prevent out-of-order updates for mutable index updates

    [ https://issues.apache.org/jira/browse/PHOENIX-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104454#comment-16104454 ]

Samarth Jain commented on PHOENIX-4051:
---------------------------------------

bq. For an UPSERT VALUES, the timestamp is gotten from the server at commit time. This is done so that the timestamp is consistent across all rows being written.

I don't think this actually happens in practice. In doMiniBatchMutation, HBase tries to acquire as many row locks as it can. The mutations in the batch for which it acquires row locks all get the same timestamp. For the ones it cannot lock, it comes back and tries to acquire the locks again, in which case the timestamp ends up being different from the one set on the first attempt. This happens more often when there are concurrent updates to the same rows.

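
The locking behavior described in the comment above can be sketched with a toy model. This is not HBase code: the function name `mini_batch_timestamps`, its parameters, and the integer clock are all hypothetical, and the model assumes only that each pass stamps every row it manages to lock with a single timestamp and retries the rest in a later pass.

```python
from itertools import count


def mini_batch_timestamps(rows, busy_on_first_pass, clock):
    """Toy model: each pass stamps every row it can lock with one timestamp.

    rows: row keys in the client batch (hypothetical names).
    busy_on_first_pass: rows whose lock is held by another writer in pass 1.
    clock: iterator yielding the server time at the start of each pass.
    """
    stamped = {}
    pending = list(rows)
    first_pass = True
    while pending:
        now = next(clock)  # one timestamp shared by all rows locked in this pass
        retry = []
        for row in pending:
            if first_pass and row in busy_on_first_pass:
                retry.append(row)  # lock not acquired; retried in the next pass
            else:
                stamped[row] = now
        pending = retry
        first_pass = False
    return stamped


# Row "B" is locked by a concurrent writer during the first pass.
stamps = mini_batch_timestamps(["A", "B", "C"], {"B"}, count(100, 5))
print(stamps)
```

In this sketch rows A and C share the first-pass timestamp while B gets the later one, so a single client batch ends up with more than one timestamp, which is the situation the comment describes.
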
> Prevent out-of-order updates for mutable index updates
> ------------------------------------------------------
>
>                 Key: PHOENIX-4051
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4051
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: James Taylor
>         Attachments: PHOENIX-4051_v1.patch
>
>
> Out-of-order processing of data rows during index maintenance causes mutable indexes to become out of sync with regard to the data table. Here's a simple example to illustrate the issue:
> # Assume table T(K,V) and index X(V,K).
> # Upsert T(A, 1) at t10. Index updates: Put X(1,A) at t10.
> # Upsert T(A, 3) at t30. Index updates: Delete X(1,A) at t29, Put X(3,A) at t30.
> # Upsert T(A,2) at t20. Index updates: Delete X(1,A) at t19, Put X(2,A) at t20, Delete X(2,A) at t29
> Ideally, we'd want to remove the Delete X(1,A) at t29 since this isn't correct in terms of timeline consistency, but we can't do that with HBase without support for deleting/undoing Delete markers.
> The above is not what is occurring. Instead, when T(A,2) comes in, the Put X(2,A) will occur at t20, but the Delete won't occur. This causes more index rows than data rows, essentially making it invalid.
> A quick fix is to reset the timestamp of the data table mutations to the current time within the preBatchMutate call, when the row is exclusively locked. This skirts the issue because then timestamps won't overlap.
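
The example in the quoted issue description can be replayed with a small simulation. It is a sketch under a simplifying assumption, not Phoenix or HBase code: a delete marker at timestamp ts is taken to hide every put on the same index row whose timestamp is at or below ts. The helper `visible_rows` and the tuple encoding of operations are hypothetical.

```python
def visible_rows(ops):
    """Return the index rows still visible after applying (kind, row, ts) ops.

    Simplified model: a delete marker at timestamp ts hides every put
    on the same row whose timestamp is <= ts.
    """
    puts, deletes = {}, {}
    for kind, row, ts in ops:
        target = puts if kind == "put" else deletes
        target[row] = max(ts, target.get(row, ts))  # keep the newest timestamp
    return sorted(row for row, ts in puts.items() if ts > deletes.get(row, -1))


# Index updates listed in the issue for the upserts at t10, t30, then t20.
ideal = [
    ("put", "X(1,A)", 10),
    ("delete", "X(1,A)", 29), ("put", "X(3,A)", 30),
    ("delete", "X(1,A)", 19), ("put", "X(2,A)", 20), ("delete", "X(2,A)", 29),
]
# What the issue reports instead: the Delete for X(2,A) never happens.
observed = [op for op in ideal if op != ("delete", "X(2,A)", 29)]

print(visible_rows(ideal))     # ['X(3,A)']
print(visible_rows(observed))  # ['X(2,A)', 'X(3,A)']
```

With the full set of updates only X(3,A) remains, matching the single data row T(A,3). Without the Delete on X(2,A), two index rows are visible for one data row, which is the inconsistency the issue describes.
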