Date: Thu, 5 Oct 2017 18:36:00 +0000 (UTC)
From: "James Taylor (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Updated] (PHOENIX-4277) Treat delete markers consistently with puts for point-in-time scans

     [ https://issues.apache.org/jira/browse/PHOENIX-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-4277:
----------------------------------
    Summary: Treat delete markers consistently with puts for point-in-time scans  (was: Treat delete markers consistent with puts for point-in-time scans)

> Treat delete markers consistently with puts for point-in-time scans
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-4277
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4277
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Vincent Poon
>
> The IndexScrutinyTool relies on doing point-in-time scans to determine consistency between the index and data tables. Unfortunately, deletes to the tables cause a problem with this approach, since delete markers take effect even if they're at a later time stamp than the point-in-time at which the scan is being done (unless KEEP_DELETED_CELLS is true).
> The rationale is that scans should return the same results before and after a compaction takes place.
> Taking snapshots does not help with this since they cannot be taken at a point in time and the delete markers will act the same way - there's no way to guarantee that the index and data table snapshots have the same "logical" set of data.
> Using raw scans would allow us to see the delete markers and do the correct point-in-time filtering ourselves. We'd need to write the filters to do this correctly (see the Tephra TransactionVisibilityFilter for an implementation of this that could be adapted). We'd also need to hook this into Phoenix or potentially dip down to the HBase level to do this.
> Thanks for brainstorming on this with me, [~lhofhansl].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
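
The point-in-time filtering described in the issue could look roughly like the sketch below. It is only an illustration of the idea, not the HBase or Phoenix API and not Tephra's TransactionVisibilityFilter: the `Cell` tuple, the marker kinds "DELETE" and "DELETE_COLUMN", and the function `point_in_time_view` are hypothetical names standing in for what a raw scan would return. It assumes "as of T" means timestamps less than or equal to T, and it leaves out family-level delete markers.

```python
from collections import namedtuple

# Hypothetical stand-in for a cell returned by a raw scan.
# kind is "PUT", "DELETE" (masks one exact version) or
# "DELETE_COLUMN" (masks every version at or below its timestamp).
Cell = namedtuple("Cell", "row column ts kind value")


def point_in_time_view(raw_cells, as_of):
    """Return {(row, column): value} as the data looked at `as_of`."""
    # Drop everything newer than the point in time, delete markers
    # included, so that later deletes cannot mask older puts.
    cells = [c for c in raw_cells if c.ts <= as_of]

    view = {}
    for key in {(c.row, c.column) for c in cells}:
        versions = [c for c in cells if (c.row, c.column) == key]
        exact_deletes = {c.ts for c in versions if c.kind == "DELETE"}
        column_deletes = [c.ts for c in versions if c.kind == "DELETE_COLUMN"]
        # Newest surviving column delete; -1 assumes non-negative timestamps.
        floor = max(column_deletes, default=-1)
        visible = [
            c for c in versions
            if c.kind == "PUT" and c.ts > floor and c.ts not in exact_deletes
        ]
        if visible:
            # The newest visible put wins, as in a normal read.
            view[key] = max(visible, key=lambda c: c.ts).value
    return view


raw = [
    Cell("r1", "cf:a", 10, "PUT", "v1"),
    Cell("r1", "cf:a", 30, "DELETE_COLUMN", None),
    Cell("r2", "cf:a", 10, "PUT", "x"),
    Cell("r2", "cf:a", 20, "PUT", "y"),
]
print(point_in_time_view(raw, 20))
print(point_in_time_view(raw, 30))
```

In this model the delete marker at timestamp 30 hides row r1 only for views at or after 30. A view as of 20 still sees the put at 10, which is the consistency between puts and delete markers that the issue asks for.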