Date: Thu, 13 Dec 2018 18:33:00 +0000 (UTC)
From: "Geoffrey Jacoby (JIRA)"
To: issues@phoenix.apache.org
Reply-To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-5018) Index mutations created by IndexTool will have wrong timestamps

    [ https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720467#comment-16720467 ]

Geoffrey Jacoby commented on PHOENIX-5018:
------------------------------------------

I've been thinking about this some more after talking with [~kozdemir] offline. His testing verified that an UPSERT SELECT into an index does the right thing and uses the timestamps of the SELECT's KeyValues.

While that probably lets non-ASYNC index builds off the hook, I think we still have a bug with ASYNC and partial rebuilds through the IndexTool. The MapReduce job runs a SELECT, and each call of map() returns a row into a ResultSet. Those column values are then put into a JDBC Statement as parameters _into an UPSERT VALUES_, not an UPSERT SELECT. Since the select and the upsert are disconnected, I don't see how the timestamps could be carried over: the UPSERT never sees the original KeyValues.

The easiest way to verify this would probably be to add tests to IndexToolIT that assert the rebuilt index can still be seen with the same SCN that the original data had.
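To make the concern concrete, here is a minimal sketch of why a values-only hand-off would lose the timestamp. It is a toy model and not Phoenix or HBase code: `Cell`, `rebuildViaValuesOnly`, `rebuildPreservingTimestamp`, and `visibleAt` are hypothetical names, and the visibility rule (a cell is seen when its timestamp is strictly below the SCN) is a simplifying assumption about point-in-time reads.

```java
// Toy model only: none of these types are Phoenix or HBase classes.
class TimestampSketch {

    /** Hypothetical stand-in for a KeyValue: row key, value, and cell timestamp. */
    static final class Cell {
        final String rowKey;
        final String value;
        final long timestamp;

        Cell(String rowKey, String value, long timestamp) {
            this.rowKey = rowKey;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Models the suspected path: only the column value crosses from the
     * SELECT side to the UPSERT VALUES side, so the new cell is stamped
     * with the wall clock at write time.
     */
    static Cell rebuildViaValuesOnly(Cell base, long wallClock) {
        String valueOnly = base.value; // the base timestamp is not carried along
        return new Cell(base.rowKey, valueOnly, wallClock);
    }

    /** Models the desired behavior: the index cell reuses the base cell's timestamp. */
    static Cell rebuildPreservingTimestamp(Cell base) {
        return new Cell(base.rowKey, base.value, base.timestamp);
    }

    /** Assumed, simplified point-in-time read: visible if written before the SCN. */
    static boolean visibleAt(Cell cell, long scn) {
        return cell.timestamp < scn;
    }

    public static void main(String[] args) {
        Cell base = new Cell("row1", "v1", 1_000L);
        long scn = base.timestamp + 1; // an SCN at which the base row is visible
        long now = System.currentTimeMillis();

        Cell viaValues = rebuildViaValuesOnly(base, now);
        Cell preserved = rebuildPreservingTimestamp(base);

        System.out.println("base visible at SCN:              " + visibleAt(base, scn));
        System.out.println("values-only index visible at SCN: " + visibleAt(viaValues, scn));
        System.out.println("preserved index visible at SCN:   " + visibleAt(preserved, scn));
    }
}
```

A test in IndexToolIT along the lines suggested above would make the same comparison against a real rebuilt index: read the base table and the index at the SCN of the original write and check that both return the row.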

> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5018
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5018
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0, 5.0.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>
> When doing a full rebuild (or initial async build) on an index using the IndexTool and PhoenixIndexImportDirectMapper, we generate the index mutations by creating an UPSERT SELECT query from the base table to the index, then taking the Mutations from it and inserting them directly into the index via an HBase HTable.
> The timestamps of the Mutations use the default HBase behavior, which is to take the current wall clock. However, the timestamp of an index KeyValue should use the timestamp of the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of weird side effects, such as the base table having data with an expired TTL that isn't expired in the index yet.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)