phoenix-issues mailing list archives

From "Geoffrey Jacoby (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-5018) Index mutations created by IndexTool will have wrong timestamps
Date Thu, 13 Dec 2018 18:33:00 GMT


Geoffrey Jacoby commented on PHOENIX-5018:

Thinking about this some more after talking offline with [~kozdemir] about his testing, which
verified that an UPSERT SELECT into an index does the right thing and uses the timestamps of
the SELECT's KeyValues.

While that probably lets non-ASYNC index builds off the hook, I think we still have a bug
with ASYNC and partial rebuilds through the IndexTool. The MapReduce job runs a SELECT, and
each call of map() receives one row of its ResultSet. Those column values are then bound to a
JDBC Statement as parameters _of an UPSERT VALUES_, not an UPSERT SELECT. Since the SELECT
and the UPSERT are disconnected, I don't see how the timestamps could carry over, because the
UPSERT never sees the original KeyValues.

The easiest way to verify this would probably be to add tests to IndexToolIT that assert the
rebuilt index rows are still visible when queried at the same SCN as the original data.
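
To make the concern concrete, here is a toy model in plain Java. It is only a sketch: Cell,
rebuildKeepingTimestamp, rebuildFromValueOnly and visibleAt are hypothetical names, not
Phoenix or HBase APIs, and the visibility rule (a cell is seen only if its timestamp is below
the SCN) is an assumed simplification of how a fixed-SCN read behaves.

```java
// Toy model only: none of these types are Phoenix or HBase classes.
class TimestampSketch {
    /** A cell value plus the timestamp it was written with. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Models UPSERT SELECT: the index cell reuses the source cell's timestamp. */
    static Cell rebuildKeepingTimestamp(Cell base) {
        return new Cell(base.value, base.timestamp);
    }

    /**
     * Models SELECT -> ResultSet -> UPSERT VALUES: only the column value crosses
     * over, so the new write is stamped with the clock at rebuild time.
     */
    static Cell rebuildFromValueOnly(Cell base, long wallClock) {
        String columnValue = base.value; // the original timestamp is not carried along
        return new Cell(columnValue, wallClock);
    }

    /** Assumed rule for a read at a fixed SCN: cells at or after the SCN are invisible. */
    static boolean visibleAt(Cell cell, long scn) {
        return cell.timestamp < scn;
    }

    public static void main(String[] args) {
        Cell base = new Cell("v1", 1000L);
        long scn = 1001L;         // just after the base table write
        long rebuildTime = 5000L; // the rebuild runs much later

        System.out.println(visibleAt(rebuildKeepingTimestamp(base), scn));
        System.out.println(visibleAt(rebuildFromValueOnly(base, rebuildTime), scn));
    }
}
```

In this model the value-only path produces an index cell that a read at the base row's SCN
cannot see, which is the kind of mismatch an SCN-based assertion in IndexToolIT would catch.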

> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>                 Key: PHOENIX-5018
>                 URL:
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0, 5.0.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Kadir OZDEMIR
>            Priority: Major
> When doing a full rebuild (or initial async build) on an index using the IndexTool and
> PhoenixIndexImportDirectMapper, we generate the index mutations by creating an UPSERT SELECT
> query from the base table to the index, then taking the Mutations from it and inserting it
> directly into the index via an HBase HTable.
> The timestamps of the Mutations use the default HBase behavior, which is to take the
> current wall clock. However, the timestamp of an index KeyValue should use the timestamp of
> the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of weird side
> effects, such as if the base table has data with an expired TTL that isn't expired in the
> index yet.

