phoenix-issues mailing list archives

From "Kadir OZDEMIR (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-5018) Index mutations created by IndexTool will have wrong timestamps
Date Tue, 08 Jan 2019 02:20:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736641#comment-16736641 ]

Kadir OZDEMIR commented on PHOENIX-5018:
----------------------------------------

While discussing this further with [~vincentpoon], [~gjacoby] and [~tdsilva] in person, a
third solution has emerged. 

The third alternative is to change IndexTool to use the same code path that MetaDataRegionObserver
uses for partial index builds. This code path uses the doPostScannerOpen method of
UngroupedRegionObserver to rebuild the index. This method scans the data table to get mutations
and replays them on the data table with the REPLAY_ONLY_INDEX_WRITES attribute set. Indexer
(the coprocessor that manages index updates) checks this attribute and, for these mutations,
updates only the index tables. As a result, the index tables get the right timestamps.
IndexTool can therefore be changed to leverage UngroupedRegionObserver the same way
MetaDataRegionObserver does. This alternative unifies the two code paths without losing the
benefits of the MapReduce framework.
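
To make the replay idea concrete, here is a minimal, self-contained sketch. It is a toy model,
not Phoenix code: ReplaySketch, Write, apply and rebuildIndex are hypothetical names invented
for illustration, and the in-memory lists only stand in for the data and index tables. The only
name taken from the discussion is the REPLAY_ONLY_INDEX_WRITES attribute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Phoenix or HBase classes. */
final class ReplaySketch {
    // Attribute name as used in the discussion; how Phoenix encodes it is not modeled here.
    static final String REPLAY_ONLY_INDEX_WRITES = "REPLAY_ONLY_INDEX_WRITES";

    /** Hypothetical stand-in for a mutation: one row, one value, an explicit timestamp. */
    static final class Write {
        final String row;
        final String value;
        final long timestamp;
        final Map<String, String> attributes;

        Write(String row, String value, long timestamp, Map<String, String> attributes) {
            this.row = row;
            this.value = value;
            this.timestamp = timestamp;
            this.attributes = attributes;
        }
    }

    final List<Write> dataTable = new ArrayList<>();
    final List<Write> indexTable = new ArrayList<>();

    /** Stand-in for the indexing coprocessor: flagged writes update only the index. */
    void apply(Write w) {
        if (!w.attributes.containsKey(REPLAY_ONLY_INDEX_WRITES)) {
            dataTable.add(w); // normal write path: the data table is updated too
        }
        // The index entry (keyed by value) reuses the mutation's own timestamp,
        // not the wall clock at the time of the rebuild.
        indexTable.add(new Write(w.value, w.row, w.timestamp,
                Collections.<String, String>emptyMap()));
    }

    /** Stand-in for the rebuild scan: read every data row and replay it with the flag set. */
    void rebuildIndex() {
        indexTable.clear();
        Map<String, String> replayFlag =
                Collections.singletonMap(REPLAY_ONLY_INDEX_WRITES, "true");
        // Iterate over a snapshot so the loop never depends on what apply() touches.
        for (Write w : new ArrayList<>(dataTable)) {
            apply(new Write(w.row, w.value, w.timestamp, replayFlag));
        }
    }

    public static void main(String[] args) {
        ReplaySketch tables = new ReplaySketch();
        tables.apply(new Write("row1", "a", 100L, Collections.<String, String>emptyMap()));
        tables.apply(new Write("row2", "b", 200L, Collections.<String, String>emptyMap()));
        tables.rebuildIndex();
        for (Write w : tables.indexTable) {
            System.out.println(w.row + " -> " + w.value + " @ " + w.timestamp);
        }
    }
}
```

In this model the rebuild leaves the data table untouched and every rebuilt index entry carries
the timestamp of the data row it was derived from, which is the property the third alternative
is after.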

> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5018
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5018
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0, 5.0.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>
> When doing a full rebuild (or initial async build) on an index using the IndexTool and
> PhoenixIndexImportDirectMapper, we generate the index mutations by creating an UPSERT SELECT
> query from the base table to the index, then taking the Mutations from it and inserting it
> directly into the index via an HBase HTable.
> The timestamps of the Mutations use the default HBase behavior, which is to take the
> current wall clock. However, the timestamp of an index KeyValue should use the timestamp of
> the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of weird side
> effects, such as if the base table has data with an expired TTL that isn't expired in the
> index yet.
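
The TTL side effect described in the issue can be illustrated with a small arithmetic sketch.
This is not HBase code: TimestampSkewSketch and isExpired are hypothetical names, and the
expiry rule (a cell is expired once its age exceeds the TTL) is a simplifying assumption
rather than a statement of HBase's exact semantics. All times are made-up millisecond values.

```java
/** Toy model only: illustrates how a wall-clock index timestamp outlives the base row. */
final class TimestampSkewSketch {
    /** Assumed rule: a cell is expired once its age exceeds the TTL. */
    static boolean isExpired(long cellTimestampMs, long ttlMs, long nowMs) {
        return nowMs - cellTimestampMs > ttlMs;
    }

    public static void main(String[] args) {
        long ttlMs = 1000L;
        long baseTimestampMs = 0L;      // when the base table row was written
        long rebuildWallClockMs = 900L; // when the index rebuild happened to run
        long nowMs = 1500L;

        boolean baseExpired = isExpired(baseTimestampMs, ttlMs, nowMs);
        // Index cell stamped with the wall clock at rebuild time (the reported bug).
        boolean skewedIndexExpired = isExpired(rebuildWallClockMs, ttlMs, nowMs);
        // Index cell stamped with the base row's timestamp (the desired behavior).
        boolean alignedIndexExpired = isExpired(baseTimestampMs, ttlMs, nowMs);

        System.out.println("base expired: " + baseExpired);
        System.out.println("index expired (wall clock): " + skewedIndexExpired);
        System.out.println("index expired (base timestamp): " + alignedIndexExpired);
    }
}
```

With these numbers the base row has expired while the wall-clock-stamped index cell has not, so
the index would still point at a row that no longer exists. Stamping the index cell with the
base row's timestamp makes both expire together.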



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
