phoenix-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kadir OZDEMIR (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-5018) Index mutations created by IndexTool will have wrong timestamps
Date Fri, 04 Jan 2019 19:31:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734512#comment-16734512
] 

Kadir OZDEMIR commented on PHOENIX-5018:
----------------------------------------

*Background*

There are two ways to fully build an index: synchronously as part of the index create command,
and asynchronously using IndexTool. The synchronous full build is done by using an UPSERT
SELECT statement where data is read from the data table and inserted into the index table.
The UPSERT SELECT code path does not include any logic special to index rebuild, and thus,
the values for HBase timestamps for the insert operation are taken from the current wall clock.
This is the root cause for getting incorrect timestamps when index mutations generated by
UPSERT SELECT.

IndexTool is implemented using MapReduce. There are two types of mapper for IndexTool: PhoenixIndexImportMapper
and PhoenixIndexImportDirectMapper. The former is used for the bulk loading option and the
latter for the direct option of IndexTool. For both of these options, PhoenixIndexDBWriatable
is used to read and write table rows. Reading rows is done through the ResultSet interface
and writing is done through the PreparedStatement interface. Here, scanning of the data table,
i.e., the SELECT statement, and inserting new rows into the index table, (the UPSERT statement)
are executed separately. Since the PreparedStatement interface and thus the PhoenixPreparedStatement
class do not have a method to specify the timestamps for individual columns, the index table
rows gets the timestamps from the current wall clock. This is the root cause for getting incorrect
timestamps when index mutations generated by using an UPSERT statement.

In addition to building index fully, there is also a way to rebuild index partially to recover
from index write failures which can happen during data table updates. When such a failure
happens on an index table, the index is disabled by setting the INDEX_STATE column of the
row corresponding to the index table in SYSTEM.CATALOG.  Also the timestamp associated with
the failed write is recorded in the INDEX_DISABLE_TIMESTAMP column of the same row. This is
done in the UpdateIndexState method of IndexUtil. 

MetaDataRegionObserver runs a periodic task to go through SYSTEM.CATALOG to identify index
tables that are ready for index rebuild. Using the INDEX_DISABLE_TIMESTAMP value, the rows
of the data table is identified to be replayed to rebuild the index. During these replay writes,
the timestamps values in the data table is correctly passed to the index table. In other words,
the partial index rebuild does not have the timestamp problem existing in full index rebuilds.


*Alternative Solutions*

There two main approaches to solve the timestamp problem. 

The first one is to enhance the PhoenixIndexDBWriatable class and the UPSERT and UPSERT SELECT
code paths with the ability to retrieve and set HBase cell timestamps. This requires adding
a new hint (e.g., TIMESTAMP) to UPSERT and UPSERT SELECT statements. This hint for UPSERT
implies that the provided timestamp values should be used instead of using the current wall
clock. For UPSERT SELECT, the hint means that cell timestamps should be retrieved by the SELECT
statement and passed to the UPSERT statement. The following changes have been identified after
a high level inspection of the code. 

PhoenixIndexDBWritable:
 * Change the type of the rowTs property from long to List<Long>
 * Change the methods accessing rowTs (i.e., write and getRowTs) accordingly
 * Add the timestamp hint the UPSERT statement
 * Pass the timestamps to the PreparedStatement argument. To do that unwrap it as PhoenixPreparedStatement
and call setParamaterTimestamp(int parameterIndex, long timestamp) that would be introduced
as part of this solution alternative

PhoenixPreparedStatment:
 * Add a method (e.g., void setParameterTimestamp(int parameterIndex, long timestamp))
 * Add a property called timestamps (List<long> timestamps)

PhoenixSQL grammar:
 * Add a new hint for preserving timestamps when populating an index from a data table, e.g.,
TIMESTAMP

HintNode:
 * Add the new hint for timestamps

UpsertCompiler:
 * Various part of the UpsertCompliler code needs to be changed when the timestamp hint is
given (1) to retrieve cell timestamps and create mutations with these timestamps for the UPSERT
SELECT statements and (2) to use the parameter timestamps in PhoenixPreparedStatment objects
to create the mutations for UPSERT statements.

This list may not be complete and more changes may be required. As can be seen from the identified
changes, the first alternative requires surgical changes in the very core of Phoenix. It requires
changes in the SQL grammar (in the external API) which may be solely used for internal purposes.

The second alternative solution is to use MetaDataRegionObserver for both full index build
and index partial rebuild. This solution implies abandoning using UPSERT SELECT for synchronous
index full build and MapReduce for asynchronous index full build. Actually, with this alternative,
we do not need to make a distinction between partial and full index builds or asynchronous
or synchronous builds since there will be one code path that is used for all purposes. This
solution requires setting INDEX_DISABLE_TIMESTAMP to a predefined value (or data table creation
time) after an index table is created so that all the rows of the data table will considered
by MetaDataRegionObserver (i.e., the partial rebuild will be turn into full index build).
This also implies that all index full builds will be asynchronous, i.e., the full build will
not happen in the context of index table create. 

The second solution requires less code changes and simplifies the current index implementation
by eliminating the code and bugs for IndexTool. The main drawback of the second solution is
that current MetaDataRegionObserver is single threaded and rebuilds one index at a time. However,
it can be enhanced to be multithreaded which will benefit not only full index builds but also
index rebuilds. Thus, I am inclining to implement the second alternative but would like to
hear comments as the implications of the second alternative go beyond fixing this issue. I
also want to make sure I did not miss something important in my analysis, [~gjacoby],[~vincentpoon].

> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5018
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5018
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0, 5.0.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>
> When doing a full rebuild (or initial async build) on an index using the IndexTool and
PhoenixIndexImportDirectMapper, we generate the index mutations by creating an UPSERT SELECT
query from the base table to the index, then taking the Mutations from it and inserting it
directly into the index via an HBase HTable. 
> The timestamps of the Mutations use the default HBase behavior, which is to take the
current wall clock. However, the timestamp of an index KeyValue should use the timestamp of
the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of weird side
effects, such as if the base table has data with an expired TTL that isn't expired in the
index yet. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message