phoenix-issues mailing list archives

From "Thomas D'Silva (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-5055) Split mutations batches probably affects correctness of index data
Date Tue, 04 Dec 2018 05:52:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708231#comment-16708231 ]

Thomas D'Silva commented on PHOENIX-5055:
-----------------------------------------

[~gjacoby] We use a map in {{MultiRowMutationState}}, so I don't think there can be multiple
rows with the same rowkey. [~Jaanai] Another way to fix this: when we create a new batch of
rows because we reached the size or row-count limit, check whether the next row has the same
rowkey as the last row we just saw, and if it does, don't include it in the batch that is
being added.
The deletes are column deletes that represent null values, as [~vincentpoon] said, so it's not
clear how this would cause the index to get out of sync. Did you use the ConcurrentTest to
reproduce the out-of-sync index?
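
The batching check suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Phoenix code: {{RowKeyAwareBatcher}}, {{RowMutation}} and {{split}} are hypothetical names, the limit is counted in entries only (no byte-size limit), and this variant lets the current batch grow past the limit instead of moving the row into the next batch, which is one possible reading of the suggestion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyAwareBatcher {

    /** Hypothetical stand-in for an HBase mutation: just a row key and a type label. */
    public static final class RowMutation {
        public final String rowKey;
        public final String type;

        public RowMutation(String rowKey, String type) {
            this.rowKey = rowKey;
            this.type = type;
        }
    }

    /**
     * Splits mutations into batches of at most maxBatchSize entries, except that a
     * batch may exceed the limit rather than separate two consecutive mutations
     * that share a row key.
     */
    public static List<List<RowMutation>> split(List<RowMutation> mutations, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        List<List<RowMutation>> batches = new ArrayList<>();
        List<RowMutation> current = new ArrayList<>();
        for (RowMutation m : mutations) {
            boolean limitReached = current.size() >= maxBatchSize;
            boolean sameRowAsLast = !current.isEmpty()
                    && current.get(current.size() - 1).rowKey.equals(m.rowKey);
            // Only close the batch on a row-key boundary.
            if (limitReached && !sameRowAsLast) {
                batches.add(current);
                current = new ArrayList<>();
            }
            current.add(m);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Mirrors the upserts in the issue: row A3 yields a Put and a DeleteColumn.
        List<RowMutation> mutations = Arrays.asList(
                new RowMutation("A2", "Put"),
                new RowMutation("A3", "Put"),
                new RowMutation("A3", "DeleteColumn"),
                new RowMutation("A4", "Put"));
        for (List<RowMutation> batch : split(mutations, 2)) {
            StringBuilder line = new StringBuilder("batch:");
            for (RowMutation m : batch) {
                line.append(' ').append(m.rowKey).append('/').append(m.type);
            }
            System.out.println(line);
        }
    }
}
```

With a limit of 2, a plain size-based split would cut between the Put and the DeleteColumn for A3; the sketch keeps both in the first batch, so whatever timestamp is assigned per batch applies to both.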

> Split mutations batches probably affects correctness of index data
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-5055
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5055
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.1
>            Reporter: Jaanai
>            Assignee: Jaanai
>            Priority: Critical
>             Fix For: 5.1.0
>
>         Attachments: ConcurrentTest.java, PHOENIX-5055-v4.x-HBase-1.4.patch
>
>
> In order to get more performance, we split the list of mutations into multiple batches
> in MutationState. A single upsert statement with some null values produces two types of
> KeyValues (Put and DeleteColumn). These KeyValues should have the same timestamp so that
> the operation stays atomic for the corresponding row key.
> Incorrect indexed data was found in the index tables via sqlline.
> !https://gw.alicdn.com/tfscom/TB1nSDqpxTpK1RjSZFGXXcHqFXa.png|width=665,height=400!
>  
> Running the following:
> {code:java}
> conn.createStatement().executeUpdate(
>     "CREATE TABLE " + tableName + " ("
>         + "A VARCHAR NOT NULL PRIMARY KEY,"
>         + "B VARCHAR," + "C VARCHAR," + "D VARCHAR) COLUMN_ENCODED_BYTES = 0");
> conn.createStatement().executeUpdate(
>     "CREATE INDEX " + indexName + " on " + tableName + " (C) INCLUDE(D)");
> conn.createStatement().executeUpdate(
>     "UPSERT INTO " + tableName + "(A,B,C,D) VALUES ('A2','B2','C2','D2')");
> conn.createStatement().executeUpdate(
>     "UPSERT INTO " + tableName + "(A,B,C,D) VALUES ('A3','B3', 'C3', null)");
> {code}
> dump IndexMemStore:
> {code:java}
> hbase.index.covered.data.IndexMemStore(117): Inserting:\x01A3/0:D/1542190446218/DeleteColumn/vlen=0/seqid=0/value=
> phoenix.hbase.index.covered.data.IndexMemStore(133): Current kv state:
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: \x01A3/0:B/1542190446167/Put/vlen=2/seqid=5/value=B3
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: \x01A3/0:C/1542190446167/Put/vlen=2/seqid=5/value=C3
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: \x01A3/0:D/1542190446218/DeleteColumn/vlen=0/seqid=0/value=
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: \x01A3/0:_0/1542190446167/Put/vlen=1/seqid=5/value=x
> phoenix.hbase.index.covered.data.IndexMemStore(137): ========== END MemStore Dump ==================
> {code}
>  
> The DeleteColumn's timestamp (1542190446218) is larger than that of the other mutations (1542190446167).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
