phoenix-dev mailing list archives

From "Kadir OZDEMIR (Jira)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5795) Supporting selective queries for index rows updated concurrently
Date Tue, 24 Mar 2020 23:55:04 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5795:
-----------------------------------
    Attachment: PHOENIX-5795.4.x-HBase-1.5.001.patch

> Supporting selective queries for index rows updated concurrently
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-5795
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5795
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Critical
>         Attachments: PHOENIX-5795.4.x-HBase-1.5.001.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From the consistent indexing design (PHOENIX-5156) perspective, two or more pending updates
from different batches on the same data row are concurrent if and only if, for all of these
updates, the data table row state has been read from HBase under the row lock and, for none of
them, the row lock has been acquired a second time to update the data table. In other words,
all of them are in the first update phase concurrently. For concurrent updates, the first
two update phases are done but the last update phase is skipped. This means the data table
row will be updated by these updates but the corresponding index table rows will be left with
the unverified status. The read repair process will then repair these unverified index rows
during scans.
> In addition to leaving index rows unverified, concurrent updates may generate index
rows with incorrect row keys. For example, consider that an application issues the very first
two upserts on the same row concurrently and the second upsert does not include one or more
of the indexed columns. When these updates arrive concurrently at IndexRegionObserver, the
existing row state would be null for both of them. This means the index updates will
be generated solely from the pending updates. The partial upsert with missing indexed columns
will generate an index row by assuming the missing indexed columns have null values, and this
assumption may not be true, as the other concurrent upsert may have non-null values for those
indexed columns. After issuing the concurrent updates, if the application attempts to read back
the row using a selective query on the index table, and this selective query maps to an HBase
scan that does not cover these unverified rows because of their incorrect row keys, the
application will not get the row content back correctly.
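
The row key problem described above can be illustrated with a small, self-contained sketch. This is not Phoenix code: the class ConcurrentIndexSketch, the helper indexRowKey, the column names A and B, and the "<indexed value>|<data row key>" key layout are all hypothetical and exist only to show how a partial upsert that sees a null existing row state ends up with a different index row key than the same upsert applied after the full one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; none of these names come from Phoenix.
class ConcurrentIndexSketch {

    // Hypothetical helper: derives an index row key of the form
    // "<indexed value>|<data row key>" from the existing row state
    // (may be null) overlaid with the pending update.
    static String indexRowKey(String dataRowKey,
                              Map<String, String> existingRow,
                              Map<String, String> pendingUpdate,
                              String indexedColumn) {
        Map<String, String> merged = new HashMap<>();
        if (existingRow != null) {
            merged.putAll(existingRow);
        }
        merged.putAll(pendingUpdate);
        // A column absent from both maps is assumed to be null.
        String value = merged.get(indexedColumn);
        return (value == null ? "<null>" : value) + "|" + dataRowKey;
    }

    public static void main(String[] args) {
        // First upsert sets both columns; A is the indexed column.
        Map<String, String> fullUpsert = new HashMap<>();
        fullUpsert.put("A", "x");
        fullUpsert.put("B", "1");

        // Second upsert is partial: it omits the indexed column A.
        Map<String, String> partialUpsert = new HashMap<>();
        partialUpsert.put("B", "2");

        // Concurrent case: both updates see a null existing row state.
        String fullKey = indexRowKey("row1", null, fullUpsert, "A");
        String concurrentPartialKey = indexRowKey("row1", null, partialUpsert, "A");

        // Serialized case: the partial upsert sees the row written by the first.
        String serializedPartialKey = indexRowKey("row1", fullUpsert, partialUpsert, "A");

        System.out.println(fullKey);              // x|row1
        System.out.println(concurrentPartialKey); // <null>|row1
        System.out.println(serializedPartialKey); // x|row1
    }
}
```

In this model, a selective scan over index keys starting with "x" would miss the "<null>|row1" row, so read repair would never get the chance to fix it during that scan, which is the situation the description is concerned with.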



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
