phoenix-dev mailing list archives

From "Kadir OZDEMIR (Jira)" <>
Subject [jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
Date Thu, 26 Mar 2020 02:53:00 GMT


Kadir OZDEMIR updated PHOENIX-5791:
    Attachment: PHOENIX-5791.4.x-HBase-1.5.001.patch

> Eliminate false invalid row detection due to concurrent updates 
> ----------------------------------------------------------------
>                 Key: PHOENIX-5791
>                 URL:
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>         Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
> IndexTool verification generates an expected list of index mutations from the data table
rows and uses this list to check whether the index table rows are consistent with the data table.
It does so in the following steps:
>  # The data table rows are scanned with a raw scan. This raw scan is configured to read
all versions of rows. 
>  # For each scanned row, the cells that are scanned are grouped into two sets: put and
delete. The put set is the set of put cells and the delete set is the set of delete cells.
>  # The put and delete sets for a given row are further grouped based on their timestamps
into put and delete mutations such that all the cells in a mutation have the same timestamp. 
>  # The put and delete mutations are then merged into a single list, sorted in ascending
order of their timestamps. 
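
The grouping and sorting steps above can be sketched roughly as follows. This is a minimal illustration, not the IndexTool code: `Cell` and `Mutation` here are hypothetical stand-ins for the HBase types, the raw scan itself is omitted, and the order of a put and a delete that share a timestamp is an arbitrary choice of this sketch (puts first).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExpectedMutationSketch {

    /** Hypothetical stand-in for a scanned cell; only the fields this sketch needs. */
    static final class Cell {
        final String qualifier;
        final long timestamp;
        final boolean isDelete;

        Cell(String qualifier, long timestamp, boolean isDelete) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }
    }

    /** Hypothetical stand-in for a put or delete mutation: cells sharing one timestamp. */
    static final class Mutation {
        final long timestamp;
        final boolean isDelete;
        final List<Cell> cells;

        Mutation(long timestamp, boolean isDelete, List<Cell> cells) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.cells = cells;
        }
    }

    /** Builds the timestamp-ordered mutation list for the cells of one data table row. */
    static List<Mutation> buildSortedMutations(List<Cell> rowCells) {
        // Split the cells into a put set and a delete set, grouped by timestamp.
        Map<Long, List<Cell>> puts = new TreeMap<>();
        Map<Long, List<Cell>> deletes = new TreeMap<>();
        for (Cell cell : rowCells) {
            Map<Long, List<Cell>> target = cell.isDelete ? deletes : puts;
            target.computeIfAbsent(cell.timestamp, ts -> new ArrayList<>()).add(cell);
        }

        // Turn each timestamp group into one mutation.
        List<Mutation> mutations = new ArrayList<>();
        puts.forEach((ts, cells) -> mutations.add(new Mutation(ts, false, cells)));
        deletes.forEach((ts, cells) -> mutations.add(new Mutation(ts, true, cells)));

        // Merge into a single list in ascending timestamp order. The sort is
        // stable, so a put precedes a delete with the same timestamp
        // (an assumption of this sketch only).
        mutations.sort(Comparator.comparingLong((Mutation m) -> m.timestamp));
        return mutations;
    }
}
```

For example, cells with timestamps 20 (put), 10 (put), 15 (delete) and 20 (put) would yield three mutations: a put at 10, a delete at 15, and a put at 20 holding two cells.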
> The above process assumes that for each data table update, the index table will be updated
with the correct index row key. However, this assumption does not hold in the presence of
concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more pending updates
from different batches on the same data row are concurrent if and only if for all of these
updates the data table row state is read from HBase under the row lock and for none of them
the row lock has been acquired the second time for updating the data table. In other words,
all of them are in the first update phase concurrently. For concurrent updates, the first
two update phases are done but the last update phase is skipped. This means the data table
row will be updated by these updates but the corresponding index table rows will be left with
the unverified status. Then, the read repair process will repair these unverified index rows
during scans.
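
The definition of concurrency above can be read as a simple predicate over the pending updates on one data row. The sketch below only restates that definition in code; the `Progress` enum and `areConcurrent` method are hypothetical names introduced for illustration and do not correspond to Phoenix classes.

```java
import java.util.List;

public class ConcurrentUpdateSketch {

    /** Hypothetical progress markers for one pending update on a data row. */
    enum Progress {
        NOT_STARTED,      // row state has not been read yet
        ROW_STATE_READ,   // row state was read from HBase under the row lock
        LOCK_REACQUIRED   // row lock was acquired the second time to update the data table
    }

    /**
     * Returns true if the given pending updates (assumed to come from different
     * batches and to target the same data row) are concurrent: there are at
     * least two of them, every one has read the row state, and none has
     * acquired the row lock the second time.
     */
    static boolean areConcurrent(List<Progress> pendingUpdates) {
        if (pendingUpdates.size() < 2) {
            return false;
        }
        for (Progress progress : pendingUpdates) {
            if (progress != Progress.ROW_STATE_READ) {
                return false;
            }
        }
        return true;
    }
}
```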
> Since the expected index mutations are derived from the data table row after these concurrent
mutations are applied, the expected list does not match the actual list of index mutations.

This message was sent by Atlassian Jira
