phoenix-dev mailing list archives

From "Akshita Malhotra (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-3817) VerifyReplication using SQL
Date Mon, 04 Jun 2018 17:43:01 GMT


Akshita Malhotra commented on PHOENIX-3817:

[~alexaraujo] From the various tests I have run, it seems that the Multi-Table RecordReader approach
makes certain assumptions. For example, when the start row of a target region scan is set from the
source scan's start row, if the target start row is strictly greater than the source start row and
the target scan is smaller than the source scan, this approach would fail to determine the correct
number of good/bad rows (a subset scenario). Similarly, it would yield incorrect results if there are
holes in the target scan, which is a likely error scenario when a MapReduce job discards
nondeterministically processed rows (not very likely in our migration scenario, but possible with
M/R in general).
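To make the hole and subset scenarios concrete, here is a deliberately naive, hypothetical Java model: rows are reduced to sorted row keys, and the i-th source row is compared with the i-th target row. This is only an assumption for illustration, not the patch's Multi-Table RecordReader; the names `PositionalPairing` and `countByPosition` are made up.

```java
import java.util.List;

// Hypothetical model for illustration only: each row is reduced to its row key,
// and both lists are sorted the way a scan would return them. This is NOT the
// Multi-Table RecordReader from the patch; it only pairs rows by position.
final class PositionalPairing {

    /** Returns {good, bad}, comparing the i-th source key with the i-th target key. */
    static int[] countByPosition(List<String> sourceKeys, List<String> targetKeys) {
        int good = 0;
        int bad = 0;
        for (int i = 0; i < sourceKeys.size(); i++) {
            // "Good" only if the target holds the same key at the same position.
            if (i < targetKeys.size() && sourceKeys.get(i).equals(targetKeys.get(i))) {
                good++;
            } else {
                bad++;
            }
        }
        return new int[] {good, bad};
    }

    public static void main(String[] args) {
        List<String> source = List.of("r1", "r2", "r3", "r4", "r5");

        // Hole: the target is missing r2, so every later pairing is shifted by one.
        int[] hole = countByPosition(source, List.of("r1", "r3", "r4", "r5"));
        System.out.println("hole:   good=" + hole[0] + " bad=" + hole[1]);

        // Subset: the target side starts at r3 instead of r1.
        int[] subset = countByPosition(source, List.of("r3", "r4", "r5"));
        System.out.println("subset: good=" + subset[0] + " bad=" + subset[1]);
    }
}
```

With a single missing target row (`r2`), the positional pairing reports 1 good and 4 bad rows instead of 4 good rows and 1 missing row; when the target side starts at `r3` (the subset case), every row is reported as bad.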

After going through the HBase VerifyReplication approach, I think one way to resolve these issues
would be to do something similar: for every source row processed, find the corresponding target
scan (start row = current source row, end row = source split end row), thereby eliminating the
need for a multi-table record reader.
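A minimal sketch of the proposed alternative, assuming an in-memory model in which a source split is a key-sorted list of rows and the target table is a `NavigableMap`. The target "scan" is opened at the current (first) source row of the split and bounded by the split end row; both sides are then advanced in lockstep as a merge-join, roughly the way HBase's VerifyReplication mapper walks the peer table. `SplitVerifier`, `verifySplit`, and the counter names are hypothetical and only loosely modeled on HBase's counters.

```java
import java.util.EnumMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model: a source split is a key-sorted list of (rowKey, value)
// pairs and the target table is a NavigableMap standing in for a target scan.
// Counter names are illustrative, loosely modeled on HBase's VerifyReplication.
final class SplitVerifier {

    enum Counter { GOOD_ROWS, CONTENT_DIFFERENT_ROWS, ONLY_IN_SOURCE_ROWS, ONLY_IN_TARGET_ROWS }

    /** splitEndRow is exclusive; null means unbounded (e.g. the last region). */
    static Map<Counter, Integer> verifySplit(List<Map.Entry<String, String>> sourceRows,
                                             String splitEndRow,
                                             NavigableMap<String, String> target) {
        Map<Counter, Integer> counts = new EnumMap<>(Counter.class);
        for (Counter c : Counter.values()) {
            counts.put(c, 0);
        }
        if (sourceRows.isEmpty()) {
            return counts;
        }

        // Target scan: start row = current (first) source row, stop row = split end row.
        String startRow = sourceRows.get(0).getKey();
        NavigableMap<String, String> targetScan = (splitEndRow == null)
                ? target.tailMap(startRow, true)
                : target.subMap(startRow, true, splitEndRow, false);
        Iterator<Map.Entry<String, String>> it = targetScan.entrySet().iterator();
        Map.Entry<String, String> t = it.hasNext() ? it.next() : null;

        for (Map.Entry<String, String> s : sourceRows) {
            // Target rows sorting before the current source row have no source counterpart.
            while (t != null && t.getKey().compareTo(s.getKey()) < 0) {
                counts.merge(Counter.ONLY_IN_TARGET_ROWS, 1, Integer::sum);
                t = it.hasNext() ? it.next() : null;
            }
            if (t != null && t.getKey().equals(s.getKey())) {
                // Same row key on both sides: compare the row contents.
                Counter c = t.getValue().equals(s.getValue())
                        ? Counter.GOOD_ROWS : Counter.CONTENT_DIFFERENT_ROWS;
                counts.merge(c, 1, Integer::sum);
                t = it.hasNext() ? it.next() : null;
            } else {
                // Target exhausted or next target key sorts after this row: a hole.
                counts.merge(Counter.ONLY_IN_SOURCE_ROWS, 1, Integer::sum);
            }
        }
        // Leftover target rows before the split end exist only in the target.
        while (t != null) {
            counts.merge(Counter.ONLY_IN_TARGET_ROWS, 1, Integer::sum);
            t = it.hasNext() ? it.next() : null;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> sourceSplit = List.of(
                Map.entry("r1", "a"), Map.entry("r2", "b"),
                Map.entry("r3", "c"), Map.entry("r5", "e"));
        NavigableMap<String, String> target = new TreeMap<>(Map.of(
                "r0", "z", "r1", "a", "r3", "x", "r4", "d", "r5", "e", "r6", "f"));
        // Split covers [r1, r6): r0 and r6 fall outside the target scan.
        System.out.println(verifySplit(sourceSplit, "r6", target));
    }
}
```

For the example split this reports 2 good rows, 1 content-different row (`r3`), 1 source-only row (`r2`) and 1 target-only row (`r4`). Because the comparison is keyed on row keys rather than positions, a hole is counted once instead of shifting every later pairing. One limitation of this sketch: target rows inside the split that sort before its first source row are never visited.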

fyi, [~gjacoby]

> VerifyReplication using SQL
> ---------------------------
>                 Key: PHOENIX-3817
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Alex Araujo
>            Assignee: Alex Araujo
>            Priority: Minor
>             Fix For: 4.15.0
>         Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch,
> Certain use cases may copy or replicate a subset of a table to a different table or cluster.
> For example, application topologies may map data for specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an SQL query,
> a target table, and an optional target cluster. The tool would compare data returned by the
> query on the different tables and update various result counters (similar to HBase's VerifyReplication).

This message was sent by Atlassian JIRA
