hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
Date Wed, 04 Apr 2018 19:39:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-20305:
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Thanks for the patch, Wellington.

Thanks for the review, Dave.

> Add option to SyncTable that skip deletes on target cluster
> -----------------------------------------------------------
>                 Key: HBASE-20305
>                 URL: https://issues.apache.org/jira/browse/HBASE-20305
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0-alpha-4
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>             Fix For: 3.0.0
>         Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.master.002.patch
> We had a situation where two clusters with active-active replication got out of sync,
> but both held data that should be kept. The tables in question never have data deleted,
> but ingestion had happened on both clusters, and some rows had even been updated.
> In this scenario, a cell that is present on only one of the clusters should not be
> deleted, but replayed on the other. Also, for cells with the same identifier but different
> values, the most recent value should be kept. The current version of SyncTable is not
> applicable here, because it simply copies the whole state from source to target, losing
> any additional rows that exist only on the target, as well as cell values that received a
> more recent update there. This could be solved by adding an option to skip deletes in
> SyncTable. That way, the additional cells not present on the source would still be kept.
> For cells with the same identifier but different values, it would just perform a Put for
> the cell version from the source, while client scans would still fetch the version with
> the most recent timestamp.
> I'm attaching a patch with this additional option shortly. Please share your thoughts.
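
To illustrate the behaviour described in the quoted description, here is a minimal in-memory sketch of a sync with and without deletes. It is not SyncTable's implementation: the table layout (a dict mapping a (row, column) pair to its timestamped versions) and the names `sync`, `skip_deletes` and `read_latest` are assumptions made up for this example.

```python
from copy import deepcopy

# Hypothetical model of a table: {(row, column): {timestamp: value}}.
# This only mimics the semantics discussed in the issue; it is not HBase code.


def sync(source, target, skip_deletes=False):
    """Return the state of the target after syncing it from the source."""
    result = deepcopy(target)

    # Puts: copy every cell version found on the source into the target.
    for cell, versions in source.items():
        result.setdefault(cell, {}).update(versions)

    # Deletes: unless skipped, drop everything the source does not have.
    if not skip_deletes:
        for cell in list(result):
            if cell not in source:
                del result[cell]
                continue
            for ts in list(result[cell]):
                if ts not in source[cell]:
                    del result[cell][ts]
    return result


def read_latest(table, cell):
    """Mimic a client read: return the value with the highest timestamp."""
    versions = table[cell]
    return versions[max(versions)]


source = {("r1", "cf:a"): {100: "old"}, ("r2", "cf:a"): {150: "src-only"}}
target = {("r1", "cf:a"): {200: "new"}, ("r3", "cf:a"): {120: "tgt-only"}}

full = sync(source, target)                     # target becomes a copy of source
kept = sync(source, target, skip_deletes=True)  # target-only data survives
```

In this model the full sync leaves the target identical to the source, so row r3 and the newer value of r1 are lost. With deletes skipped, r3 is kept, r2 is added, and r1 holds both versions, so a read still returns the value written at the later timestamp.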

This message was sent by Atlassian JIRA